Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-04-18 Thread Pierre Moreau
On 11:21 AM - Apr 18 2016, Hans de Goede wrote:
> Hi,
> 
> On 17-04-16 22:27, Pierre Moreau wrote:
> >On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote:
> >>On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau  
> >>wrote:
> >>>Ping :-)
> >>>
> >>>On 10:56 PM - Mar 19 2016, Pierre Moreau wrote:
> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the 
> GPU
> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
> values, and use `cvt` to convert between the original target (resp. 
> source)
> and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
> 
> Signed-off-by: Pierre Moreau 
> ---
>   .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
>  ++
>   .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>   2 files changed, 60 insertions(+)
> 
> diff --git 
> a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index 2719f2c..c419a68 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
>  return true;
>   }
> 
> +bool
> +NVC0LoweringPass::handleCVT(Instruction *i)
> +{
> +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
> + isSignedIntType(i->dType) xor isSignedIntType(i->sType))
> >>
> >>I know pre-C89 features are cool, but let's avoid using them. I know
> >>characters like ^ were uncommon on the 1960's and 1970's teletypes,
> >>but I think we're past those days now.
> >
> >Yeah… Will fix that.
> 
> So "xor" or "^" is bitwise not logical, since isSignedIntType() returns
> a bool, which when cast to an int is guaranteed to be 0 or 1, this
> should work fine.
> 
> And being a bitwise op its presedence means it will get evaluated
> before the "||" operators in your condition which I believe is what
> we want here, but can we please have a pair of parenthesis around the
> "^" and its operands to make this more clear ?

Sure, I’ll add a pair of parenthesis around it.

Regards,
Pierre

> 
> Regards,
> 
> Hans
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-04-18 Thread Hans de Goede

Hi,

On 17-04-16 22:27, Pierre Moreau wrote:

On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote:

On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau  wrote:

Ping :-)

On 10:56 PM - Mar 19 2016, Pierre Moreau wrote:

Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU
unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
values, and use `cvt` to convert between the original target (resp. source)
and 32-bit value. This happens to be the behaviour of NVIDIA's driver.

Signed-off-by: Pierre Moreau 
---
  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 ++
  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
  2 files changed, 60 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 2719f2c..c419a68 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
 return true;
  }

+bool
+NVC0LoweringPass::handleCVT(Instruction *i)
+{
+   if (isFloatType(i->dType) || isFloatType(i->sType) ||
+ isSignedIntType(i->dType) xor isSignedIntType(i->sType))


I know pre-C89 features are cool, but let's avoid using them. I know
characters like ^ were uncommon on the 1960's and 1970's teletypes,
but I think we're past those days now.


Yeah… Will fix that.


So "xor" or "^" is bitwise not logical, since isSignedIntType() returns
a bool, which when cast to an int is guaranteed to be 0 or 1, this
should work fine.

And being a bitwise op its presedence means it will get evaluated
before the "||" operators in your condition which I believe is what
we want here, but can we please have a pair of parenthesis around the
"^" and its operands to make this more clear ?

Regards,

Hans
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-04-17 Thread Ilia Mirkin
On Sun, Apr 17, 2016 at 4:27 PM, Pierre Moreau  wrote:
> On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote:
>> On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau  wrote:
>> > Ping :-)
>> >
>> > On 10:56 PM - Mar 19 2016, Pierre Moreau wrote:
>> >> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the 
>> >> GPU
>> >> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
>> >> values, and use `cvt` to convert between the original target (resp. 
>> >> source)
>> >> and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
>> >>
>> >> Signed-off-by: Pierre Moreau 
>> >> ---
>> >>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
>> >> ++
>> >>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>> >>  2 files changed, 60 insertions(+)
>> >>
>> >> diff --git 
>> >> a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
>> >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> >> index 2719f2c..c419a68 100644
>> >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> >> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
>> >> return true;
>> >>  }
>> >>
>> >> +bool
>> >> +NVC0LoweringPass::handleCVT(Instruction *i)
>> >> +{
>> >> +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
>> >> + isSignedIntType(i->dType) xor isSignedIntType(i->sType))
>>
>> I know pre-C89 features are cool, but let's avoid using them. I know
>> characters like ^ were uncommon on the 1960's and 1970's teletypes,
>> but I think we're past those days now.
>
> Yeah… Will fix that.
>
>>
>> >> +  return false;
>> >> +
>> >> +   if (typeSizeof(i->sType) == 8) {
>> >> +  Value *toSplit = i->getSrc(0);
>> >> +  if (i->saturate) {
>> >> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul);
>> >> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX);
>> >> + if (isSignedType(i->sType)) {
>> >> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN);
>> >> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX);
>> >> + }
>> >> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), 
>> >> toSplit,
>> >> +minValue);
>> >> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes,
>> >> +  maxValue);
>>
>> Aren't you assuming that i->dType == 4 here? It could be an unsigned
>> <-> signed conversion, at 64-bit. So the clamp values would be
>
> I am assuming `i->dType <= 4`: remember the ^ from before! ;-) But, it
> could be a U64 <=> U64 or S64 <=> S64 conversion, which would then fail…

Oh I see. And the == 4 out is to avoid the extra cvt... which we don't
optimize out? I thought we should, but maybe not.

Some asserts could be in order, like

assert(typeSizeof(i->sType) != typeSizeof(i->dType))

or something like that above, to clarify what you mean.

  -ilia

>
>> different. Handling ALL the cases is quite annoying... can you figure
>> out what the hw doesn't support and just handle that? I doubt it'll be
>> any slower, and definitely simpler.
>
> I don’t remember if I checked U64 <=> S64 conversions… Will need to refresh my
> memory and log which combinations fail.
>
> Thanks!
> Pierre
>
>>
>>   -ilia
>>
>> >> +  }
>> >> +
>> >> +  Value *value32[2] = { bld.getSSA(), bld.getSSA() };
>> >> +  bld.mkSplit(value32, 4, toSplit);
>> >> +  if (typeSizeof(i->dType) == 4) {
>> >> + bld.mkMov(i->getDef(0), value32[0], i->dType);
>> >> + delete_Instruction(prog, i);
>> >> + return true;
>> >> +  }
>> >> +
>> >> +  i->setSrc(0, bld.getSSA());
>> >> +  i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
>> >> +  bld.mkMov(i->getSrc(0), value32[0], i->sType);
>> >> +   } else if (typeSizeof(i->dType) == 8) {
>> >> +  bld.setPosition(i, true);
>> >> +  Value *res = i->getDef(0);
>> >> +  Value *high32 = bld.loadImm(bld.getSSA(),
>> >> +  isSignedType(i->sType) ? UINT32_MAX : 
>> >> 0u);
>> >> +  Value *low32 = i->getSrc(0);
>> >> +  DataType resType = i->dType;
>> >> +
>> >> +  if (typeSizeof(i->sType) <= 2) {
>> >> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
>> >> + low32 = bld.getSSA();
>> >> + i->setDef(0, low32);
>> >> +  } else if (typeSizeof(i->sType) == 4) {
>> >> + delete_Instruction(prog, i);
>> >> +  }
>> >> +
>> >> +  Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), 
>> >> low32,
>> >> +   high32);
>> >> +  bld.mkMov(res, merged64, resType);
>> >> +   }
>> >> +
>> >> +   return true;
>> >> +}
>> >> +
>> >>  // Generate a binary predicate if an instruction is predicated by
>> 

Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-04-17 Thread Pierre Moreau
On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote:
> On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau  wrote:
> > Ping :-)
> >
> > On 10:56 PM - Mar 19 2016, Pierre Moreau wrote:
> >> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the 
> >> GPU
> >> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
> >> values, and use `cvt` to convert between the original target (resp. source)
> >> and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
> >>
> >> Signed-off-by: Pierre Moreau 
> >> ---
> >>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
> >> ++
> >>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
> >>  2 files changed, 60 insertions(+)
> >>
> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> >> index 2719f2c..c419a68 100644
> >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> >> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
> >> return true;
> >>  }
> >>
> >> +bool
> >> +NVC0LoweringPass::handleCVT(Instruction *i)
> >> +{
> >> +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
> >> + isSignedIntType(i->dType) xor isSignedIntType(i->sType))
> 
> I know pre-C89 features are cool, but let's avoid using them. I know
> characters like ^ were uncommon on the 1960's and 1970's teletypes,
> but I think we're past those days now.

Yeah… Will fix that.

> 
> >> +  return false;
> >> +
> >> +   if (typeSizeof(i->sType) == 8) {
> >> +  Value *toSplit = i->getSrc(0);
> >> +  if (i->saturate) {
> >> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul);
> >> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX);
> >> + if (isSignedType(i->sType)) {
> >> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN);
> >> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX);
> >> + }
> >> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), 
> >> toSplit,
> >> +minValue);
> >> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes,
> >> +  maxValue);
> 
> Aren't you assuming that i->dType == 4 here? It could be an unsigned
> <-> signed conversion, at 64-bit. So the clamp values would be

I am assuming `i->dType <= 4`: remember the ^ from before! ;-) But, it
could be a U64 <=> U64 or S64 <=> S64 conversion, which would then fail…

> different. Handling ALL the cases is quite annoying... can you figure
> out what the hw doesn't support and just handle that? I doubt it'll be
> any slower, and definitely simpler.

I don’t remember if I checked U64 <=> S64 conversions… Will need to refresh my
memory and log which combinations fail.

Thanks!
Pierre

> 
>   -ilia
> 
> >> +  }
> >> +
> >> +  Value *value32[2] = { bld.getSSA(), bld.getSSA() };
> >> +  bld.mkSplit(value32, 4, toSplit);
> >> +  if (typeSizeof(i->dType) == 4) {
> >> + bld.mkMov(i->getDef(0), value32[0], i->dType);
> >> + delete_Instruction(prog, i);
> >> + return true;
> >> +  }
> >> +
> >> +  i->setSrc(0, bld.getSSA());
> >> +  i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> >> +  bld.mkMov(i->getSrc(0), value32[0], i->sType);
> >> +   } else if (typeSizeof(i->dType) == 8) {
> >> +  bld.setPosition(i, true);
> >> +  Value *res = i->getDef(0);
> >> +  Value *high32 = bld.loadImm(bld.getSSA(),
> >> +  isSignedType(i->sType) ? UINT32_MAX : 
> >> 0u);
> >> +  Value *low32 = i->getSrc(0);
> >> +  DataType resType = i->dType;
> >> +
> >> +  if (typeSizeof(i->sType) <= 2) {
> >> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> >> + low32 = bld.getSSA();
> >> + i->setDef(0, low32);
> >> +  } else if (typeSizeof(i->sType) == 4) {
> >> + delete_Instruction(prog, i);
> >> +  }
> >> +
> >> +  Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), 
> >> low32,
> >> +   high32);
> >> +  bld.mkMov(res, merged64, resType);
> >> +   }
> >> +
> >> +   return true;
> >> +}
> >> +
> >>  // Generate a binary predicate if an instruction is predicated by
> >>  // e.g. an f32 value.
> >>  void
> >> @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i)
> >>checkPredicate(i);
> >>
> >> switch (i->op) {
> >> +   case OP_CVT:
> >> +  return handleCVT(i);
> >> case OP_TEX:
> >> case OP_TXB:
> >> case OP_TXL:
> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h 
> >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> >> index 6eb8aff..9fc24d9 100644
> >> --- 

Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-04-17 Thread Ilia Mirkin
On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau  wrote:
> Ping :-)
>
> On 10:56 PM - Mar 19 2016, Pierre Moreau wrote:
>> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU
>> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
>> values, and use `cvt` to convert between the original target (resp. source)
>> and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
>>
>> Signed-off-by: Pierre Moreau 
>> ---
>>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
>> ++
>>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>>  2 files changed, 60 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> index 2719f2c..c419a68 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
>> return true;
>>  }
>>
>> +bool
>> +NVC0LoweringPass::handleCVT(Instruction *i)
>> +{
>> +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
>> + isSignedIntType(i->dType) xor isSignedIntType(i->sType))

I know pre-C89 features are cool, but let's avoid using them. I know
characters like ^ were uncommon on the 1960's and 1970's teletypes,
but I think we're past those days now.

>> +  return false;
>> +
>> +   if (typeSizeof(i->sType) == 8) {
>> +  Value *toSplit = i->getSrc(0);
>> +  if (i->saturate) {
>> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul);
>> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX);
>> + if (isSignedType(i->sType)) {
>> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN);
>> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX);
>> + }
>> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), 
>> toSplit,
>> +minValue);
>> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes,
>> +  maxValue);

Aren't you assuming that i->dType == 4 here? It could be an unsigned
<-> signed conversion, at 64-bit. So the clamp values would be
different. Handling ALL the cases is quite annoying... can you figure
out what the hw doesn't support and just handle that? I doubt it'll be
any slower, and definitely simpler.

  -ilia

>> +  }
>> +
>> +  Value *value32[2] = { bld.getSSA(), bld.getSSA() };
>> +  bld.mkSplit(value32, 4, toSplit);
>> +  if (typeSizeof(i->dType) == 4) {
>> + bld.mkMov(i->getDef(0), value32[0], i->dType);
>> + delete_Instruction(prog, i);
>> + return true;
>> +  }
>> +
>> +  i->setSrc(0, bld.getSSA());
>> +  i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
>> +  bld.mkMov(i->getSrc(0), value32[0], i->sType);
>> +   } else if (typeSizeof(i->dType) == 8) {
>> +  bld.setPosition(i, true);
>> +  Value *res = i->getDef(0);
>> +  Value *high32 = bld.loadImm(bld.getSSA(),
>> +  isSignedType(i->sType) ? UINT32_MAX : 0u);
>> +  Value *low32 = i->getSrc(0);
>> +  DataType resType = i->dType;
>> +
>> +  if (typeSizeof(i->sType) <= 2) {
>> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
>> + low32 = bld.getSSA();
>> + i->setDef(0, low32);
>> +  } else if (typeSizeof(i->sType) == 4) {
>> + delete_Instruction(prog, i);
>> +  }
>> +
>> +  Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32,
>> +   high32);
>> +  bld.mkMov(res, merged64, resType);
>> +   }
>> +
>> +   return true;
>> +}
>> +
>>  // Generate a binary predicate if an instruction is predicated by
>>  // e.g. an f32 value.
>>  void
>> @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i)
>>checkPredicate(i);
>>
>> switch (i->op) {
>> +   case OP_CVT:
>> +  return handleCVT(i);
>> case OP_TEX:
>> case OP_TXB:
>> case OP_TXL:
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
>> index 6eb8aff..9fc24d9 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
>> @@ -96,6 +96,7 @@ protected:
>> bool handleMOD(Instruction *);
>> bool handleSQRT(Instruction *);
>> bool handlePOW(Instruction *);
>> +   bool handleCVT(Instruction *);
>> bool handleTEX(TexInstruction *);
>> bool handleTXD(TexInstruction *);
>> bool handleTXQ(TexInstruction *);
>> --
>> 2.7.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> 

Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-04-17 Thread Pierre Moreau
Ping :-)

On 10:56 PM - Mar 19 2016, Pierre Moreau wrote:
> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU
> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
> values, and use `cvt` to convert between the original target (resp. source)
> and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
> 
> Signed-off-by: Pierre Moreau 
> ---
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
> ++
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>  2 files changed, 60 insertions(+)
> 
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index 2719f2c..c419a68 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
> return true;
>  }
>  
> +bool
> +NVC0LoweringPass::handleCVT(Instruction *i)
> +{
> +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
> + isSignedIntType(i->dType) xor isSignedIntType(i->sType))
> +  return false;
> +
> +   if (typeSizeof(i->sType) == 8) {
> +  Value *toSplit = i->getSrc(0);
> +  if (i->saturate) {
> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul);
> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX);
> + if (isSignedType(i->sType)) {
> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN);
> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX);
> + }
> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), toSplit,
> +minValue);
> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes,
> +  maxValue);
> +  }
> +
> +  Value *value32[2] = { bld.getSSA(), bld.getSSA() };
> +  bld.mkSplit(value32, 4, toSplit);
> +  if (typeSizeof(i->dType) == 4) {
> + bld.mkMov(i->getDef(0), value32[0], i->dType);
> + delete_Instruction(prog, i);
> + return true;
> +  }
> +
> +  i->setSrc(0, bld.getSSA());
> +  i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> +  bld.mkMov(i->getSrc(0), value32[0], i->sType);
> +   } else if (typeSizeof(i->dType) == 8) {
> +  bld.setPosition(i, true);
> +  Value *res = i->getDef(0);
> +  Value *high32 = bld.loadImm(bld.getSSA(),
> +  isSignedType(i->sType) ? UINT32_MAX : 0u);
> +  Value *low32 = i->getSrc(0);
> +  DataType resType = i->dType;
> +
> +  if (typeSizeof(i->sType) <= 2) {
> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> + low32 = bld.getSSA();
> + i->setDef(0, low32);
> +  } else if (typeSizeof(i->sType) == 4) {
> + delete_Instruction(prog, i);
> +  }
> +
> +  Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32,
> +   high32);
> +  bld.mkMov(res, merged64, resType);
> +   }
> +
> +   return true;
> +}
> +
>  // Generate a binary predicate if an instruction is predicated by
>  // e.g. an f32 value.
>  void
> @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i)
>checkPredicate(i);
>  
> switch (i->op) {
> +   case OP_CVT:
> +  return handleCVT(i);
> case OP_TEX:
> case OP_TXB:
> case OP_TXL:
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> index 6eb8aff..9fc24d9 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> @@ -96,6 +96,7 @@ protected:
> bool handleMOD(Instruction *);
> bool handleSQRT(Instruction *);
> bool handlePOW(Instruction *);
> +   bool handleCVT(Instruction *);
> bool handleTEX(TexInstruction *);
> bool handleTXD(TexInstruction *);
> bool handleTXQ(TexInstruction *);
> -- 
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-03-19 Thread Pierre Moreau
On 06:41 PM - Mar 19 2016, Ilia Mirkin wrote:
> On Sat, Mar 19, 2016 at 6:36 PM, Pierre Moreau  wrote:
> > On 06:26 PM - Mar 19 2016, Ilia Mirkin wrote:
> >> On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreau  
> >> wrote:
> >> > However, you could have some `long bar; char foo = 
> >> > convert_char_sat(bar);` in
> >> > the OpenCL kernel.
> >>
> >> Sure, but the SPIR-V -> nv50/ir converter could be smarter about when
> >> it generates the converts, no?
> >
> > It should be possible, but then I'm unsure what ends up in the SPIR-V ->
> > nv50/ir converter and what ends up in the nv50/ir backend. Should I also do
> > constant folding in the converter? I was assuming the backend would take 
> > care
> > of the optimisations, removing useless converts, but maybe my assumptions 
> > were
> > wrong and I have to take care of more things than just translating from 
> > SPIR-V
> > to nv50/ir?
> 
> Well, the nv50 ir is not a well-specified IR. So it's kind of up to us
> what to do. However if the hw hates converts with src/dst types like
> that, I'm perfectly happy to decree that such converts shall never
> make it into the IR. That said, if you feel strongly about it, we can
> leave it as a fix-up pass. What about nv50, need the same logic there
> too right?

If such converts are not permitted in nv50/ir, then that means each converter
to nv50/ir will need to do the fixup themselves, resulting in, most likely,
duplicate code between them, as they will handle it more or less the same way.
(Well, there aren't many converters to nv50/ir, so not really a big issue
here.) Whereas if we have it as a fixup pass, we need the code only once, and
have it handled for all existing (and future) converters.
But you have way more experience with nv50/ir and compilers than I have, so,
your call.

Most likely, but I haven't tried it. I should probably have this code in an
earlier pass then, that is not family dependent.

Pierre

> 
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-03-19 Thread Ilia Mirkin
On Sat, Mar 19, 2016 at 6:36 PM, Pierre Moreau  wrote:
> On 06:26 PM - Mar 19 2016, Ilia Mirkin wrote:
>> On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreau  wrote:
>> > However, you could have some `long bar; char foo = convert_char_sat(bar);` 
>> > in
>> > the OpenCL kernel.
>>
>> Sure, but the SPIR-V -> nv50/ir converter could be smarter about when
>> it generates the converts, no?
>
> It should be possible, but then I'm unsure what ends up in the SPIR-V ->
> nv50/ir converter and what ends up in the nv50/ir backend. Should I also do
> constant folding in the converter? I was assuming the backend would take care
> of the optimisations, removing useless converts, but maybe my assumptions were
> wrong and I have to take care of more things than just translating from SPIR-V
> to nv50/ir?

Well, the nv50 ir is not a well-specified IR. So it's kind of up to us
what to do. However if the hw hates converts with src/dst types like
that, I'm perfectly happy to decree that such converts shall never
make it into the IR. That said, if you feel strongly about it, we can
leave it as a fix-up pass. What about nv50, need the same logic there
too right?

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-03-19 Thread Pierre Moreau
On 06:26 PM - Mar 19 2016, Ilia Mirkin wrote:
> On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreau  wrote:
> > However, you could have some `long bar; char foo = convert_char_sat(bar);` 
> > in
> > the OpenCL kernel.
> 
> Sure, but the SPIR-V -> nv50/ir converter could be smarter about when
> it generates the converts, no?

It should be possible, but then I'm unsure what ends up in the SPIR-V ->
nv50/ir converter and what ends up in the nv50/ir backend. Should I also do
constant folding in the converter? I was assuming the backend would take care
of the optimisations, removing useless converts, but maybe my assumptions were
wrong and I have to take care of more things than just translating from SPIR-V
to nv50/ir?

Pierre

> 
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-03-19 Thread Ilia Mirkin
On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreau  wrote:
> However, you could have some `long bar; char foo = convert_char_sat(bar);` in
> the OpenCL kernel.

Sure, but the SPIR-V -> nv50/ir converter could be smarter about when
it generates the converts, no?

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-03-19 Thread Pierre Moreau
On 06:06 PM - Mar 19 2016, Ilia Mirkin wrote:
> Where are these coming from? Could you perhaps not generate them in
> the first place?

Those are coming from the generated SPIR-V, of the following kernel for
example:

__kernel void global_id(__global int * out)
{
  unsigned id = get_global_id(0);
  out[id] = id;
}

But I don't see any reason why there should be cvt generated in this case. I'll
have to investigate the SPIR-V generation.

However, you could have some `long bar; char foo = convert_char_sat(bar);` in
the OpenCL kernel.

> 
> On Sat, Mar 19, 2016 at 5:56 PM, Pierre Moreau  wrote:
> > Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the 
> > GPU
> > unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
> > values, and use `cvt` to convert between the original target (resp. source)
> > and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
> >
> > Signed-off-by: Pierre Moreau 
> > ---
> >  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
> > ++
> >  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
> >  2 files changed, 60 insertions(+)
> >
> > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> > index 2719f2c..c419a68 100644
> > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> > @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
> > return true;
> >  }
> >
> > +bool
> > +NVC0LoweringPass::handleCVT(Instruction *i)
> > +{
> > +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
> > + isSignedIntType(i->dType) xor isSignedIntType(i->sType))
> > +  return false;
> > +
> > +   if (typeSizeof(i->sType) == 8) {
> > +  Value *toSplit = i->getSrc(0);
> > +  if (i->saturate) {
> > + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul);
> > + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX);
> > + if (isSignedType(i->sType)) {
> > +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN);
> > +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX);
> > + }
> > + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), 
> > toSplit,
> > +minValue);
> > + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes,
> > +  maxValue);
> > +  }
> > +
> > +  Value *value32[2] = { bld.getSSA(), bld.getSSA() };
> > +  bld.mkSplit(value32, 4, toSplit);
> > +  if (typeSizeof(i->dType) == 4) {
> > + bld.mkMov(i->getDef(0), value32[0], i->dType);
> > + delete_Instruction(prog, i);
> > + return true;
> > +  }
> > +
> > +  i->setSrc(0, bld.getSSA());
> > +  i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> > +  bld.mkMov(i->getSrc(0), value32[0], i->sType);
> > +   } else if (typeSizeof(i->dType) == 8) {
> > +  bld.setPosition(i, true);
> > +  Value *res = i->getDef(0);
> > +  Value *high32 = bld.loadImm(bld.getSSA(),
> > +  isSignedType(i->sType) ? UINT32_MAX : 
> > 0u);
> > +  Value *low32 = i->getSrc(0);
> > +  DataType resType = i->dType;
> > +
> > +  if (typeSizeof(i->sType) <= 2) {
> > + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> > + low32 = bld.getSSA();
> > + i->setDef(0, low32);
> > +  } else if (typeSizeof(i->sType) == 4) {
> > + delete_Instruction(prog, i);
> > +  }
> > +
> > +  Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32,
> > +   high32);
> > +  bld.mkMov(res, merged64, resType);
> > +   }
> > +
> > +   return true;
> > +}
> > +
> >  // Generate a binary predicate if an instruction is predicated by
> >  // e.g. an f32 value.
> >  void
> > @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i)
> >checkPredicate(i);
> >
> > switch (i->op) {
> > +   case OP_CVT:
> > +  return handleCVT(i);
> > case OP_TEX:
> > case OP_TXB:
> > case OP_TXL:
> > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h 
> > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> > index 6eb8aff..9fc24d9 100644
> > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> > @@ -96,6 +96,7 @@ protected:
> > bool handleMOD(Instruction *);
> > bool handleSQRT(Instruction *);
> > bool handlePOW(Instruction *);
> > +   bool handleCVT(Instruction *);
> > bool handleTEX(TexInstruction *);
> > bool handleTXD(TexInstruction *);
> > bool handleTXQ(TexInstruction *);
> > --
> > 2.7.4
> >
> > 

Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually

2016-03-19 Thread Ilia Mirkin
Where are these coming from? Could you perhaps not generate them in
the first place?

On Sat, Mar 19, 2016 at 5:56 PM, Pierre Moreau  wrote:
> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU
> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit
> values, and use `cvt` to convert between the original target (resp. source)
> and 32-bit value. This happens to be the behaviour of NVIDIA's driver.
>
> Signed-off-by: Pierre Moreau 
> ---
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 59 
> ++
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  1 +
>  2 files changed, 60 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index 2719f2c..c419a68 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i)
> return true;
>  }
>
> +bool
> +NVC0LoweringPass::handleCVT(Instruction *i)
> +{
> +   if (isFloatType(i->dType) || isFloatType(i->sType) ||
> + isSignedIntType(i->dType) xor isSignedIntType(i->sType))
> +  return false;
> +
> +   if (typeSizeof(i->sType) == 8) {
> +  Value *toSplit = i->getSrc(0);
> +  if (i->saturate) {
> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul);
> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX);
> + if (isSignedType(i->sType)) {
> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN);
> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX);
> + }
> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), toSplit,
> +minValue);
> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes,
> +  maxValue);
> +  }
> +
> +  Value *value32[2] = { bld.getSSA(), bld.getSSA() };
> +  bld.mkSplit(value32, 4, toSplit);
> +  if (typeSizeof(i->dType) == 4) {
> + bld.mkMov(i->getDef(0), value32[0], i->dType);
> + delete_Instruction(prog, i);
> + return true;
> +  }
> +
> +  i->setSrc(0, bld.getSSA());
> +  i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> +  bld.mkMov(i->getSrc(0), value32[0], i->sType);
> +   } else if (typeSizeof(i->dType) == 8) {
> +  bld.setPosition(i, true);
> +  Value *res = i->getDef(0);
> +  Value *high32 = bld.loadImm(bld.getSSA(),
> +  isSignedType(i->sType) ? UINT32_MAX : 0u);
> +  Value *low32 = i->getSrc(0);
> +  DataType resType = i->dType;
> +
> +  if (typeSizeof(i->sType) <= 2) {
> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32;
> + low32 = bld.getSSA();
> + i->setDef(0, low32);
> +  } else if (typeSizeof(i->sType) == 4) {
> + delete_Instruction(prog, i);
> +  }
> +
> +  Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32,
> +   high32);
> +  bld.mkMov(res, merged64, resType);
> +   }
> +
> +   return true;
> +}
> +
>  // Generate a binary predicate if an instruction is predicated by
>  // e.g. an f32 value.
>  void
> @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i)
>checkPredicate(i);
>
> switch (i->op) {
> +   case OP_CVT:
> +  return handleCVT(i);
> case OP_TEX:
> case OP_TXB:
> case OP_TXL:
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> index 6eb8aff..9fc24d9 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h
> @@ -96,6 +96,7 @@ protected:
> bool handleMOD(Instruction *);
> bool handleSQRT(Instruction *);
> bool handlePOW(Instruction *);
> +   bool handleCVT(Instruction *);
> bool handleTEX(TexInstruction *);
> bool handleTXD(TexInstruction *);
> bool handleTXQ(TexInstruction *);
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev