Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-06 Thread Marek Olšák
On Fri, Oct 6, 2017 at 4:39 AM, Dave Airlie wrote:
> On 6 October 2017 at 12:31, Marek Olšák wrote:
>> On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott wrote:
>>> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák wrote:
>>>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott wrote:
>>>>> Why? While it might technically be legal, always generating an unfused
>>>>> mul+add when the user explicitly requested fma() seems harsh...
>>>>
>>>> It's slow on some chips. It doesn't need any other reason.
>>>>
>>>> Marek
>>>
>>> Presumably, if the developer asked for fma, then they don't care how
>>> fast or slow it is...
>>
>> Feral asked for fma. They care. This debate is pointless. We just
>> won't use fma by default. Period.
>
> They didn't ask for it with precise precision. I'm assuming if someone wants
> fma with precise precision we should give it to them. Like at least
> the fma manpage states.
>
> https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml

Oh please Dave, the page says the exact opposite of what you are
saying. The only thing the manpage says is: If fma and mul+add have
different precision, fma can't be split and mul+add can't be combined.
It doesn't say anything about precision of the result of fma itself.
Search for the word "can". It's not the same as "must".

That said, RADV can use as many slow opcodes as you want if you
insist. I'm only saying that the opcode selection of radeonsi is
non-negotiable on my side, and nir_to_llvm might get radeonsi-specific
opcode selection.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
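
The disagreement above is about which LLVM intrinsic a NIR ffma should
become. A minimal sketch of the kind of driver-specific selection Marek
alludes to might look like the following. The struct ffma_policy, its
allow_fused field and select_ffma_intrinsic() are hypothetical names made
up for illustration; only the two intrinsic names come from the patch under
discussion.

```c
#include <stdbool.h>

/* Hypothetical per-driver policy; this is NOT a real Mesa structure. */
struct ffma_policy {
    bool allow_fused; /* false: never emit llvm.fma, even for exact instructions */
};

/*
 * Pick the LLVM intrinsic name for a NIR ffma.
 * "llvm.fma" is always a fused multiply-add (one rounding);
 * "llvm.fmuladd" lets the backend choose fused or unfused.
 */
static const char *select_ffma_intrinsic(const struct ffma_policy *policy,
                                         bool exact)
{
    if (exact && policy->allow_fused)
        return "llvm.fma";
    return "llvm.fmuladd";
}
```

With allow_fused set, this mirrors the posted patch (llvm.fma only when the
instruction is exact). With it cleared, it mirrors the radeonsi preference
stated above (never llvm.fma, exact or not).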


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-06 Thread Roland Scheidegger
Am 06.10.2017 um 11:29 schrieb Alex Smith:
> On 6 October 2017 at 03:39, Dave Airlie wrote:
> 
> On 6 October 2017 at 12:31, Marek Olšák wrote:
> > On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott wrote:
> >> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák wrote:
> >>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott wrote:
> >>>> Why? While it might technically be legal, always generating an unfused
> >>>> mul+add when the user explicitly requested fma() seems harsh...
> >>>
> >>> It's slow on some chips. It doesn't need any other reason.
> >>>
> >>> Marek
> >>
> >> Presumably, if the developer asked for fma, then they don't care how
> >> fast or slow it is...
> >
> > Feral asked for fma. They care. This debate is pointless. We just
> > won't use fma by default. Period.
> 
> They didn't ask for it with precise precision. I'm assuming if
> someone wants
> fma with precise precision we should give it to them. Like at least
> the fma manpage states.
> 
> https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml
> 
> 
> 
> Some of our older games (e.g. Tomb Raider) do actually request precise
> (based on what the original D3D shader asks for), so changing the
> behaviour on GL to use the proper fma would likely regress performance
> on those.
> 
> D3D's mad (which we've been using fma to implement) is similarly vague
> as GLSL about what the actual precision requirements are with
> precise: 
> https://msdn.microsoft.com/en-us/library/windows/desktop/ff471418(v=vs.85).aspx
> 

Of course, but d3d mad is a "traditional" multiply/add which predates
fully programmable shader pipelines even, and back in the days gpus
actually used fixed function alus where talking about "fused" didn't
even make sense.
I think the problem here is just that glsl never had such a mad -
because being based on textual representation, mul and add use
operators, and a mad function just would look ugly (and generally with
glsl's lax requirements, no one would ever care if you actually fuse muls
and adds).
But now with precise, you cannot fuse such separate muls and adds
freely, because the compiler can't guarantee you it will always fuse
them (and it would be shady in any case). Thus using separate muls and
adds would penalize gpus which can only do fused mul+add in a single
step (nvidia IIRC, also x86 avx with fma).
Hence "fma" being added.

I would, however, say that calling this "fma" is a very serious (but
unfixable now) spec bug. No one ever talks about a "fused multiply add"
when it actually may as well be unfused. This is just confusing as hell.
Call it mad, fmuladd (as llvm does), mfma ("maybe fused"...) or
whatever, but not fma. (fwiw d3d is sane there - single may be fused or
unfused, and it's called mad, with doubles it is guaranteed to always be
fused, and it's called dfma accordingly.)
And fwiw I got confused by this too earlier, thinking it has to be fused
- certainly opencl etc. really want to use a fused one if they use fma.
This also means I was wrong earlier when there were some problems with
fma / mad on nouveau drivers - since fma can apparently be unfused,
there's no point for the mesa state tracker to ever use the tgsi fma
opcode, and it should always use MAD instead as far as I can tell (but
of course setting the precise bit accordingly).

Roland
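
To make the fused/unfused distinction above concrete, here is a minimal
sketch in C. It is an illustration only, not code from Mesa or from this
thread. Both function names are made up. The "fused" variant is emulated by
doing the arithmetic in double, where a float*float product is exact; for
the operands in the note below the sum is exact as well, so the result
matches what a true single-rounding fma returns. In general the emulation
can still round twice, and C's fmaf() from <math.h> is the real thing.

```c
/* Unfused mad: the product is rounded to float before the add,
 * so the result goes through two roundings. */
static float mad_unfused(float a, float b, float c)
{
    /* volatile forces the intermediate to be stored as a float and
     * keeps the compiler from contracting mul+add into one fma. */
    volatile float product = a * b;
    return product + c;
}

/* Emulated fused mad: a float*float product has at most 48 significant
 * bits, so it is exact in double; only the final conversion back to
 * float rounds (assuming the double add is exact for the inputs used). */
static float mad_fused_emulated(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is
1 - 2^-26. The unfused path rounds that to 1.0f and returns 0, while the
fused path returns -2^-26. Which of the two an application gets is what the
"may be fused or unfused" wording leaves open.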


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-06 Thread Alex Smith
On 6 October 2017 at 03:39, Dave Airlie  wrote:

> On 6 October 2017 at 12:31, Marek Olšák wrote:
> > On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott wrote:
> >> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák wrote:
> >>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott wrote:
> >>>> Why? While it might technically be legal, always generating an unfused
> >>>> mul+add when the user explicitly requested fma() seems harsh...
> >>>
> >>> It's slow on some chips. It doesn't need any other reason.
> >>>
> >>> Marek
> >>
> >> Presumably, if the developer asked for fma, then they don't care how
> >> fast or slow it is...
> >
> > Feral asked for fma. They care. This debate is pointless. We just
> > won't use fma by default. Period.
>
> They didn't ask for it with precise precision. I'm assuming if someone
> wants
> fma with precise precision we should give it to them. Like at least
> the fma manpage states.
>
> https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml


Some of our older games (e.g. Tomb Raider) do actually request precise
(based on what the original D3D shader asks for), so changing the behaviour
on GL to use the proper fma would likely regress performance on those.

D3D's mad (which we've been using fma to implement) is similarly vague as
GLSL about what the actual precision requirements are with precise:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff471418(v=vs.85).aspx

"If shader authors use the mad instrinsic to calculate a result that the
shader marked as precise, they indicate to the hardware to use any valid
implementation of the mad instruction (fused or not) as long as the
implementation is consistent for all uses of that mad intrinsic in any
shader on that hardware"

From some quick testing I just did, it looks like the AMD D3D driver always
implements mad as v_mac_f32 regardless of whether precise is requested.

So seems like (at least from our perspective!) it's not really an issue to
not actually get a fused op, and clearly hasn't been an issue since
radeonsi never gives you fused right now.

Alex




Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Jason Ekstrand
On Thu, Oct 5, 2017 at 7:39 PM, Dave Airlie  wrote:

> On 6 October 2017 at 12:31, Marek Olšák wrote:
> > On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott wrote:
> >> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák wrote:
> >>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott wrote:
> >>>> Why? While it might technically be legal, always generating an unfused
> >>>> mul+add when the user explicitly requested fma() seems harsh...
> >>>
> >>> It's slow on some chips. It doesn't need any other reason.
> >>>
> >>> Marek
> >>
> >> Presumably, if the developer asked for fma, then they don't care how
> >> fast or slow it is...
> >
> > Feral asked for fma. They care. This debate is pointless. We just
> > won't use fma by default. Period.
>
> They didn't ask for it with precise precision. I'm assuming if someone
> wants
> fma with precise precision we should give it to them. Like at least
> the fma manpage states.
>

Eh, fma() doesn't guarantee additional precision so anyone who's counting
on that is in for some trouble.  If someone uses fma() explicitly it's
because they either care about speed (in which case fma on radeon doesn't
make sense from what Marek says) or because they want to explicitly control
the order of operations.  Giving them the slow thing just because the GPU
has that instruction is *not* what they want.

--Jason


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Dave Airlie
On 6 October 2017 at 12:31, Marek Olšák wrote:
> On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott wrote:
>> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák wrote:
>>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott wrote:
>>>> Why? While it might technically be legal, always generating an unfused
>>>> mul+add when the user explicitly requested fma() seems harsh...
>>>
>>> It's slow on some chips. It doesn't need any other reason.
>>>
>>> Marek
>>
>> Presumably, if the developer asked for fma, then they don't care how
>> fast or slow it is...
>
> Feral asked for fma. They care. This debate is pointless. We just
> won't use fma by default. Period.

They didn't ask for it with precise precision. I'm assuming if someone wants
fma with precise precision we should give it to them. Like at least
the fma manpage states.

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml

Dave.


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Marek Olšák
On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott  wrote:
> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák  wrote:
>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott  wrote:
>>> Why? While it might technically be legal, always generating an unfused
>>> mul+add when the user explicitly requested fma() seems harsh...
>>
>> It's slow on some chips. It doesn't need any other reason.
>>
>> Marek
>
> Presumably, if the developer asked for fma, then they don't care how
> fast or slow it is...

Feral asked for fma. They care. This debate is pointless. We just
won't use fma by default. Period.

Marek


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Connor Abbott
On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák  wrote:
> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott  wrote:
>> Why? While it might technically be legal, always generating an unfused
>> mul+add when the user explicitly requested fma() seems harsh...
>
> It's slow on some chips. It doesn't need any other reason.
>
> Marek

Presumably, if the developer asked for fma, then they don't care how
fast or slow it is...


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Marek Olšák
On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott  wrote:
> Why? While it might technically be legal, always generating an unfused
> mul+add when the user explicitly requested fma() seems harsh...

It's slow on some chips. It doesn't need any other reason.

Marek


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Connor Abbott
Why? While it might technically be legal, always generating an unfused
mul+add when the user explicitly requested fma() seems harsh...

On Thu, Oct 5, 2017 at 9:32 PM, Marek Olšák  wrote:
> FYI, if we switch radeonsi to NIR, we are going to disable fma
> completely, exact or not.
>
> Marek
>
> On Wed, Oct 4, 2017 at 10:04 PM, Dave Airlie  wrote:
>> From: Dave Airlie 
>>
>> As pointed out by Connor we still need to use fma if nir wants
>> exact (precise) behaviour.
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
>> index 11ba487..38a2bbe 100644
>> --- a/src/amd/common/ac_nir_to_llvm.c
>> +++ b/src/amd/common/ac_nir_to_llvm.c
>> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
>>                                               result);
>>                 break;
>>         case nir_op_ffma:
>> -               result = emit_intrin_3f_param(&ctx->ac, "llvm.fmuladd",
>> +               result = emit_intrin_3f_param(&ctx->ac, instr->exact ? "llvm.fma" : "llvm.fmuladd",
>>                                               ac_to_float_type(&ctx->ac, def_type), src[0], src[1], src[2]);
>>                 break;
>>         case nir_op_ibitfield_extract:
>> --
>> 2.9.5
>>


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-05 Thread Marek Olšák
FYI, if we switch radeonsi to NIR, we are going to disable fma
completely, exact or not.

Marek

On Wed, Oct 4, 2017 at 10:04 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> As pointed out by Connor we still need to use fma if nir wants
> exact (precise) behaviour.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 11ba487..38a2bbe 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
>                                               result);
>                 break;
>         case nir_op_ffma:
> -               result = emit_intrin_3f_param(&ctx->ac, "llvm.fmuladd",
> +               result = emit_intrin_3f_param(&ctx->ac, instr->exact ? "llvm.fma" : "llvm.fmuladd",
>                                               ac_to_float_type(&ctx->ac, def_type), src[0], src[1], src[2]);
>                 break;
>         case nir_op_ibitfield_extract:
> --
> 2.9.5
>


Re: [Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-04 Thread Connor Abbott
Reviewed-by: Connor Abbott 

On Wed, Oct 4, 2017 at 4:04 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> As pointed out by Connor we still need to use fma if nir wants
> exact (precise) behaviour.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 11ba487..38a2bbe 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
>                                               result);
>                 break;
>         case nir_op_ffma:
> -               result = emit_intrin_3f_param(&ctx->ac, "llvm.fmuladd",
> +               result = emit_intrin_3f_param(&ctx->ac, instr->exact ? "llvm.fma" : "llvm.fmuladd",
>                                               ac_to_float_type(&ctx->ac, def_type), src[0], src[1], src[2]);
>                 break;
>         case nir_op_ibitfield_extract:
> --
> 2.9.5
>


[Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

2017-10-04 Thread Dave Airlie
From: Dave Airlie 

As pointed out by Connor we still need to use fma if nir wants
exact (precise) behaviour.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 11ba487..38a2bbe 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
                                              result);
                break;
        case nir_op_ffma:
-               result = emit_intrin_3f_param(&ctx->ac, "llvm.fmuladd",
+               result = emit_intrin_3f_param(&ctx->ac, instr->exact ? "llvm.fma" : "llvm.fmuladd",
                                              ac_to_float_type(&ctx->ac, def_type), src[0], src[1], src[2]);
                break;
        case nir_op_ibitfield_extract:
-- 
2.9.5
