Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-07-06 Thread Connor Abbott
On Mon, Jun 12, 2017 at 9:26 PM, Jason Ekstrand  wrote:
> On Mon, Jun 12, 2017 at 7:38 PM, Connor Abbott  wrote:
>>
>> On Mon, Jun 12, 2017 at 7:19 PM, Jason Ekstrand 
>> wrote:
>> > On Mon, Jun 12, 2017 at 11:58 AM, Nicolai Hähnle 
>> > wrote:
>> >>
>> >> On 12.06.2017 20:50, Connor Abbott wrote:
>> >>>
>> >>> On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle 
>> >>> wrote:
>> 
>>  On 10.06.2017 01:44, Connor Abbott wrote:
>> >
>> >
>> > From: Connor Abbott 
>> >
>> > These are properties of the instruction that must be respected when
>> > moving it around, in addition to the usual SSA dominance guarantee.
>> > Previously, we only had special handling for fddx and fddy, in a
>> > very
>> > ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote,
>> > we'll
>> > have to start handling a lot more instructions with similar
>> > constraints,
>> > so we want to add a more formal model of what the optimizer can and
>> > cannot do.
>> >
>> > v2: don't add attribute for ALU instructions
>> > v3: special-case derivative ALU instructions
>> > Signed-off-by: Connor Abbott 
>> > ---
>> >src/compiler/nir/nir.h | 80
>> > ++
>> >1 file changed, 80 insertions(+)
>> >
>> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> > index 3b827bf..64caccb 100644
>> > --- a/src/compiler/nir/nir.h
>> > +++ b/src/compiler/nir/nir.h
>> > @@ -985,6 +985,25 @@ typedef enum {
>> >* intrinsic are due to the register reads/writes.
>> >*/
>> >   NIR_INTRINSIC_CAN_REORDER = (1 << 1),
>> > +
>> > +   /**
>> > +* Indicates whether this intrinsic is "cross-thread". An
>> > operation
>> > is
>> > +* cross-thread if results in one thread depend on inputs in
>> > another
>> > thread,
>> > +* and therefore optimizations cannot change the execution mask
>> > when
>> > the
>> > +* operation is called. Examples of cross-thread operations
>> > include
>> > +* screen-space derivatives, the "any" reduction which returns
>> > "true"
>> > in
>> > +* all threads if any thread inputs "true", etc.
>> > +*/
>> > +   NIR_INTRINSIC_CROSS_THREAD,
>> > +
>> > +   /**
>> > +* Indicates that this intrinsic is "convergent". An operation
>> > is
>> > +* convergent when it must always be called in convergent
>> > control
>> > flow,
>> > +* that is, control flow with the same execution mask as when
>> > the
>> > program
>> > +* started. If an operation is convergent, it must be
>> > cross-thread
>> > as
>> > well,
>> > +* since the optimizer must maintain the guarantee.
>> > +*/
>> > +   NIR_INTRINSIC_CONVERGENT,
>> 
>> 
>> 
>>  This is inconsistent with LLVM's definition of 'convergent', and I'd
>>  like
>>  you to change it to match up with LLVM.
>> 
>>  LLVM's definition of convergent is: "The operation must not be made
>>  control-dependent on additional values."
>> 
>>  In the language of execution masks, this means that optimizations
>>  must
>>  guarantee that the execution mask for the instruction can only become
>>  a
>>  superset of what it was originally. This means lifting is actually
>>  okay.
>> 
>>  This is relevant because e.g. texture instructions with implicit
>>  derivatives
>>  are actually convergent operations (in the LLVM sense), but obviously
>>  they
>>  can be called with exec masks that are subsets of the exec mask at
>>  program
>>  start.
>> >>>
>> >>>
>> >>> Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
>> >>> sure), they can't be called that way -- results are undefined if
>> >>> derivatives (or textures that take implicit derivatives) aren't called
>> >>> in uniform control flow, full stop. That's why I changed the
>> >>> definition compared to LLVM - this definition of convergent allows all
>> >>> the optimizations that the LLVM definition does, but it opens up
>> >>> additional optimization opportunities since we can assume that control
>> >>> flow is always uniform when doing divergence analysis. Also, as-is,
>> >>> the definition matches the GLSL/SPIR-V semantics closely, and since
>> >>> the purpose of the convergent attribute is to model derivatives in
>> >>> GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
>> >>> semantics to allow what you describe, then we can add something
>> >>> something closer to the LLVM convergent semantics. If you want me to
>> >>> change the name to avoid confusion with LLVM, that's fair though --
>> 

Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-12 Thread Jason Ekstrand
On Mon, Jun 12, 2017 at 7:38 PM, Connor Abbott  wrote:

> On Mon, Jun 12, 2017 at 7:19 PM, Jason Ekstrand 
> wrote:
> > On Mon, Jun 12, 2017 at 11:58 AM, Nicolai Hähnle 
> wrote:
> >>
> >> On 12.06.2017 20:50, Connor Abbott wrote:
> >>>
> >>> On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle 
> >>> wrote:
> 
>  On 10.06.2017 01:44, Connor Abbott wrote:
> >
> >
> > From: Connor Abbott 
> >
> > These are properties of the instruction that must be respected when
> > moving it around, in addition to the usual SSA dominance guarantee.
> > Previously, we only had special handling for fddx and fddy, in a very
> > ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote,
> we'll
> > have to start handling a lot more instructions with similar
> > constraints,
> > so we want to add a more formal model of what the optimizer can and
> > cannot do.
> >
> > v2: don't add attribute for ALU instructions
> > v3: special-case derivative ALU instructions
> > Signed-off-by: Connor Abbott 
> > ---
> >src/compiler/nir/nir.h | 80
> > ++
> >1 file changed, 80 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index 3b827bf..64caccb 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -985,6 +985,25 @@ typedef enum {
> >* intrinsic are due to the register reads/writes.
> >*/
> >   NIR_INTRINSIC_CAN_REORDER = (1 << 1),
> > +
> > +   /**
> > +* Indicates whether this intrinsic is "cross-thread". An
> operation
> > is
> > +* cross-thread if results in one thread depend on inputs in
> > another
> > thread,
> > +* and therefore optimizations cannot change the execution mask
> > when
> > the
> > +* operation is called. Examples of cross-thread operations
> include
> > +* screen-space derivatives, the "any" reduction which returns
> > "true"
> > in
> > +* all threads if any thread inputs "true", etc.
> > +*/
> > +   NIR_INTRINSIC_CROSS_THREAD,
> > +
> > +   /**
> > +* Indicates that this intrinsic is "convergent". An operation is
> > +* convergent when it must always be called in convergent control
> > flow,
> > +* that is, control flow with the same execution mask as when the
> > program
> > +* started. If an operation is convergent, it must be
> cross-thread
> > as
> > well,
> > +* since the optimizer must maintain the guarantee.
> > +*/
> > +   NIR_INTRINSIC_CONVERGENT,
> 
> 
> 
>  This is inconsistent with LLVM's definition of 'convergent', and I'd
>  like
>  you to change it to match up with LLVM.
> 
>  LLVM's definition of convergent is: "The operation must not be made
>  control-dependent on additional values."
> 
>  In the language of execution masks, this means that optimizations must
>  guarantee that the execution mask for the instruction can only become
> a
>  superset of what it was originally. This means lifting is actually
> okay.
> 
>  This is relevant because e.g. texture instructions with implicit
>  derivatives
>  are actually convergent operations (in the LLVM sense), but obviously
>  they
>  can be called with exec masks that are subsets of the exec mask at
>  program
>  start.
> >>>
> >>>
> >>> Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
> >>> sure), they can't be called that way -- results are undefined if
> >>> derivatives (or textures that take implicit derivatives) aren't called
> >>> in uniform control flow, full stop. That's why I changed the
> >>> definition compared to LLVM - this definition of convergent allows all
> >>> the optimizations that the LLVM definition does, but it opens up
> >>> additional optimization opportunities since we can assume that control
> >>> flow is always uniform when doing divergence analysis. Also, as-is,
> >>> the definition matches the GLSL/SPIR-V semantics closely, and since
> >>> the purpose of the convergent attribute is to model derivatives in
> >>> GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
> >>> semantics to allow what you describe, then we can add something
> >>> something closer to the LLVM convergent semantics. If you want me to
> >>> change the name to avoid confusion with LLVM, that's fair though --
> >>> suggestions welcome on what to call it ;)
> >>
> >>
> >> Okay, I'm convinced that it makes sense to have these semantics, but a
> >> different name would be good.
> >
> >
> > I'm not quite so convinced. :-)  The LLVM definition seems, at first
> brush,
> > more powerful 

Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-12 Thread Connor Abbott
On Mon, Jun 12, 2017 at 7:19 PM, Jason Ekstrand  wrote:
> On Mon, Jun 12, 2017 at 11:58 AM, Nicolai Hähnle  wrote:
>>
>> On 12.06.2017 20:50, Connor Abbott wrote:
>>>
>>> On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle 
>>> wrote:

 On 10.06.2017 01:44, Connor Abbott wrote:
>
>
> From: Connor Abbott 
>
> These are properties of the instruction that must be respected when
> moving it around, in addition to the usual SSA dominance guarantee.
> Previously, we only had special handling for fddx and fddy, in a very
> ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll
> have to start handling a lot more instructions with similar
> constraints,
> so we want to add a more formal model of what the optimizer can and
> cannot do.
>
> v2: don't add attribute for ALU instructions
> v3: special-case derivative ALU instructions
> Signed-off-by: Connor Abbott 
> ---
>src/compiler/nir/nir.h | 80
> ++
>1 file changed, 80 insertions(+)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 3b827bf..64caccb 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -985,6 +985,25 @@ typedef enum {
>* intrinsic are due to the register reads/writes.
>*/
>   NIR_INTRINSIC_CAN_REORDER = (1 << 1),
> +
> +   /**
> +* Indicates whether this intrinsic is "cross-thread". An operation
> is
> +* cross-thread if results in one thread depend on inputs in
> another
> thread,
> +* and therefore optimizations cannot change the execution mask
> when
> the
> +* operation is called. Examples of cross-thread operations include
> +* screen-space derivatives, the "any" reduction which returns
> "true"
> in
> +* all threads if any thread inputs "true", etc.
> +*/
> +   NIR_INTRINSIC_CROSS_THREAD,
> +
> +   /**
> +* Indicates that this intrinsic is "convergent". An operation is
> +* convergent when it must always be called in convergent control
> flow,
> +* that is, control flow with the same execution mask as when the
> program
> +* started. If an operation is convergent, it must be cross-thread
> as
> well,
> +* since the optimizer must maintain the guarantee.
> +*/
> +   NIR_INTRINSIC_CONVERGENT,



 This is inconsistent with LLVM's definition of 'convergent', and I'd
 like
 you to change it to match up with LLVM.

 LLVM's definition of convergent is: "The operation must not be made
 control-dependent on additional values."

 In the language of execution masks, this means that optimizations must
 guarantee that the execution mask for the instruction can only become a
 superset of what it was originally. This means lifting is actually okay.

 This is relevant because e.g. texture instructions with implicit
 derivatives
 are actually convergent operations (in the LLVM sense), but obviously
 they
 can be called with exec masks that are subsets of the exec mask at
 program
 start.
>>>
>>>
>>> Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
>>> sure), they can't be called that way -- results are undefined if
>>> derivatives (or textures that take implicit derivatives) aren't called
>>> in uniform control flow, full stop. That's why I changed the
>>> definition compared to LLVM - this definition of convergent allows all
>>> the optimizations that the LLVM definition does, but it opens up
>>> additional optimization opportunities since we can assume that control
>>> flow is always uniform when doing divergence analysis. Also, as-is,
>>> the definition matches the GLSL/SPIR-V semantics closely, and since
>>> the purpose of the convergent attribute is to model derivatives in
>>> GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
>>> semantics to allow what you describe, then we can add something
>>> something closer to the LLVM convergent semantics. If you want me to
>>> change the name to avoid confusion with LLVM, that's fair though --
>>> suggestions welcome on what to call it ;)
>>
>>
>> Okay, I'm convinced that it makes sense to have these semantics, but a
>> different name would be good.
>
>
> I'm not quite so convinced. :-)  The LLVM definition seems, at first brush,
> more powerful than the proposed definition and I think it's actually what
> you want for most optimizations.

The LLVM definition is actually less powerful than my definition - see
my response to Nicolai in patch 3.

> The only advantage I can see to the strict
> uniform definition is that it would let us imply information about
> 

Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-12 Thread Jason Ekstrand
On Mon, Jun 12, 2017 at 11:58 AM, Nicolai Hähnle  wrote:

> On 12.06.2017 20:50, Connor Abbott wrote:
>
>> On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle 
>> wrote:
>>
>>> On 10.06.2017 01:44, Connor Abbott wrote:
>>>

 From: Connor Abbott 

 These are properties of the instruction that must be respected when
 moving it around, in addition to the usual SSA dominance guarantee.
 Previously, we only had special handling for fddx and fddy, in a very
 ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll
 have to start handling a lot more instructions with similar constraints,
 so we want to add a more formal model of what the optimizer can and
 cannot do.

 v2: don't add attribute for ALU instructions
 v3: special-case derivative ALU instructions
 Signed-off-by: Connor Abbott 
 ---
src/compiler/nir/nir.h | 80
 ++
1 file changed, 80 insertions(+)

 diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
 index 3b827bf..64caccb 100644
 --- a/src/compiler/nir/nir.h
 +++ b/src/compiler/nir/nir.h
 @@ -985,6 +985,25 @@ typedef enum {
* intrinsic are due to the register reads/writes.
*/
   NIR_INTRINSIC_CAN_REORDER = (1 << 1),
 +
 +   /**
 +* Indicates whether this intrinsic is "cross-thread". An operation
 is
 +* cross-thread if results in one thread depend on inputs in another
 thread,
 +* and therefore optimizations cannot change the execution mask when
 the
 +* operation is called. Examples of cross-thread operations include
 +* screen-space derivatives, the "any" reduction which returns
 "true"
 in
 +* all threads if any thread inputs "true", etc.
 +*/
 +   NIR_INTRINSIC_CROSS_THREAD,
 +
 +   /**
 +* Indicates that this intrinsic is "convergent". An operation is
 +* convergent when it must always be called in convergent control
 flow,
 +* that is, control flow with the same execution mask as when the
 program
 +* started. If an operation is convergent, it must be cross-thread
 as
 well,
 +* since the optimizer must maintain the guarantee.
 +*/
 +   NIR_INTRINSIC_CONVERGENT,

>>>
>>>
>>> This is inconsistent with LLVM's definition of 'convergent', and I'd like
>>> you to change it to match up with LLVM.
>>>
>>> LLVM's definition of convergent is: "The operation must not be made
>>> control-dependent on additional values."
>>>
>>> In the language of execution masks, this means that optimizations must
>>> guarantee that the execution mask for the instruction can only become a
>>> superset of what it was originally. This means lifting is actually okay.
>>>
>>> This is relevant because e.g. texture instructions with implicit
>>> derivatives
>>> are actually convergent operations (in the LLVM sense), but obviously
>>> they
>>> can be called with exec masks that are subsets of the exec mask at
>>> program
>>> start.
>>>
>>
>> Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
>> sure), they can't be called that way -- results are undefined if
>> derivatives (or textures that take implicit derivatives) aren't called
>> in uniform control flow, full stop. That's why I changed the
>> definition compared to LLVM - this definition of convergent allows all
>> the optimizations that the LLVM definition does, but it opens up
>> additional optimization opportunities since we can assume that control
>> flow is always uniform when doing divergence analysis. Also, as-is,
>> the definition matches the GLSL/SPIR-V semantics closely, and since
>> the purpose of the convergent attribute is to model derivatives in
>> GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
>> semantics to allow what you describe, then we can add something
>> something closer to the LLVM convergent semantics. If you want me to
>> change the name to avoid confusion with LLVM, that's fair though --
>> suggestions welcome on what to call it ;)
>>
>
> Okay, I'm convinced that it makes sense to have these semantics, but a
> different name would be good.
>

I'm not quite so convinced. :-)  The LLVM definition seems, at first brush,
more powerful than the proposed definition and I think it's actually what
you want for most optimizations.  The only advantage I can see to the
strict uniform definition is that it would let us imply information about
control-flow uniformity from instructions.  However, while probably
technically correct, that sounds like a dangerous path to go down.  What
specific optimizations were you thinking this stricter definition would
enable?


> How about NIR_INTRINSIC_UNIFORM_CONTROL?
>

That works.

--Jason


> Cheers,
> Nicolai
>
>
>
>
>
>>
>>> LLVM 

Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-12 Thread Nicolai Hähnle

On 12.06.2017 20:50, Connor Abbott wrote:

On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle  wrote:

On 10.06.2017 01:44, Connor Abbott wrote:


From: Connor Abbott 

These are properties of the instruction that must be respected when
moving it around, in addition to the usual SSA dominance guarantee.
Previously, we only had special handling for fddx and fddy, in a very
ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll
have to start handling a lot more instructions with similar constraints,
so we want to add a more formal model of what the optimizer can and
cannot do.

v2: don't add attribute for ALU instructions
v3: special-case derivative ALU instructions
Signed-off-by: Connor Abbott 
---
   src/compiler/nir/nir.h | 80
++
   1 file changed, 80 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3b827bf..64caccb 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -985,6 +985,25 @@ typedef enum {
   * intrinsic are due to the register reads/writes.
   */
  NIR_INTRINSIC_CAN_REORDER = (1 << 1),
+
+   /**
+* Indicates whether this intrinsic is "cross-thread". An operation is
+* cross-thread if results in one thread depend on inputs in another
thread,
+* and therefore optimizations cannot change the execution mask when
the
+* operation is called. Examples of cross-thread operations include
+* screen-space derivatives, the "any" reduction which returns "true"
in
+* all threads if any thread inputs "true", etc.
+*/
+   NIR_INTRINSIC_CROSS_THREAD,
+
+   /**
+* Indicates that this intrinsic is "convergent". An operation is
+* convergent when it must always be called in convergent control
flow,
+* that is, control flow with the same execution mask as when the
program
+* started. If an operation is convergent, it must be cross-thread as
well,
+* since the optimizer must maintain the guarantee.
+*/
+   NIR_INTRINSIC_CONVERGENT,



This is inconsistent with LLVM's definition of 'convergent', and I'd like
you to change it to match up with LLVM.

LLVM's definition of convergent is: "The operation must not be made
control-dependent on additional values."

In the language of execution masks, this means that optimizations must
guarantee that the execution mask for the instruction can only become a
superset of what it was originally. This means lifting is actually okay.

This is relevant because e.g. texture instructions with implicit derivatives
are actually convergent operations (in the LLVM sense), but obviously they
can be called with exec masks that are subsets of the exec mask at program
start.


Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
sure), they can't be called that way -- results are undefined if
derivatives (or textures that take implicit derivatives) aren't called
in uniform control flow, full stop. That's why I changed the
definition compared to LLVM - this definition of convergent allows all
the optimizations that the LLVM definition does, but it opens up
additional optimization opportunities since we can assume that control
flow is always uniform when doing divergence analysis. Also, as-is,
the definition matches the GLSL/SPIR-V semantics closely, and since
the purpose of the convergent attribute is to model derivatives in
GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
semantics to allow what you describe, then we can add something
something closer to the LLVM convergent semantics. If you want me to
change the name to avoid confusion with LLVM, that's fair though --
suggestions welcome on what to call it ;)


Okay, I'm convinced that it makes sense to have these semantics, but a 
different name would be good.


How about NIR_INTRINSIC_UNIFORM_CONTROL?

Cheers,
Nicolai







LLVM currently has no equivalent to cross_thread, and we hack around it as
I'm sure you're well aware. The nightmare is trying to find a sound
definition of "cross_thread" that works in LLVM's execution model.


Yeah... this stuff is really tricky to reason about. I think that
eventually, we're going to have to add the notions of control flow
divergence and re-convergence to LLVM's execution model, even though
there's been pushback from some LLVM developers about it. I just don't
see any way we'll be able to do stuff like LICM, aggressive CSE, etc.
effectively in the presence of this cross-thread operations, when
whether you can do those things at all depends on whether branch
conditions are uniform.



Cheers,
Nicolai




   } nir_intrinsic_semantic_flag;
 /**
@@ -1459,6 +1478,67 @@ NIR_DEFINE_CAST(nir_instr_as_parallel_copy,
nir_instr,
   type, nir_instr_type_parallel_copy)
 /*
+ * Helpers to determine if an instruction is cross-thread or convergent.
See
+ * NIR_INTRINSIC_{CONVERGENT|CROSS_THREAD} for the definitions.
+ 

Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-12 Thread Connor Abbott
On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle  wrote:
> On 10.06.2017 01:44, Connor Abbott wrote:
>>
>> From: Connor Abbott 
>>
>> These are properties of the instruction that must be respected when
>> moving it around, in addition to the usual SSA dominance guarantee.
>> Previously, we only had special handling for fddx and fddy, in a very
>> ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll
>> have to start handling a lot more instructions with similar constraints,
>> so we want to add a more formal model of what the optimizer can and
>> cannot do.
>>
>> v2: don't add attribute for ALU instructions
>> v3: special-case derivative ALU instructions
>> Signed-off-by: Connor Abbott 
>> ---
>>   src/compiler/nir/nir.h | 80
>> ++
>>   1 file changed, 80 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 3b827bf..64caccb 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -985,6 +985,25 @@ typedef enum {
>>   * intrinsic are due to the register reads/writes.
>>   */
>>  NIR_INTRINSIC_CAN_REORDER = (1 << 1),
>> +
>> +   /**
>> +* Indicates whether this intrinsic is "cross-thread". An operation is
>> +* cross-thread if results in one thread depend on inputs in another
>> thread,
>> +* and therefore optimizations cannot change the execution mask when
>> the
>> +* operation is called. Examples of cross-thread operations include
>> +* screen-space derivatives, the "any" reduction which returns "true"
>> in
>> +* all threads if any thread inputs "true", etc.
>> +*/
>> +   NIR_INTRINSIC_CROSS_THREAD,
>> +
>> +   /**
>> +* Indicates that this intrinsic is "convergent". An operation is
>> +* convergent when it must always be called in convergent control
>> flow,
>> +* that is, control flow with the same execution mask as when the
>> program
>> +* started. If an operation is convergent, it must be cross-thread as
>> well,
>> +* since the optimizer must maintain the guarantee.
>> +*/
>> +   NIR_INTRINSIC_CONVERGENT,
>
>
> This is inconsistent with LLVM's definition of 'convergent', and I'd like
> you to change it to match up with LLVM.
>
> LLVM's definition of convergent is: "The operation must not be made
> control-dependent on additional values."
>
> In the language of execution masks, this means that optimizations must
> guarantee that the execution mask for the instruction can only become a
> superset of what it was originally. This means lifting is actually okay.
>
> This is relevant because e.g. texture instructions with implicit derivatives
> are actually convergent operations (in the LLVM sense), but obviously they
> can be called with exec masks that are subsets of the exec mask at program
> start.

Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
sure), they can't be called that way -- results are undefined if
derivatives (or textures that take implicit derivatives) aren't called
in uniform control flow, full stop. That's why I changed the
definition compared to LLVM - this definition of convergent allows all
the optimizations that the LLVM definition does, but it opens up
additional optimization opportunities since we can assume that control
flow is always uniform when doing divergence analysis. Also, as-is,
the definition matches the GLSL/SPIR-V semantics closely, and since
the purpose of the convergent attribute is to model derivatives in
GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
semantics to allow what you describe, then we can add something
something closer to the LLVM convergent semantics. If you want me to
change the name to avoid confusion with LLVM, that's fair though --
suggestions welcome on what to call it ;)

>
> LLVM currently has no equivalent to cross_thread, and we hack around it as
> I'm sure you're well aware. The nightmare is trying to find a sound
> definition of "cross_thread" that works in LLVM's execution model.

Yeah... this stuff is really tricky to reason about. I think that
eventually, we're going to have to add the notions of control flow
divergence and re-convergence to LLVM's execution model, even though
there's been pushback from some LLVM developers about it. I just don't
see any way we'll be able to do stuff like LICM, aggressive CSE, etc.
effectively in the presence of this cross-thread operations, when
whether you can do those things at all depends on whether branch
conditions are uniform.

>
> Cheers,
> Nicolai
>
>
>
>>   } nir_intrinsic_semantic_flag;
>> /**
>> @@ -1459,6 +1478,67 @@ NIR_DEFINE_CAST(nir_instr_as_parallel_copy,
>> nir_instr,
>>   type, nir_instr_type_parallel_copy)
>> /*
>> + * Helpers to determine if an instruction is cross-thread or convergent.
>> See
>> + * NIR_INTRINSIC_{CONVERGENT|CROSS_THREAD} for the definitions.
>> + */

Re: [Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-12 Thread Nicolai Hähnle

On 10.06.2017 01:44, Connor Abbott wrote:

From: Connor Abbott 

These are properties of the instruction that must be respected when
moving it around, in addition to the usual SSA dominance guarantee.
Previously, we only had special handling for fddx and fddy, in a very
ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll
have to start handling a lot more instructions with similar constraints,
so we want to add a more formal model of what the optimizer can and
cannot do.

v2: don't add attribute for ALU instructions
v3: special-case derivative ALU instructions
Signed-off-by: Connor Abbott 
---
  src/compiler/nir/nir.h | 80 ++
  1 file changed, 80 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3b827bf..64caccb 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -985,6 +985,25 @@ typedef enum {
  * intrinsic are due to the register reads/writes.
  */
 NIR_INTRINSIC_CAN_REORDER = (1 << 1),
+
+   /**
+* Indicates whether this intrinsic is "cross-thread". An operation is
+* cross-thread if results in one thread depend on inputs in another thread,
+* and therefore optimizations cannot change the execution mask when the
+* operation is called. Examples of cross-thread operations include
+* screen-space derivatives, the "any" reduction which returns "true" in
+* all threads if any thread inputs "true", etc.
+*/
+   NIR_INTRINSIC_CROSS_THREAD,
+
+   /**
+* Indicates that this intrinsic is "convergent". An operation is
+* convergent when it must always be called in convergent control flow,
+* that is, control flow with the same execution mask as when the program
+* started. If an operation is convergent, it must be cross-thread as well,
+* since the optimizer must maintain the guarantee.
+*/
+   NIR_INTRINSIC_CONVERGENT,


This is inconsistent with LLVM's definition of 'convergent', and I'd 
like you to change it to match up with LLVM.


LLVM's definition of convergent is: "The operation must not be made 
control-dependent on additional values."


In the language of execution masks, this means that optimizations must 
guarantee that the execution mask for the instruction can only become a 
superset of what it was originally. This means lifting is actually okay.


This is relevant because e.g. texture instructions with implicit 
derivatives are actually convergent operations (in the LLVM sense), but 
obviously they can be called with exec masks that are subsets of the 
exec mask at program start.


LLVM currently has no equivalent to cross_thread, and we hack around it 
as I'm sure you're well aware. The nightmare is trying to find a sound 
definition of "cross_thread" that works in LLVM's execution model.


Cheers,
Nicolai



  } nir_intrinsic_semantic_flag;
  
  /**

@@ -1459,6 +1478,67 @@ NIR_DEFINE_CAST(nir_instr_as_parallel_copy, nir_instr,
  type, nir_instr_type_parallel_copy)
  
  /*

+ * Helpers to determine if an instruction is cross-thread or convergent. See
+ * NIR_INTRINSIC_{CONVERGENT|CROSS_THREAD} for the definitions.
+ */
+static inline bool
+nir_instr_is_convergent(const nir_instr *instr)
+{
+   switch (instr->type) {
+   case nir_instr_type_alu:
+  switch (nir_instr_as_alu(instr)->op) {
+  case nir_op_fddx:
+  case nir_op_fddy:
+  case nir_op_fddx_fine:
+  case nir_op_fddy_fine:
+  case nir_op_fddx_coarse:
+  case nir_op_fddy_coarse:
+ /* Partial derivatives are convergent */
+ return true;
+
+  default:
+ return false;
+  }
+
+   case nir_instr_type_intrinsic: {
+  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+  return nir_intrinsic_infos[intrin->intrinsic].flags &
+ NIR_INTRINSIC_CONVERGENT;
+   }
+
+   case nir_instr_type_tex:
+ switch (nir_instr_as_tex(instr)->op) {
+ case nir_texop_tex:
+ case nir_texop_txb:
+ case nir_texop_lod:
+/* These three take implicit derivatives, so they are convergent */
+return true;
+
+ default:
+return false;
+ }
+
+   default:
+  return false;
+   }
+}
+
+static inline bool
+nir_instr_is_cross_thread(const nir_instr *instr)
+{
+   switch (instr->type) {
+   case nir_instr_type_intrinsic: {
+  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+  return nir_intrinsic_infos[intrin->intrinsic].flags &
+ NIR_INTRINSIC_CROSS_THREAD;
+   }
+
+   default:
+  return nir_instr_is_convergent(instr);
+   }
+}
+
+/*
   * Control flow
   *
   * Control flow consists of a tree of control flow nodes, which include




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes

2017-06-09 Thread Connor Abbott
From: Connor Abbott 

These are properties of the instruction that must be respected when
moving it around, in addition to the usual SSA dominance guarantee.
Previously, we only had special handling for fddx and fddy, in a very
ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll
have to start handling a lot more instructions with similar constraints,
so we want to add a more formal model of what the optimizer can and
cannot do.

v2: don't add attribute for ALU instructions
v3: special-case derivative ALU instructions
Signed-off-by: Connor Abbott 
---
 src/compiler/nir/nir.h | 80 ++
 1 file changed, 80 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3b827bf..64caccb 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -985,6 +985,25 @@ typedef enum {
 * intrinsic are due to the register reads/writes.
 */
NIR_INTRINSIC_CAN_REORDER = (1 << 1),
+
+   /**
+* Indicates whether this intrinsic is "cross-thread". An operation is
+* cross-thread if results in one thread depend on inputs in another thread,
+* and therefore optimizations cannot change the execution mask when the
+* operation is called. Examples of cross-thread operations include
+* screen-space derivatives, the "any" reduction which returns "true" in
+* all threads if any thread inputs "true", etc.
+*/
+   NIR_INTRINSIC_CROSS_THREAD,
+
+   /**
+* Indicates that this intrinsic is "convergent". An operation is
+* convergent when it must always be called in convergent control flow,
+* that is, control flow with the same execution mask as when the program
+* started. If an operation is convergent, it must be cross-thread as well,
+* since the optimizer must maintain the guarantee.
+*/
+   NIR_INTRINSIC_CONVERGENT,
 } nir_intrinsic_semantic_flag;
 
 /**
@@ -1459,6 +1478,67 @@ NIR_DEFINE_CAST(nir_instr_as_parallel_copy, nir_instr,
 type, nir_instr_type_parallel_copy)
 
 /*
+ * Helpers to determine if an instruction is cross-thread or convergent. See
+ * NIR_INTRINSIC_{CONVERGENT|CROSS_THREAD} for the definitions.
+ */
+static inline bool
+nir_instr_is_convergent(const nir_instr *instr)
+{
+   switch (instr->type) {
+   case nir_instr_type_alu:
+  switch (nir_instr_as_alu(instr)->op) {
+  case nir_op_fddx:
+  case nir_op_fddy:
+  case nir_op_fddx_fine:
+  case nir_op_fddy_fine:
+  case nir_op_fddx_coarse:
+  case nir_op_fddy_coarse:
+ /* Partial derivatives are convergent */
+ return true;
+
+  default:
+ return false;
+  }
+
+   case nir_instr_type_intrinsic: {
+  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+  return nir_intrinsic_infos[intrin->intrinsic].flags &
+ NIR_INTRINSIC_CONVERGENT;
+   }
+
+   case nir_instr_type_tex:
+ switch (nir_instr_as_tex(instr)->op) {
+ case nir_texop_tex:
+ case nir_texop_txb:
+ case nir_texop_lod:
+/* These three take implicit derivatives, so they are convergent */
+return true;
+
+ default:
+return false;
+ }
+
+   default:
+  return false;
+   }
+}
+
+static inline bool
+nir_instr_is_cross_thread(const nir_instr *instr)
+{
+   switch (instr->type) {
+   case nir_instr_type_intrinsic: {
+  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+  return nir_intrinsic_infos[intrin->intrinsic].flags &
+ NIR_INTRINSIC_CROSS_THREAD;
+   }
+
+   default:
+  return nir_instr_is_convergent(instr);
+   }
+}
+
+/*
  * Control flow
  *
  * Control flow consists of a tree of control flow nodes, which include
-- 
2.9.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev