Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-09 Thread Samuel Pitoiset



On 06/08/2017 05:19 PM, Aaryaman Vasishta wrote:



On Thu, Jun 8, 2017 at 5:01 AM, Samuel Pitoiset wrote:




On 06/07/2017 06:58 PM, Aaryaman Vasishta wrote:



On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset wrote:

 Nice work!

 See my comments below, and double-check if some of them can be
 applied to the shaders I didn't review yet.

 I recommend testing your work because if one sched code is
 wrong, you are likely going to kill your card and reboot your box. :-)


 On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:

 v2: Add missing delays

 This patch adds proper delays to maxwell exa shaders. rendercheck tests
 seem consistent with/without this patch. I haven't extensively tested
 them though.

 Trello:
 https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays

 Signed-off-by: Aaryaman Vasishta

 ---
src/shader/exac8nv110.fp  | 10 +-
src/shader/exac8nv110.fpc | 18 +-
src/shader/exacanv110.fp  | 10 +-
src/shader/exacanv110.fpc | 18 +-
src/shader/exacmnv110.fp  | 10 +-
src/shader/exacmnv110.fpc | 18 +-
src/shader/exas8nv110.fp  |  6 +++---
src/shader/exas8nv110.fpc | 12 ++--
src/shader/exasanv110.fp  | 10 +-
src/shader/exasanv110.fpc | 18 +-
src/shader/exascnv110.fp  |  6 +++---
src/shader/exascnv110.fpc | 10 +-
src/shader/videonv110.fp  | 14 +++---
src/shader/videonv110.fpc | 26 +-
14 files changed, 93 insertions(+), 93 deletions(-)

 diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
 index ce78036..1c4a4f1 100644
 --- a/src/shader/exac8nv110.fp
 +++ b/src/shader/exac8nv110.fp
 @@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
};
#else
-sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
ipa pass $r0 a[0x7c] 0x0 0x0 0x1
mufu rcp $r0 $r0
ipa $r3 a[0x94] $r0 0x0 0x1
 -sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
ipa $r2 a[0x90] $r0 0x0 0x1
tex nodep $r1 $r2 0x0 0x1 t2d 0x8
ipa $r3 a[0x84] $r0 0x0 0x1
 -sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
ipa $r2 a[0x80] $r0 0x0 0x1
tex nodep $r0 $r2 0x0 0x0 t2d 0x8


 Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?

Missed it, thanks for pointing it out.


You don't have to. 'tex' reads two sources ($r2:$r3) and writes into
$r0, but as $r2:$r3 are NOT re-used before $r0 is read, you can
assume that $r0 will be ready and you don't need any read-dep-bar.

Ah, so $r2:$r3, which are written by the two 'ipa' above it, have
already been waited on in this tex, and both of them read $r0, so we can
safely assume that since the two 'ipa' instructions have already been
waited on, $r0 will be ready?


No.

It's because the next 'fmul' waits for $r0 (the output of 'tex'). So, if $r0
is "ready", you can assume that $r2:$r3 can be re-used. It's a
particular situation which doesn't need any read-dep-bars; you
can add them if you want, but they are useless.
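To make the rule concrete, here is a rough sketch, with hypothetical names and a simplified model, of when a read-dep-bar is needed after a variable-latency instruction like 'tex': only if a later instruction overwrites one of its sources before anything waits on its write barrier. Register sets and barrier indices are read off the v2 patch; read barriers and depbar are not modelled, so treat it as an illustration of the reasoning, not a checker for real shaders. It agrees with both reviews: no barrier for the exac8nv110 'tex', but one for the first 'tex' in exacanv110, whose $r2:$r3 are overwritten by the second 'tex'.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Insn:
    """One instruction in a simplified, hypothetical scoreboard model."""
    name: str
    dsts: set[str] = field(default_factory=set)
    srcs: set[str] = field(default_factory=set)
    wr: int | None = None  # write barrier set by a variable-latency insn
    wt: int = 0            # wait mask: bit b means "wait on barrier b"


def needs_read_barrier(prog: list[Insn], i: int) -> bool:
    """Return True if some later insn overwrites a source of prog[i]
    before any insn waits on prog[i]'s write barrier."""
    insn = prog[i]
    for later in prog[i + 1:]:
        # Waiting on the write barrier means the result is ready, which
        # implies the sources were already consumed: no read barrier needed.
        if insn.wr is not None and later.wt & (1 << insn.wr):
            return False
        # Otherwise, clobbering a source is a write-after-read hazard.
        if later.dsts & insn.srcs:
            return True
    return False


# Register sets and barrier indices read off the v2 patch (simplified).
a8_tail = [
    Insn("tex $r0 $r2", dsts={"r0"}, srcs={"r2", "r3"}, wr=1),
    Insn("depbar"),
    Insn("fmul $r3 $r0 $r1", dsts={"r3"}, srcs={"r0", "r1"}, wt=0x3),
    Insn("mov $r2 $r3", dsts={"r2"}, srcs={"r3"}),
]
ca_tail = [
    Insn("tex $r4 $r2", dsts={"r4", "r5", "r6", "r7"}, srcs={"r2", "r3"}, wr=0),
    Insn("ipa $r1", dsts={"r1"}, srcs={"r0"}, wr=1),
    Insn("ipa $r0", dsts={"r0"}, srcs={"r0"}, wr=2, wt=0x4),
    Insn("tex $r0 $r0", dsts={"r0", "r1", "r2", "r3"}, srcs={"r0", "r1"},
         wr=1, wt=0x6),
]
print(needs_read_barrier(a8_tail, 0))  # False
print(needs_read_barrier(ca_tail, 0))  # True
```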









depbar le 0x5 0x0 0x0
 -sched (st 0x0) (st 0x0) (st 0x0)
 +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
fmul ftz $r3 $r0 $r1
 

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-08 Thread Aaryaman Vasishta
On Thu, Jun 8, 2017 at 5:01 AM, Samuel Pitoiset 
wrote:

>
>
> On 06/07/2017 06:58 PM, Aaryaman Vasishta wrote:
>
>>
>>
>> On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset <samuel.pitoi...@gmail.com> wrote:
>>
>> Nice work!
>>
>> See my comments below, and double-check if some of them can be
>> applied to the shaders I didn't review yet.
>>
>> I recommend testing your work because if one sched code is
>> wrong, you are likely going to kill your card and reboot your box. :-)
>>
>>
>> On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:
>>
>> v2: Add missing delays
>>
>> This patch adds proper delays to maxwell exa shaders. rendercheck tests
>> seem consistent with/without this patch. I haven't extensively tested
>> them though.
>>
>> Trello:
>> https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays
>>
>> Signed-off-by: Aaryaman Vasishta
>>
>> ---
>>src/shader/exac8nv110.fp  | 10 +-
>>src/shader/exac8nv110.fpc | 18 +-
>>src/shader/exacanv110.fp  | 10 +-
>>src/shader/exacanv110.fpc | 18 +-
>>src/shader/exacmnv110.fp  | 10 +-
>>src/shader/exacmnv110.fpc | 18 +-
>>src/shader/exas8nv110.fp  |  6 +++---
>>src/shader/exas8nv110.fpc | 12 ++--
>>src/shader/exasanv110.fp  | 10 +-
>>src/shader/exasanv110.fpc | 18 +-
>>src/shader/exascnv110.fp  |  6 +++---
>>src/shader/exascnv110.fpc | 10 +-
>>src/shader/videonv110.fp  | 14 +++---
>>src/shader/videonv110.fpc | 26 +-
>>14 files changed, 93 insertions(+), 93 deletions(-)
>>
>> diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
>> index ce78036..1c4a4f1 100644
>> --- a/src/shader/exac8nv110.fp
>> +++ b/src/shader/exac8nv110.fp
>> @@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
>>};
>>#else
>>-sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
>>ipa pass $r0 a[0x7c] 0x0 0x0 0x1
>>mufu rcp $r0 $r0
>>ipa $r3 a[0x94] $r0 0x0 0x1
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
>>ipa $r2 a[0x90] $r0 0x0 0x1
>>tex nodep $r1 $r2 0x0 0x1 t2d 0x8
>>ipa $r3 a[0x84] $r0 0x0 0x1
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
>>ipa $r2 a[0x80] $r0 0x0 0x1
>>tex nodep $r0 $r2 0x0 0x0 t2d 0x8
>>
>>
>> Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?
>>
>> Missed it, thanks for pointing it out.
>>
>
> You don't have to. 'tex' reads two sources ($r2:$r3) and writes into $r0,
> but as $r2:$r3 are NOT re-used before $r0 is read, you can assume that $r0
> will be ready and you don't need any read-dep-bar.

Ah, so $r2:$r3, which are written by the two 'ipa' above it, have already
been waited on in this tex, and both of them read $r0, so we can safely
assume that since the two 'ipa' instructions have already been waited on,
$r0 will be ready?

>




>
>
>>
>>
>>depbar le 0x5 0x0 0x0
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
>>fmul ftz $r3 $r0 $r1
>>mov $r2 $r3 0xf
>>
>>
>> You can stall for only one cycle here, but the 6 cycles on fmul are needed.
>>
>>mov $r1 $r3 0xf
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0x6) (st 0xf) (st 0x0)
>>mov $r0 $r3 0xf
>>
>>
>> Same here.
>>
>>
>>exit
>>#endif
>> diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
>> index 4aa1368..46943b7 100644
>> --- a/src/shader/exac8nv110.fpc
>> +++ b/src/shader/exac8nv110.fpc
>> @@ -1,36 +1,36 @@
>> -0xfc0007e0,
>> -0x001f8000,
>> +0xe1a0070f,
>> +0x003c3c01,
>>0xcff7ff00,
>>0xe003ff87,
>>0x0047,
>>0x5080,
>>0x4007ff03,
>>0xe043ff89,
>> -0xfc0007e0,
>> -0x001f8000,
>> +0x21e0072f,
>> +0x005cbc03,
>>0x0007ff02,
>>0xe043ff89,
>>0x2ff70201,
>>0xc03a0014,
>>0x4007ff03,
>>0xe043ff88,
>> 
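The .fpc hunk above replaces the all-zero sched word pair (0xfc0007e0, 0x001f8000) with new values. As a sketch for checking such words by hand, the following packs three sched groups into the (low, high) 32-bit pair stored in the .fpc files. The bit layout is an assumption, reconstructed so that it reproduces the words in this patch rather than taken from envyas or NVIDIA documentation; sched_field and sched_words are hypothetical helpers.

```python
# Assumed 21-bit layout of one Maxwell sched field (reconstructed from the
# .fpc words in this patch, not from any official spec):
#   bits 0-3 stall, bit 4 yield, bits 5-7 write barrier,
#   bits 8-10 read barrier, bits 11-16 wait mask, bits 17-20 reuse.
NO_BARRIER = 7  # barrier index meaning "none"


def sched_field(st, wr=None, rd=None, wt=0, yield_=0, reuse=0):
    """Encode one '(st .. wr .. rd .. wt ..)' group as a 21-bit value."""
    if not (0 <= st <= 0xF and 0 <= wt <= 0x3F):
        raise ValueError("stall must fit in 4 bits, wait mask in 6 bits")
    wr = NO_BARRIER if wr is None else wr
    rd = NO_BARRIER if rd is None else rd
    return st | (yield_ << 4) | (wr << 5) | (rd << 8) | (wt << 11) | (reuse << 17)


def sched_words(f0, f1, f2):
    """Pack three fields into the (low, high) 32-bit words of a .fpc file."""
    v = f0 | (f1 << 21) | (f2 << 42)
    return v & 0xFFFFFFFF, v >> 32


old = sched_words(sched_field(0), sched_field(0), sched_field(0))
new = sched_words(sched_field(0xF, wr=0),
                  sched_field(0xD, wr=0, wt=0x1),
                  sched_field(0xF, wr=0, wt=0x1))
print([f"0x{w:08x}" for w in old])  # ['0xfc0007e0', '0x001f8000']
print([f"0x{w:08x}" for w in new])  # ['0xe1a0070f', '0x003c3c01']
```

The same helpers also reproduce the other pairs in this hunk, e.g. (0x21e0072f, 0x005cbc03) for the second sched line.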

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-07 Thread Samuel Pitoiset



On 06/07/2017 06:58 PM, Aaryaman Vasishta wrote:



On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset wrote:


Nice work!

See my comments below, and double-check if some of them can be
applied to the shaders I didn't review yet.

I recommend testing your work because if one sched code is
wrong, you are likely going to kill your card and reboot your box. :-)


On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:

v2: Add missing delays

This patch adds proper delays to maxwell exa shaders. rendercheck tests
seem consistent with/without this patch. I haven't extensively tested
them though.

Trello:

https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays



Signed-off-by: Aaryaman Vasishta
---
   src/shader/exac8nv110.fp  | 10 +-
   src/shader/exac8nv110.fpc | 18 +-
   src/shader/exacanv110.fp  | 10 +-
   src/shader/exacanv110.fpc | 18 +-
   src/shader/exacmnv110.fp  | 10 +-
   src/shader/exacmnv110.fpc | 18 +-
   src/shader/exas8nv110.fp  |  6 +++---
   src/shader/exas8nv110.fpc | 12 ++--
   src/shader/exasanv110.fp  | 10 +-
   src/shader/exasanv110.fpc | 18 +-
   src/shader/exascnv110.fp  |  6 +++---
   src/shader/exascnv110.fpc | 10 +-
   src/shader/videonv110.fp  | 14 +++---
   src/shader/videonv110.fpc | 26 +-
   14 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
index ce78036..1c4a4f1 100644
--- a/src/shader/exac8nv110.fp
+++ b/src/shader/exac8nv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
   };
   #else
   -sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
   ipa pass $r0 a[0x7c] 0x0 0x0 0x1
   mufu rcp $r0 $r0
   ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
   ipa $r2 a[0x90] $r0 0x0 0x1
   tex nodep $r1 $r2 0x0 0x1 t2d 0x8
   ipa $r3 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
   ipa $r2 a[0x80] $r0 0x0 0x1
   tex nodep $r0 $r2 0x0 0x0 t2d 0x8


Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?

Missed it, thanks for pointing it out.


You don't have to. 'tex' reads two sources ($r2:$r3) and writes into 
$r0, but as $r2:$r3 are NOT re-used before $r0 is read, you can assume 
that $r0 will be ready and you don't need any read-dep-bar.






   depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
   fmul ftz $r3 $r0 $r1
   mov $r2 $r3 0xf


You can stall for only one cycle here, but the 6 cycles on fmul are needed.

   mov $r1 $r3 0xf
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6) (st 0xf) (st 0x0)
   mov $r0 $r3 0xf


Same here. 




   exit
   #endif
diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
index 4aa1368..46943b7 100644
--- a/src/shader/exac8nv110.fpc
+++ b/src/shader/exac8nv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
   0xcff7ff00,
   0xe003ff87,
   0x0047,
   0x5080,
   0x4007ff03,
   0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0x21e0072f,
+0x005cbc03,
   0x0007ff02,
   0xe043ff89,
   0x2ff70201,
   0xc03a0014,
   0x4007ff03,
   0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0074f,
+0x001fbc06,
   0x0007ff02,
   0xe043ff88,
   0x2ff70200,
   0xc03a0004,
   0x3407,
   0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfcc01fe6,
+0x001f8400,
   0x00170003,
   0x5c681000,
   0x00370002,
   0x5c980780,
   0x00370001,
   0x5c980780,
-0xfc0007e0,
+0xfde007e6,
   0x001f8000,
   0x0037,
   0x5c980780,
diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
index a70d5c5..d7c2867 100644
--- 

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-07 Thread Aaryaman Vasishta
On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset 
wrote:

> Nice work!
>
> See my comments below, and double-check if some of them can be applied to
> the shaders I didn't review yet.
>
> I recommend testing your work because if one sched code is wrong, you
> are likely going to kill your card and reboot your box. :-)
>
>
> On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:
>
>> v2: Add missing delays
>>
>> This patch adds proper delays to maxwell exa shaders. rendercheck tests
>> seem consistent with/without this patch. I haven't extensively tested
>> them though.
>>
>> Trello:
>> https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays
>>
>> Signed-off-by: Aaryaman Vasishta 
>> ---
>>   src/shader/exac8nv110.fp  | 10 +-
>>   src/shader/exac8nv110.fpc | 18 +-
>>   src/shader/exacanv110.fp  | 10 +-
>>   src/shader/exacanv110.fpc | 18 +-
>>   src/shader/exacmnv110.fp  | 10 +-
>>   src/shader/exacmnv110.fpc | 18 +-
>>   src/shader/exas8nv110.fp  |  6 +++---
>>   src/shader/exas8nv110.fpc | 12 ++--
>>   src/shader/exasanv110.fp  | 10 +-
>>   src/shader/exasanv110.fpc | 18 +-
>>   src/shader/exascnv110.fp  |  6 +++---
>>   src/shader/exascnv110.fpc | 10 +-
>>   src/shader/videonv110.fp  | 14 +++---
>>   src/shader/videonv110.fpc | 26 +-
>>   14 files changed, 93 insertions(+), 93 deletions(-)
>>
>> diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
>> index ce78036..1c4a4f1 100644
>> --- a/src/shader/exac8nv110.fp
>> +++ b/src/shader/exac8nv110.fp
>> @@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
>>   };
>>   #else
>>   -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
>>   ipa pass $r0 a[0x7c] 0x0 0x0 0x1
>>   mufu rcp $r0 $r0
>>   ipa $r3 a[0x94] $r0 0x0 0x1
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
>>   ipa $r2 a[0x90] $r0 0x0 0x1
>>   tex nodep $r1 $r2 0x0 0x1 t2d 0x8
>>   ipa $r3 a[0x84] $r0 0x0 0x1
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
>>   ipa $r2 a[0x80] $r0 0x0 0x1
>>   tex nodep $r0 $r2 0x0 0x0 t2d 0x8
>>
>
> Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?

Missed it, thanks for pointing it out.

>
>
>   depbar le 0x5 0x0 0x0
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
>>   fmul ftz $r3 $r0 $r1
>>   mov $r2 $r3 0xf
>>
>
> You can stall for only one cycle here, but the 6 cycles on fmul are needed.
>
>   mov $r1 $r3 0xf
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0x6) (st 0xf) (st 0x0)
>>   mov $r0 $r3 0xf
>>
>
> Same here.


>
>   exit
>>   #endif
>> diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
>> index 4aa1368..46943b7 100644
>> --- a/src/shader/exac8nv110.fpc
>> +++ b/src/shader/exac8nv110.fpc
>> @@ -1,36 +1,36 @@
>> -0xfc0007e0,
>> -0x001f8000,
>> +0xe1a0070f,
>> +0x003c3c01,
>>   0xcff7ff00,
>>   0xe003ff87,
>>   0x0047,
>>   0x5080,
>>   0x4007ff03,
>>   0xe043ff89,
>> -0xfc0007e0,
>> -0x001f8000,
>> +0x21e0072f,
>> +0x005cbc03,
>>   0x0007ff02,
>>   0xe043ff89,
>>   0x2ff70201,
>>   0xc03a0014,
>>   0x4007ff03,
>>   0xe043ff88,
>> -0xfc0007e0,
>> -0x001f8000,
>> +0xe5e0074f,
>> +0x001fbc06,
>>   0x0007ff02,
>>   0xe043ff88,
>>   0x2ff70200,
>>   0xc03a0004,
>>   0x3407,
>>   0xf0f0,
>> -0xfc0007e0,
>> -0x001f8000,
>> +0xfcc01fe6,
>> +0x001f8400,
>>   0x00170003,
>>   0x5c681000,
>>   0x00370002,
>>   0x5c980780,
>>   0x00370001,
>>   0x5c980780,
>> -0xfc0007e0,
>> +0xfde007e6,
>>   0x001f8000,
>>   0x0037,
>>   0x5c980780,
>> diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
>> index a70d5c5..d7c2867 100644
>> --- a/src/shader/exacanv110.fp
>> +++ b/src/shader/exacanv110.fp
>> @@ -25,23 +25,23 @@ NV110FP_CAComposite[] = {
>>   };
>>   #else
>>   -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
>>   ipa pass $r0 a[0x7c] 0x0 0x0 0x1
>>   mufu rcp $r0 $r0
>>   ipa $r3 a[0x94] $r0 0x0 0x1
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x1) (st 0xf wr 0x0 wt 0x3) (st 0xf wr 0x1 rd 0x2)
>>   ipa $r2 a[0x90] $r0 0x0 0x1
>>   tex nodep $r4 $r2 0x0 0x1 t2d 0xf
>>
>
> Please add a read-dep-bar and wait for it on the first fmul because $r2:$r3
> are re-used before $r4. Should be safer.


>
>   ipa $r1 a[0x84] $r0 0x0 0x1
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0xf wr 0x2 wt 0x4) (st 0xf wr 0x1 wt 0x6) (st 0xf)
>>   ipa $r0 a[0x80] $r0 0x0 0x1
>>   tex nodep $r0 $r0 0x0 0x0 t2d 0xf
>>   depbar le 0x5 0x0 0x0
>> -sched (st 0x0) (st 0x0) (st 0x0)
>> +sched (st 0x1 wt 0x3f) (st 0x1) (st 0x1)
>>   fmul ftz $r3 $r3 $r7
>>
>
> Why are you waiting on all barriers? Only $r3 is needed here.

After adding a 

Re: [Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-05 Thread Samuel Pitoiset

Nice work!

See my comments below, and double-check if some of them can be applied 
to the shaders I didn't review yet.


I recommend testing your work because if one sched code is wrong, 
you are likely going to kill your card and reboot your box. :-)


On 06/03/2017 04:16 PM, Aaryaman Vasishta wrote:

v2: Add missing delays

This patch adds proper delays to maxwell exa shaders. rendercheck tests
seem consistent with/without this patch. I haven't extensively tested
them though.

Trello:
https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays

Signed-off-by: Aaryaman Vasishta 
---
  src/shader/exac8nv110.fp  | 10 +-
  src/shader/exac8nv110.fpc | 18 +-
  src/shader/exacanv110.fp  | 10 +-
  src/shader/exacanv110.fpc | 18 +-
  src/shader/exacmnv110.fp  | 10 +-
  src/shader/exacmnv110.fpc | 18 +-
  src/shader/exas8nv110.fp  |  6 +++---
  src/shader/exas8nv110.fpc | 12 ++--
  src/shader/exasanv110.fp  | 10 +-
  src/shader/exasanv110.fpc | 18 +-
  src/shader/exascnv110.fp  |  6 +++---
  src/shader/exascnv110.fpc | 10 +-
  src/shader/videonv110.fp  | 14 +++---
  src/shader/videonv110.fpc | 26 +-
  14 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
index ce78036..1c4a4f1 100644
--- a/src/shader/exac8nv110.fp
+++ b/src/shader/exac8nv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
  };
  #else
  
-sched (st 0x0) (st 0x0) (st 0x0)

+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
  ipa pass $r0 a[0x7c] 0x0 0x0 0x1
  mufu rcp $r0 $r0
  ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
  ipa $r2 a[0x90] $r0 0x0 0x1
  tex nodep $r1 $r2 0x0 0x1 t2d 0x8
  ipa $r3 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
  ipa $r2 a[0x80] $r0 0x0 0x1
  tex nodep $r0 $r2 0x0 0x0 t2d 0x8


Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here?


  depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
  fmul ftz $r3 $r0 $r1
  mov $r2 $r3 0xf


You can stall for only one cycle here, but the 6 cycles on fmul are needed.
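The point here is that only a data dependency on a fixed-latency result forces a long stall: the first mov reads $r3 from fmul, so fmul needs its 6 cycles, but the movs do not depend on each other. A greedy sketch of that rule follows; min_stalls is a hypothetical helper that puts any extra stall on the instruction just before the consumer. The fmul latency of 6 comes from the comment above; the mov latencies are placeholders that do not affect this example, since nothing reads the mov results. Exit requirements and barrier waits are not modelled.

```python
def min_stalls(prog, latency):
    """Greedy per-insn stall counts (each >= 1) for straight-line code.

    prog:    list of (dst, srcs) register names, one entry per insn
    latency: cycles before each insn's result may be read (assumed values)
    The stall on insn k is the number of cycles before insn k+1 issues.
    """
    stalls = [1] * len(prog)
    last_writer = {}
    for j, (dst, srcs) in enumerate(prog):
        for reg in srcs:
            i = last_writer.get(reg)
            if i is None:
                continue
            # Cycles between issuing i and issuing j must cover i's latency.
            missing = latency[i] - sum(stalls[i:j])
            if missing > 0:
                stalls[j - 1] += missing
        last_writer[dst] = j
    if stalls and max(stalls) > 0xF:
        raise ValueError("stall count does not fit in 4 bits")
    return stalls


# Tail of exac8nv110.fp: fmul $r3 $r0 $r1, then three movs of $r3.
tail = [("r3", ["r0", "r1"]), ("r2", ["r3"]), ("r1", ["r3"]), ("r0", ["r3"])]
print(min_stalls(tail, latency=[6, 6, 6, 6]))  # [6, 1, 1, 1]
```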


  mov $r1 $r3 0xf
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6) (st 0xf) (st 0x0)
  mov $r0 $r3 0xf


Same here.


  exit
  #endif
diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
index 4aa1368..46943b7 100644
--- a/src/shader/exac8nv110.fpc
+++ b/src/shader/exac8nv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
  0xcff7ff00,
  0xe003ff87,
  0x0047,
  0x5080,
  0x4007ff03,
  0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0x21e0072f,
+0x005cbc03,
  0x0007ff02,
  0xe043ff89,
  0x2ff70201,
  0xc03a0014,
  0x4007ff03,
  0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0074f,
+0x001fbc06,
  0x0007ff02,
  0xe043ff88,
  0x2ff70200,
  0xc03a0004,
  0x3407,
  0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfcc01fe6,
+0x001f8400,
  0x00170003,
  0x5c681000,
  0x00370002,
  0x5c980780,
  0x00370001,
  0x5c980780,
-0xfc0007e0,
+0xfde007e6,
  0x001f8000,
  0x0037,
  0x5c980780,
diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
index a70d5c5..d7c2867 100644
--- a/src/shader/exacanv110.fp
+++ b/src/shader/exacanv110.fp
@@ -25,23 +25,23 @@ NV110FP_CAComposite[] = {
  };
  #else
  
-sched (st 0x0) (st 0x0) (st 0x0)

+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
  ipa pass $r0 a[0x7c] 0x0 0x0 0x1
  mufu rcp $r0 $r0
  ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 wt 0x3) (st 0xf wr 0x1 rd 0x2)
  ipa $r2 a[0x90] $r0 0x0 0x1
  tex nodep $r4 $r2 0x0 0x1 t2d 0xf


Please add a read-dep-bar and wait for it on the first fmul because $r2:$r3 
are re-used before $r4. Should be safer.



  ipa $r1 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2 wt 0x4) (st 0xf wr 0x1 wt 0x6) (st 0xf)
  ipa $r0 a[0x80] $r0 0x0 0x1
  tex nodep $r0 $r0 0x0 0x0 t2d 0xf
  depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x3f) (st 0x1) (st 0x1)
  fmul ftz $r3 $r3 $r7


Why are you waiting on all barriers? Only $r3 is needed here.
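For reference, wt 0x3f sets all six bits of the wait mask, so the fmul waits on every barrier. A small decoder makes that visible in the encoded word; the field layout is an assumption checked only against this patch's .fpc words, and decode_sched and wait_mask are hypothetical helpers. The indices passed to wait_mask below are illustrative, not the fix for this shader.

```python
def decode_sched(lo, hi):
    """Split a (low, high) .fpc word pair into three sched fields.

    Assumed layout per 21-bit field: stall bits 0-3, write barrier 5-7,
    read barrier 8-10 (7 = none for both), wait mask 11-16.
    """
    v = (hi << 32) | lo
    fields = []
    for k in range(3):
        f = (v >> (21 * k)) & 0x1FFFFF
        fields.append({"st": f & 0xF, "wr": (f >> 5) & 0x7,
                       "rd": (f >> 8) & 0x7, "wt": (f >> 11) & 0x3F})
    return fields


def wait_mask(*barriers):
    """Wait mask covering only the given barrier indices (0-5)."""
    mask = 0
    for b in barriers:
        mask |= 1 << b
    return mask


# First sched word pair of the fmul block in exacanv110.fpc (v2 patch).
print(decode_sched(0xFC21FFE1, 0x001F8400)[0])
# {'st': 1, 'wr': 7, 'rd': 7, 'wt': 63}
print(hex(wait_mask(0, 1)))  # 0x3
```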


  fmul ftz $r2 $r2 $r6
  fmul ftz $r1 $r1 $r5
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x3) (st 0xf) (st 0x0)
  fmul ftz $r0 $r0 $r4
  exit
  #endif
diff --git a/src/shader/exacanv110.fpc b/src/shader/exacanv110.fpc
index 7c0ca5e..9cad139 100644
--- a/src/shader/exacanv110.fpc
+++ b/src/shader/exacanv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
  0xcff7ff00,
  0xe003ff87,
  0x0047,
  0x5080,
  0x4007ff03,
  0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0xe1e0072f,
+0x0008bc03,
  0x0007ff02,
  0xe043ff89,
  0xaff70204,
  

[Nouveau] [PATCH v2] nv110/exa: update sched codes

2017-06-03 Thread Aaryaman Vasishta
v2: Add missing delays

This patch adds proper delays to maxwell exa shaders. rendercheck tests
seem consistent with/without this patch. I haven't extensively tested
them though.

Trello:
https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays

Signed-off-by: Aaryaman Vasishta 
---
 src/shader/exac8nv110.fp  | 10 +-
 src/shader/exac8nv110.fpc | 18 +-
 src/shader/exacanv110.fp  | 10 +-
 src/shader/exacanv110.fpc | 18 +-
 src/shader/exacmnv110.fp  | 10 +-
 src/shader/exacmnv110.fpc | 18 +-
 src/shader/exas8nv110.fp  |  6 +++---
 src/shader/exas8nv110.fpc | 12 ++--
 src/shader/exasanv110.fp  | 10 +-
 src/shader/exasanv110.fpc | 18 +-
 src/shader/exascnv110.fp  |  6 +++---
 src/shader/exascnv110.fpc | 10 +-
 src/shader/videonv110.fp  | 14 +++---
 src/shader/videonv110.fpc | 26 +-
 14 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/src/shader/exac8nv110.fp b/src/shader/exac8nv110.fp
index ce78036..1c4a4f1 100644
--- a/src/shader/exac8nv110.fp
+++ b/src/shader/exac8nv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite_A8[] = {
 };
 #else
 
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
 ipa pass $r0 a[0x7c] 0x0 0x0 0x1
 mufu rcp $r0 $r0
 ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 rd 0x1 wt 0x3) (st 0xf wr 0x1 wt 0x2)
 ipa $r2 a[0x90] $r0 0x0 0x1
 tex nodep $r1 $r2 0x0 0x1 t2d 0x8
 ipa $r3 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf)
 ipa $r2 a[0x80] $r0 0x0 0x1
 tex nodep $r0 $r2 0x0 0x0 t2d 0x8
 depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6 wt 0x3) (st 0x6) (st 0x1)
 fmul ftz $r3 $r0 $r1
 mov $r2 $r3 0xf
 mov $r1 $r3 0xf
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x6) (st 0xf) (st 0x0)
 mov $r0 $r3 0xf
 exit
 #endif
diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc
index 4aa1368..46943b7 100644
--- a/src/shader/exac8nv110.fpc
+++ b/src/shader/exac8nv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
 0xcff7ff00,
 0xe003ff87,
 0x0047,
 0x5080,
 0x4007ff03,
 0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0x21e0072f,
+0x005cbc03,
 0x0007ff02,
 0xe043ff89,
 0x2ff70201,
 0xc03a0014,
 0x4007ff03,
 0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0074f,
+0x001fbc06,
 0x0007ff02,
 0xe043ff88,
 0x2ff70200,
 0xc03a0004,
 0x3407,
 0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfcc01fe6,
+0x001f8400,
 0x00170003,
 0x5c681000,
 0x00370002,
 0x5c980780,
 0x00370001,
 0x5c980780,
-0xfc0007e0,
+0xfde007e6,
 0x001f8000,
 0x0037,
 0x5c980780,
diff --git a/src/shader/exacanv110.fp b/src/shader/exacanv110.fp
index a70d5c5..d7c2867 100644
--- a/src/shader/exacanv110.fp
+++ b/src/shader/exacanv110.fp
@@ -25,23 +25,23 @@ NV110FP_CAComposite[] = {
 };
 #else
 
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
 ipa pass $r0 a[0x7c] 0x0 0x0 0x1
 mufu rcp $r0 $r0
 ipa $r3 a[0x94] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x1) (st 0xf wr 0x0 wt 0x3) (st 0xf wr 0x1 rd 0x2)
 ipa $r2 a[0x90] $r0 0x0 0x1
 tex nodep $r4 $r2 0x0 0x1 t2d 0xf
 ipa $r1 a[0x84] $r0 0x0 0x1
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x2 wt 0x4) (st 0xf wr 0x1 wt 0x6) (st 0xf)
 ipa $r0 a[0x80] $r0 0x0 0x1
 tex nodep $r0 $r0 0x0 0x0 t2d 0xf
 depbar le 0x5 0x0 0x0
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x3f) (st 0x1) (st 0x1)
 fmul ftz $r3 $r3 $r7
 fmul ftz $r2 $r2 $r6
 fmul ftz $r1 $r1 $r5
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0x1 wt 0x3) (st 0xf) (st 0x0)
 fmul ftz $r0 $r0 $r4
 exit
 #endif
diff --git a/src/shader/exacanv110.fpc b/src/shader/exacanv110.fpc
index 7c0ca5e..9cad139 100644
--- a/src/shader/exacanv110.fpc
+++ b/src/shader/exacanv110.fpc
@@ -1,36 +1,36 @@
-0xfc0007e0,
-0x001f8000,
+0xe1a0070f,
+0x003c3c01,
 0xcff7ff00,
 0xe003ff87,
 0x0047,
 0x5080,
 0x4007ff03,
 0xe043ff89,
-0xfc0007e0,
-0x001f8000,
+0xe1e0072f,
+0x0008bc03,
 0x0007ff02,
 0xe043ff89,
 0xaff70204,
 0xc03a0017,
 0x4007ff01,
 0xe043ff88,
-0xfc0007e0,
-0x001f8000,
+0xe5e0274f,
+0x001fbc06,
 0x0007ff00,
 0xe043ff88,
 0xaff7,
 0xc03a0007,
 0x3407,
 0xf0f0,
-0xfc0007e0,
-0x001f8000,
+0xfc21ffe1,
+0x001f8400,
 0x00770303,
 0x5c681000,
 0x00670202,
 0x5c681000,
 0x00570101,
 0x5c681000,
-0xfc0007e0,
+0xfde01fe1,
 0x001f8000,
 0x0047,
 0x5c681000,
diff --git a/src/shader/exacmnv110.fp b/src/shader/exacmnv110.fp
index fe5c294..d717138 100644
--- a/src/shader/exacmnv110.fp
+++ b/src/shader/exacmnv110.fp
@@ -25,23 +25,23 @@ NV110FP_Composite[] = {
 };
 #else
 
-sched (st 0x0) (st 0x0) (st 0x0)
+sched (st 0xf wr 0x0) (st 0xd wr 0x0 wt 0x1) (st 0xf wr 0x0 wt 0x1)
 ipa pass $r0 a[0x7c] 0x0 0x0 0x1
 mufu rcp $r0 $r0
 ipa $r3 a[0x94] $r0 0x0