Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-26 Thread Marek Olšák
On Thu, Jun 22, 2017 at 12:25 PM, Timothy Arceri  wrote:
> On 22/06/17 18:46, Samuel Pitoiset wrote:
>>
>> Here's a shader-db report:
>>
>> https://pastebin.com/raw/QBMnF2pv
>>
>> This doesn't sound like a total win actually...
>
>
> I'm surprised to see this. I thought we pretty much unrolled everything
> already, although maybe that was what happened after NIR unrolling.
>
> I've done a full shader-db run and I'm getting very different results from
> you:
>
> https://pastebin.com/XRH7Vbvv
>
> There are still a little mixed but they look much more positive.

The problem with our shader-db stats is that they have very little to do
with performance changes. The code size increase is expected. Ignore
register usage, instead, look at max waves. SGPR spilling is cheap -
measurements can be done to see whether it's measurable.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-22 Thread Timothy Arceri

On 22/06/17 18:46, Samuel Pitoiset wrote:

Here's a shader-db report:

https://pastebin.com/raw/QBMnF2pv

This doesn't sound like a total win actually...


I'm surprised to see this. I thought we pretty much unrolled everything 
already, although maybe that was what happened after NIR unrolling.


I've done a full shader-db run and I'm getting very different results 
from you:


https://pastebin.com/XRH7Vbvv

There are still a little mixed but they look much more positive.




For now, patches 1&2 are:

Reviewed-by: Samuel Pitoiset 

On 06/21/2017 12:12 PM, Timothy Arceri wrote:

The main reason these restriction exist is because glsl the loop
unrolling pass is super slow with large loops.

be5f27a84d0d fixed things so that expression trees were counted
against the limit, however it left the limit as
max_iterations * 5 which is actually fine for most shaders but
probably over conservative.

This change relaxes the limit to allow more loops to unroll in the
Unigine Superposition benchmark.

Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
On radeonsi (RX480)

Before:

Average: 28.20 Frames Per Second

After:

Average: 28.60 Frames Per Second
---

  The increase is small but it seems to be consistent, I would be
  interested in the results if others were interested in testing it.

  src/compiler/glsl/loop_unroll.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/loop_unroll.cpp 
b/src/compiler/glsl/loop_unroll.cpp

index bc377df..64ebf0f 100644
--- a/src/compiler/glsl/loop_unroll.cpp
+++ b/src/compiler/glsl/loop_unroll.cpp
@@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
 /* Don't try to unroll loops that have zillions of iterations 
either.

  */
 if (iterations > max_iterations)
return visit_continue;
 /* Don't try to unroll nested loops and loops with a huge body.
  */
 loop_unroll_count count(>body_instructions, ls, options);
 bool loop_too_large =
-  count.nested_loop || count.nodes * iterations > max_iterations 
* 5;
+  count.nested_loop || count.nodes * iterations > max_iterations 
* 10;

 if (loop_too_large && !count.unsupported_variable_indexing &&
 !count.array_indexed_by_induction_var_with_exact_iterations)
return visit_continue;
 /* Note: the limiting terminator contributes 1 to 
ls->num_loop_jumps.

  * We'll be removing the limiting terminator before we unroll.
  */
 assert(ls->num_loop_jumps > 0);
 unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-22 Thread Samuel Pitoiset

Here's a shader-db report:

https://pastebin.com/raw/QBMnF2pv

This doesn't sound like a total win actually...

For now, patches 1&2 are:

Reviewed-by: Samuel Pitoiset 

On 06/21/2017 12:12 PM, Timothy Arceri wrote:

The main reason these restriction exist is because glsl the loop
unrolling pass is super slow with large loops.

be5f27a84d0d fixed things so that expression trees were counted
against the limit, however it left the limit as
max_iterations * 5 which is actually fine for most shaders but
probably over conservative.

This change relaxes the limit to allow more loops to unroll in the
Unigine Superposition benchmark.

Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
On radeonsi (RX480)

Before:

Average: 28.20 Frames Per Second

After:

Average: 28.60 Frames Per Second
---

  The increase is small but it seems to be consistent, I would be
  interested in the results if others were interested in testing it.

  src/compiler/glsl/loop_unroll.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/loop_unroll.cpp 
b/src/compiler/glsl/loop_unroll.cpp
index bc377df..64ebf0f 100644
--- a/src/compiler/glsl/loop_unroll.cpp
+++ b/src/compiler/glsl/loop_unroll.cpp
@@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
 /* Don't try to unroll loops that have zillions of iterations either.
  */
 if (iterations > max_iterations)
return visit_continue;
  
 /* Don't try to unroll nested loops and loops with a huge body.

  */
 loop_unroll_count count(>body_instructions, ls, options);
  
 bool loop_too_large =

-  count.nested_loop || count.nodes * iterations > max_iterations * 5;
+  count.nested_loop || count.nodes * iterations > max_iterations * 10;
  
 if (loop_too_large && !count.unsupported_variable_indexing &&

 !count.array_indexed_by_induction_var_with_exact_iterations)
return visit_continue;
  
 /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.

  * We'll be removing the limiting terminator before we unroll.
  */
 assert(ls->num_loop_jumps > 0);
 unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-22 Thread Nicolai Hähnle

On 21.06.2017 12:12, Timothy Arceri wrote:

The main reason these restriction exist is because glsl the loop
unrolling pass is super slow with large loops.

be5f27a84d0d fixed things so that expression trees were counted
against the limit, however it left the limit as
max_iterations * 5 which is actually fine for most shaders but
probably over conservative.

This change relaxes the limit to allow more loops to unroll in the
Unigine Superposition benchmark.

Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
On radeonsi (RX480)

Before:

Average: 28.20 Frames Per Second

After:

Average: 28.60 Frames Per Second
---

  The increase is small but it seems to be consistent, I would be
  interested in the results if others were interested in testing it.


Seems reasonable. Series is:

Reviewed-by: Nicolai Hähnle 




  src/compiler/glsl/loop_unroll.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/loop_unroll.cpp 
b/src/compiler/glsl/loop_unroll.cpp
index bc377df..64ebf0f 100644
--- a/src/compiler/glsl/loop_unroll.cpp
+++ b/src/compiler/glsl/loop_unroll.cpp
@@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
 /* Don't try to unroll loops that have zillions of iterations either.
  */
 if (iterations > max_iterations)
return visit_continue;
  
 /* Don't try to unroll nested loops and loops with a huge body.

  */
 loop_unroll_count count(>body_instructions, ls, options);
  
 bool loop_too_large =

-  count.nested_loop || count.nodes * iterations > max_iterations * 5;
+  count.nested_loop || count.nodes * iterations > max_iterations * 10;
  
 if (loop_too_large && !count.unsupported_variable_indexing &&

 !count.array_indexed_by_induction_var_with_exact_iterations)
return visit_continue;
  
 /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.

  * We'll be removing the limiting terminator before we unroll.
  */
 assert(ls->num_loop_jumps > 0);
 unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-21 Thread Timothy Arceri

On 22/06/17 01:42, Eero Tamminen wrote:

Hi,

On 21.06.2017 13:12, Timothy Arceri wrote:

The main reason these restriction exist is because glsl the loop
unrolling pass is super slow with large loops.

be5f27a84d0d fixed things so that expression trees were counted
against the limit, however it left the limit as
max_iterations * 5 which is actually fine for most shaders but
probably over conservative.

This change relaxes the limit to allow more loops to unroll in the
Unigine Superposition benchmark.

Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
On radeonsi (RX480)

Before:

Average: 28.20 Frames Per Second

After:

Average: 28.60 Frames Per Second
---

 The increase is small but it seems to be consistent, I would be
 interested in the results if others were interested in testing it.


Did testing on few different Intel machines with larger set of 
benchmarks (older than Superposition).


Didn't see any statistically significant performance changes in either 
direction (due to large number of tests, can't run many rounds so 
variance is fairly high).


Hi,

I should have mentioned that this won't change anything for drivers that 
use NIR for loop unrolling such as i965.





Another change you could consider for larger loops is partial unrolling. 
  Unrolling several rounds, and then looping that.


The last time I checked with shader-db the nir unrolling pass unrolled 
everything without hitting the equivalent limit, although someone might 
want to check Superposition.


The limit is mainly in place due to the GLSL IR pass being slow, this is 
something that is not an issue with the NIR pass so it could be relaxed 
without much issue if needed.


Thanks,
Tim




 - Eero


 src/compiler/glsl/loop_unroll.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/loop_unroll.cpp 
b/src/compiler/glsl/loop_unroll.cpp

index bc377df..64ebf0f 100644
--- a/src/compiler/glsl/loop_unroll.cpp
+++ b/src/compiler/glsl/loop_unroll.cpp
@@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
/* Don't try to unroll loops that have zillions of iterations either.
 */
if (iterations > max_iterations)
   return visit_continue;

/* Don't try to unroll nested loops and loops with a huge body.
 */
loop_unroll_count count(>body_instructions, ls, options);

bool loop_too_large =
-  count.nested_loop || count.nodes * iterations > max_iterations 
* 5;
+  count.nested_loop || count.nodes * iterations > max_iterations 
* 10;


if (loop_too_large && !count.unsupported_variable_indexing &&
!count.array_indexed_by_induction_var_with_exact_iterations)
   return visit_continue;

/* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.
 * We'll be removing the limiting terminator before we unroll.
 */
assert(ls->num_loop_jumps > 0);
unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-21 Thread Eero Tamminen

Hi,

On 21.06.2017 13:12, Timothy Arceri wrote:

The main reason these restriction exist is because glsl the loop
unrolling pass is super slow with large loops.

be5f27a84d0d fixed things so that expression trees were counted
against the limit, however it left the limit as
max_iterations * 5 which is actually fine for most shaders but
probably over conservative.

This change relaxes the limit to allow more loops to unroll in the
Unigine Superposition benchmark.

Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
On radeonsi (RX480)

Before:

Average: 28.20 Frames Per Second

After:

Average: 28.60 Frames Per Second
---

 The increase is small but it seems to be consistent, I would be
 interested in the results if others were interested in testing it.


Did testing on few different Intel machines with larger set of 
benchmarks (older than Superposition).


Didn't see any statistically significant performance changes in either 
direction (due to large number of tests, can't run many rounds so 
variance is fairly high).



Another change you could consider for larger loops is partial unrolling. 
 Unrolling several rounds, and then looping that.



- Eero


 src/compiler/glsl/loop_unroll.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/loop_unroll.cpp 
b/src/compiler/glsl/loop_unroll.cpp
index bc377df..64ebf0f 100644
--- a/src/compiler/glsl/loop_unroll.cpp
+++ b/src/compiler/glsl/loop_unroll.cpp
@@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
/* Don't try to unroll loops that have zillions of iterations either.
 */
if (iterations > max_iterations)
   return visit_continue;

/* Don't try to unroll nested loops and loops with a huge body.
 */
loop_unroll_count count(>body_instructions, ls, options);

bool loop_too_large =
-  count.nested_loop || count.nodes * iterations > max_iterations * 5;
+  count.nested_loop || count.nodes * iterations > max_iterations * 10;

if (loop_too_large && !count.unsupported_variable_indexing &&
!count.array_indexed_by_induction_var_with_exact_iterations)
   return visit_continue;

/* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.
 * We'll be removing the limiting terminator before we unroll.
 */
assert(ls->num_loop_jumps > 0);
unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-21 Thread Alejandro Piñeiro
Perhaps someone would comment on the last patch, but makes sense to me.

The three patch series is:
Reviewed-by: Alejandro Piñeiro 

On 21/06/17 12:12, Timothy Arceri wrote:
> The main reason these restriction exist is because glsl the loop
> unrolling pass is super slow with large loops.
>
> be5f27a84d0d fixed things so that expression trees were counted
> against the limit, however it left the limit as
> max_iterations * 5 which is actually fine for most shaders but
> probably over conservative.
>
> This change relaxes the limit to allow more loops to unroll in the
> Unigine Superposition benchmark.
>
> Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
> On radeonsi (RX480)
>
> Before:
>
> Average: 28.20 Frames Per Second
>
> After:
>
> Average: 28.60 Frames Per Second
> ---
>
>  The increase is small but it seems to be consistent, I would be
>  interested in the results if others were interested in testing it.
>
>  src/compiler/glsl/loop_unroll.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/compiler/glsl/loop_unroll.cpp 
> b/src/compiler/glsl/loop_unroll.cpp
> index bc377df..64ebf0f 100644
> --- a/src/compiler/glsl/loop_unroll.cpp
> +++ b/src/compiler/glsl/loop_unroll.cpp
> @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
> /* Don't try to unroll loops that have zillions of iterations either.
>  */
> if (iterations > max_iterations)
>return visit_continue;
>  
> /* Don't try to unroll nested loops and loops with a huge body.
>  */
> loop_unroll_count count(>body_instructions, ls, options);
>  
> bool loop_too_large =
> -  count.nested_loop || count.nodes * iterations > max_iterations * 5;
> +  count.nested_loop || count.nodes * iterations > max_iterations * 10;
>  
> if (loop_too_large && !count.unsupported_variable_indexing &&
> !count.array_indexed_by_induction_var_with_exact_iterations)
>return visit_continue;
>  
> /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.
>  * We'll be removing the limiting terminator before we unroll.
>  */
> assert(ls->num_loop_jumps > 0);
> unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions

2017-06-21 Thread Timothy Arceri
The main reason these restriction exist is because glsl the loop
unrolling pass is super slow with large loops.

be5f27a84d0d fixed things so that expression trees were counted
against the limit, however it left the limit as
max_iterations * 5 which is actually fine for most shaders but
probably over conservative.

This change relaxes the limit to allow more loops to unroll in the
Unigine Superposition benchmark.

Results from Unigine Superposition @ 1920x1080 - High - Fullscreen
On radeonsi (RX480)

Before:

Average: 28.20 Frames Per Second

After:

Average: 28.60 Frames Per Second
---

 The increase is small but it seems to be consistent, I would be
 interested in the results if others were interested in testing it.

 src/compiler/glsl/loop_unroll.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/loop_unroll.cpp 
b/src/compiler/glsl/loop_unroll.cpp
index bc377df..64ebf0f 100644
--- a/src/compiler/glsl/loop_unroll.cpp
+++ b/src/compiler/glsl/loop_unroll.cpp
@@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
/* Don't try to unroll loops that have zillions of iterations either.
 */
if (iterations > max_iterations)
   return visit_continue;
 
/* Don't try to unroll nested loops and loops with a huge body.
 */
loop_unroll_count count(>body_instructions, ls, options);
 
bool loop_too_large =
-  count.nested_loop || count.nodes * iterations > max_iterations * 5;
+  count.nested_loop || count.nodes * iterations > max_iterations * 10;
 
if (loop_too_large && !count.unsupported_variable_indexing &&
!count.array_indexed_by_induction_var_with_exact_iterations)
   return visit_continue;
 
/* Note: the limiting terminator contributes 1 to ls->num_loop_jumps.
 * We'll be removing the limiting terminator before we unroll.
 */
assert(ls->num_loop_jumps > 0);
unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1;
-- 
2.9.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev