Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
On Thu, Jun 22, 2017 at 12:25 PM, Timothy Arceriwrote: > On 22/06/17 18:46, Samuel Pitoiset wrote: >> >> Here's a shader-db report: >> >> https://pastebin.com/raw/QBMnF2pv >> >> This doesn't sound like a total win actually... > > > I'm surprised to see this. I thought we pretty much unrolled everything > already, although maybe that was what happened after NIR unrolling. > > I've done a full shader-db run and I'm getting very different results from > you: > > https://pastebin.com/XRH7Vbvv > > There are still a little mixed but they look much more positive. The problem with our shader-db stats is that they have very little to do with performance changes. The code size increase is expected. Ignore register usage, instead, look at max waves. SGPR spilling is cheap - measurements can be done to see whether it's measurable. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
On 22/06/17 18:46, Samuel Pitoiset wrote: Here's a shader-db report: https://pastebin.com/raw/QBMnF2pv This doesn't sound like a total win actually... I'm surprised to see this. I thought we pretty much unrolled everything already, although maybe that was what happened after NIR unrolling. I've done a full shader-db run and I'm getting very different results from you: https://pastebin.com/XRH7Vbvv There are still a little mixed but they look much more positive. For now, patches 1&2 are: Reviewed-by: Samuel PitoisetOn 06/21/2017 12:12 PM, Timothy Arceri wrote: The main reason these restriction exist is because glsl the loop unrolling pass is super slow with large loops. be5f27a84d0d fixed things so that expression trees were counted against the limit, however it left the limit as max_iterations * 5 which is actually fine for most shaders but probably over conservative. This change relaxes the limit to allow more loops to unroll in the Unigine Superposition benchmark. Results from Unigine Superposition @ 1920x1080 - High - Fullscreen On radeonsi (RX480) Before: Average: 28.20 Frames Per Second After: Average: 28.60 Frames Per Second --- The increase is small but it seems to be consistent, I would be interested in the results if others were interested in testing it. src/compiler/glsl/loop_unroll.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/loop_unroll.cpp b/src/compiler/glsl/loop_unroll.cpp index bc377df..64ebf0f 100644 --- a/src/compiler/glsl/loop_unroll.cpp +++ b/src/compiler/glsl/loop_unroll.cpp @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) /* Don't try to unroll loops that have zillions of iterations either. */ if (iterations > max_iterations) return visit_continue; /* Don't try to unroll nested loops and loops with a huge body. */ loop_unroll_count count(>body_instructions, ls, options); bool loop_too_large = - count.nested_loop || count.nodes * iterations > max_iterations * 5; + count.nested_loop || count.nodes * iterations > max_iterations * 10; if (loop_too_large && !count.unsupported_variable_indexing && !count.array_indexed_by_induction_var_with_exact_iterations) return visit_continue; /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. * We'll be removing the limiting terminator before we unroll. */ assert(ls->num_loop_jumps > 0); unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
Here's a shader-db report: https://pastebin.com/raw/QBMnF2pv This doesn't sound like a total win actually... For now, patches 1&2 are: Reviewed-by: Samuel PitoisetOn 06/21/2017 12:12 PM, Timothy Arceri wrote: The main reason these restriction exist is because glsl the loop unrolling pass is super slow with large loops. be5f27a84d0d fixed things so that expression trees were counted against the limit, however it left the limit as max_iterations * 5 which is actually fine for most shaders but probably over conservative. This change relaxes the limit to allow more loops to unroll in the Unigine Superposition benchmark. Results from Unigine Superposition @ 1920x1080 - High - Fullscreen On radeonsi (RX480) Before: Average: 28.20 Frames Per Second After: Average: 28.60 Frames Per Second --- The increase is small but it seems to be consistent, I would be interested in the results if others were interested in testing it. src/compiler/glsl/loop_unroll.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/loop_unroll.cpp b/src/compiler/glsl/loop_unroll.cpp index bc377df..64ebf0f 100644 --- a/src/compiler/glsl/loop_unroll.cpp +++ b/src/compiler/glsl/loop_unroll.cpp @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) /* Don't try to unroll loops that have zillions of iterations either. */ if (iterations > max_iterations) return visit_continue; /* Don't try to unroll nested loops and loops with a huge body. */ loop_unroll_count count(>body_instructions, ls, options); bool loop_too_large = - count.nested_loop || count.nodes * iterations > max_iterations * 5; + count.nested_loop || count.nodes * iterations > max_iterations * 10; if (loop_too_large && !count.unsupported_variable_indexing && !count.array_indexed_by_induction_var_with_exact_iterations) return visit_continue; /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. * We'll be removing the limiting terminator before we unroll. */ assert(ls->num_loop_jumps > 0); unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
On 21.06.2017 12:12, Timothy Arceri wrote: The main reason these restriction exist is because glsl the loop unrolling pass is super slow with large loops. be5f27a84d0d fixed things so that expression trees were counted against the limit, however it left the limit as max_iterations * 5 which is actually fine for most shaders but probably over conservative. This change relaxes the limit to allow more loops to unroll in the Unigine Superposition benchmark. Results from Unigine Superposition @ 1920x1080 - High - Fullscreen On radeonsi (RX480) Before: Average: 28.20 Frames Per Second After: Average: 28.60 Frames Per Second --- The increase is small but it seems to be consistent, I would be interested in the results if others were interested in testing it. Seems reasonable. Series is: Reviewed-by: Nicolai Hähnlesrc/compiler/glsl/loop_unroll.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/loop_unroll.cpp b/src/compiler/glsl/loop_unroll.cpp index bc377df..64ebf0f 100644 --- a/src/compiler/glsl/loop_unroll.cpp +++ b/src/compiler/glsl/loop_unroll.cpp @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) /* Don't try to unroll loops that have zillions of iterations either. */ if (iterations > max_iterations) return visit_continue; /* Don't try to unroll nested loops and loops with a huge body. */ loop_unroll_count count(>body_instructions, ls, options); bool loop_too_large = - count.nested_loop || count.nodes * iterations > max_iterations * 5; + count.nested_loop || count.nodes * iterations > max_iterations * 10; if (loop_too_large && !count.unsupported_variable_indexing && !count.array_indexed_by_induction_var_with_exact_iterations) return visit_continue; /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. * We'll be removing the limiting terminator before we unroll. */ assert(ls->num_loop_jumps > 0); unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; -- Lerne, wie die Welt wirklich ist, Aber vergiss niemals, wie sie sein sollte. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
On 22/06/17 01:42, Eero Tamminen wrote: Hi, On 21.06.2017 13:12, Timothy Arceri wrote: The main reason these restriction exist is because glsl the loop unrolling pass is super slow with large loops. be5f27a84d0d fixed things so that expression trees were counted against the limit, however it left the limit as max_iterations * 5 which is actually fine for most shaders but probably over conservative. This change relaxes the limit to allow more loops to unroll in the Unigine Superposition benchmark. Results from Unigine Superposition @ 1920x1080 - High - Fullscreen On radeonsi (RX480) Before: Average: 28.20 Frames Per Second After: Average: 28.60 Frames Per Second --- The increase is small but it seems to be consistent, I would be interested in the results if others were interested in testing it. Did testing on few different Intel machines with larger set of benchmarks (older than Superposition). Didn't see any statistically significant performance changes in either direction (due to large number of tests, can't run many rounds so variance is fairly high). Hi, I should have mentioned that this won't change anything for drivers that use NIR for loop unrolling such as i965. Another change you could consider for larger loops is partial unrolling. Unrolling several rounds, and then looping that. The last time I checked with shader-db the nir unrolling pass unrolled everything without hitting the equivalent limit, although someone might want to check Superposition. The limit is mainly in place due to the GLSL IR pass being slow, this is something that is not an issue with the NIR pass so it could be relaxed without much issue if needed. Thanks, Tim - Eero src/compiler/glsl/loop_unroll.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/loop_unroll.cpp b/src/compiler/glsl/loop_unroll.cpp index bc377df..64ebf0f 100644 --- a/src/compiler/glsl/loop_unroll.cpp +++ b/src/compiler/glsl/loop_unroll.cpp @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) /* Don't try to unroll loops that have zillions of iterations either. */ if (iterations > max_iterations) return visit_continue; /* Don't try to unroll nested loops and loops with a huge body. */ loop_unroll_count count(>body_instructions, ls, options); bool loop_too_large = - count.nested_loop || count.nodes * iterations > max_iterations * 5; + count.nested_loop || count.nodes * iterations > max_iterations * 10; if (loop_too_large && !count.unsupported_variable_indexing && !count.array_indexed_by_induction_var_with_exact_iterations) return visit_continue; /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. * We'll be removing the limiting terminator before we unroll. */ assert(ls->num_loop_jumps > 0); unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
Hi, On 21.06.2017 13:12, Timothy Arceri wrote: The main reason these restriction exist is because glsl the loop unrolling pass is super slow with large loops. be5f27a84d0d fixed things so that expression trees were counted against the limit, however it left the limit as max_iterations * 5 which is actually fine for most shaders but probably over conservative. This change relaxes the limit to allow more loops to unroll in the Unigine Superposition benchmark. Results from Unigine Superposition @ 1920x1080 - High - Fullscreen On radeonsi (RX480) Before: Average: 28.20 Frames Per Second After: Average: 28.60 Frames Per Second --- The increase is small but it seems to be consistent, I would be interested in the results if others were interested in testing it. Did testing on few different Intel machines with larger set of benchmarks (older than Superposition). Didn't see any statistically significant performance changes in either direction (due to large number of tests, can't run many rounds so variance is fairly high). Another change you could consider for larger loops is partial unrolling. Unrolling several rounds, and then looping that. - Eero src/compiler/glsl/loop_unroll.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/loop_unroll.cpp b/src/compiler/glsl/loop_unroll.cpp index bc377df..64ebf0f 100644 --- a/src/compiler/glsl/loop_unroll.cpp +++ b/src/compiler/glsl/loop_unroll.cpp @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) /* Don't try to unroll loops that have zillions of iterations either. */ if (iterations > max_iterations) return visit_continue; /* Don't try to unroll nested loops and loops with a huge body. */ loop_unroll_count count(>body_instructions, ls, options); bool loop_too_large = - count.nested_loop || count.nodes * iterations > max_iterations * 5; + count.nested_loop || count.nodes * iterations > max_iterations * 10; if (loop_too_large && !count.unsupported_variable_indexing && !count.array_indexed_by_induction_var_with_exact_iterations) return visit_continue; /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. * We'll be removing the limiting terminator before we unroll. */ assert(ls->num_loop_jumps > 0); unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
Perhaps someone would comment on the last patch, but makes sense to me. The three patch series is: Reviewed-by: Alejandro PiñeiroOn 21/06/17 12:12, Timothy Arceri wrote: > The main reason these restriction exist is because glsl the loop > unrolling pass is super slow with large loops. > > be5f27a84d0d fixed things so that expression trees were counted > against the limit, however it left the limit as > max_iterations * 5 which is actually fine for most shaders but > probably over conservative. > > This change relaxes the limit to allow more loops to unroll in the > Unigine Superposition benchmark. > > Results from Unigine Superposition @ 1920x1080 - High - Fullscreen > On radeonsi (RX480) > > Before: > > Average: 28.20 Frames Per Second > > After: > > Average: 28.60 Frames Per Second > --- > > The increase is small but it seems to be consistent, I would be > interested in the results if others were interested in testing it. > > src/compiler/glsl/loop_unroll.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/compiler/glsl/loop_unroll.cpp > b/src/compiler/glsl/loop_unroll.cpp > index bc377df..64ebf0f 100644 > --- a/src/compiler/glsl/loop_unroll.cpp > +++ b/src/compiler/glsl/loop_unroll.cpp > @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) > /* Don't try to unroll loops that have zillions of iterations either. > */ > if (iterations > max_iterations) >return visit_continue; > > /* Don't try to unroll nested loops and loops with a huge body. > */ > loop_unroll_count count(>body_instructions, ls, options); > > bool loop_too_large = > - count.nested_loop || count.nodes * iterations > max_iterations * 5; > + count.nested_loop || count.nodes * iterations > max_iterations * 10; > > if (loop_too_large && !count.unsupported_variable_indexing && > !count.array_indexed_by_induction_var_with_exact_iterations) >return visit_continue; > > /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. > * We'll be removing the limiting terminator before we unroll. > */ > assert(ls->num_loop_jumps > 0); > unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] glsl: relax loop unrolling restrictions
The main reason these restriction exist is because glsl the loop unrolling pass is super slow with large loops. be5f27a84d0d fixed things so that expression trees were counted against the limit, however it left the limit as max_iterations * 5 which is actually fine for most shaders but probably over conservative. This change relaxes the limit to allow more loops to unroll in the Unigine Superposition benchmark. Results from Unigine Superposition @ 1920x1080 - High - Fullscreen On radeonsi (RX480) Before: Average: 28.20 Frames Per Second After: Average: 28.60 Frames Per Second --- The increase is small but it seems to be consistent, I would be interested in the results if others were interested in testing it. src/compiler/glsl/loop_unroll.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/loop_unroll.cpp b/src/compiler/glsl/loop_unroll.cpp index bc377df..64ebf0f 100644 --- a/src/compiler/glsl/loop_unroll.cpp +++ b/src/compiler/glsl/loop_unroll.cpp @@ -350,21 +350,21 @@ loop_unroll_visitor::visit_leave(ir_loop *ir) /* Don't try to unroll loops that have zillions of iterations either. */ if (iterations > max_iterations) return visit_continue; /* Don't try to unroll nested loops and loops with a huge body. */ loop_unroll_count count(>body_instructions, ls, options); bool loop_too_large = - count.nested_loop || count.nodes * iterations > max_iterations * 5; + count.nested_loop || count.nodes * iterations > max_iterations * 10; if (loop_too_large && !count.unsupported_variable_indexing && !count.array_indexed_by_induction_var_with_exact_iterations) return visit_continue; /* Note: the limiting terminator contributes 1 to ls->num_loop_jumps. * We'll be removing the limiting terminator before we unroll. */ assert(ls->num_loop_jumps > 0); unsigned predicted_num_loop_jumps = ls->num_loop_jumps - 1; -- 2.9.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev