Module: Mesa
Branch: main
Commit: cc7ce6c01f9221ba90ec109ce163fc27c7f665ec
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=cc7ce6c01f9221ba90ec109ce163fc27c7f665ec

Author: Pavel Ondračka <pavel.ondra...@gmail.com>
Date:   Tue Jan  2 16:32:24 2024 +0100

r300: mark load_ubo_vec4 with ACCESS_CAN_SPECULATE

This is safe to do in all circumstances due to the age of the hardware
(we don't have real UBOs, just constant registers with automatic OOB checks).

R500 hardware doesn't have a standard address register in fragment shaders,
and while we have the loop register, which we could in theory use for
indirect access, it is currently not possible to wire this through NIR. So
any time there is an indirect uniform array access in a loop, we end up with
an if ladder whose size depends on the size of the uniform array. The two
worst behaving apps here are glamor and some GTK shaders, both of which
sometimes end up over the 512-instruction limit. Flattening the if ladders
helps a lot, so we can get under the instruction limit in most cases (all
glamor shaders are OK now). So just enable the flattening by marking all
load_ubo_vec4 with ACCESS_CAN_SPECULATE.
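
As a rough illustration of the transformation described above (a plain C
model, not NIR and not the driver's actual output), the sketch below
contrasts an if ladder over a four-element constant array with its
flattened form. The names ARRAY_LEN, load_ladder, load_flattened and
check_equivalence are hypothetical and exist only for this example. It
assumes, as the message states, that reading any constant unconditionally
is harmless, which is what makes hoisting the loads out of the branches
legal.

```c
#include <assert.h>

#define ARRAY_LEN 4 /* hypothetical uniform array size */

/* If ladder: one conditional branch per possible index value. */
float load_ladder(const float consts[ARRAY_LEN], int idx)
{
   float r = 0.0f;
   if (idx == 0)
      r = consts[0];
   else if (idx == 1)
      r = consts[1];
   else if (idx == 2)
      r = consts[2];
   else if (idx == 3)
      r = consts[3];
   return r;
}

/* Flattened form: every constant is read unconditionally (speculatively),
 * then a chain of selects picks the requested one. No control flow is
 * left, only compares and selects. */
float load_flattened(const float consts[ARRAY_LEN], int idx)
{
   float c0 = consts[0];
   float c1 = consts[1];
   float c2 = consts[2];
   float c3 = consts[3];
   float r = 0.0f;
   r = (idx == 0) ? c0 : r;
   r = (idx == 1) ? c1 : r;
   r = (idx == 2) ? c2 : r;
   r = (idx == 3) ? c3 : r;
   return r;
}

/* Both forms agree for in-range and out-of-range indices. */
void check_equivalence(void)
{
   const float consts[ARRAY_LEN] = { 1.0f, 2.0f, 3.0f, 4.0f };
   for (int idx = -1; idx <= ARRAY_LEN; idx++)
      assert(load_ladder(consts, idx) == load_flattened(consts, idx));
}
```

The flattened form needs more live temporaries, since all constants are
loaded up front. That is consistent with the temps increase in the
shader-db numbers below.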

Shader-db RV530:
total instructions in shared programs: 128762 -> 128440 (-0.25%)
instructions in affected programs: 540 -> 218 (-59.63%)
helped: 3
HURT: 0
total temps in shared programs: 17543 -> 17550 (0.04%)
temps in affected programs: 11 -> 18 (63.64%)
helped: 0
HURT: 3
total cycles in shared programs: 196984 -> 196657 (-0.17%)
cycles in affected programs: 592 -> 265 (-55.24%)
helped: 3
HURT: 0

LOST:   0
GAINED: 7

No changes for R300/R400 because there we don't have control flow
anyway.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6366
Reviewed-by: Alyssa Rosenzweig <aly...@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26877>

---

 src/gallium/drivers/r300/compiler/r300_nir.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/r300/compiler/r300_nir.c b/src/gallium/drivers/r300/compiler/r300_nir.c
index bc52a218282..1fc240fa7b5 100644
--- a/src/gallium/drivers/r300/compiler/r300_nir.c
+++ b/src/gallium/drivers/r300/compiler/r300_nir.c
@@ -22,6 +22,7 @@
 
 #include "r300_nir.h"
 
+#include "compiler/nir/nir_builder.h"
 #include "r300_screen.h"
 
 static unsigned char
@@ -57,6 +58,16 @@ r300_should_vectorize_io(unsigned align, unsigned bit_size,
    return true;
 }
 
+static bool
+set_speculate(nir_builder *b, nir_intrinsic_instr *intr, UNUSED void *_)
+{
+   if (intr->intrinsic == nir_intrinsic_load_ubo_vec4) {
+      nir_intrinsic_set_access(intr, nir_intrinsic_access(intr) | ACCESS_CAN_SPECULATE);
+      return true;
+   }
+   return false;
+}
+
 static void
 r300_optimize_nir(struct nir_shader *s, struct pipe_screen *screen)
 {
@@ -86,6 +97,10 @@ r300_optimize_nir(struct nir_shader *s, struct pipe_screen *screen)
       NIR_PASS(progress, s, nir_opt_dead_write_vars);
 
       NIR_PASS(progress, s, nir_opt_if, nir_opt_if_optimize_phi_true_false);
+      if (is_r500)
+         nir_shader_intrinsics_pass(s, set_speculate,
+                                    nir_metadata_block_index |
+                                    nir_metadata_dominance, NULL);
       NIR_PASS(progress, s, nir_opt_peephole_select, is_r500 ? 8 : ~0, true, true);
       NIR_PASS(progress, s, nir_opt_algebraic);
       NIR_PASS(progress, s, nir_opt_constant_folding);
