On 05/03/2017 06:21 PM, Nicolai Hähnle wrote:
On 02.05.2017 16:43, Samuel Pitoiset wrote:
The TGSI DCE pass doesn't eliminate dead assignments like
MOV TEMP[0], TEMP[1] in presence of loops because it assumes
that the visitor doesn't emit dead code. This assumption is
actually wrong and this situation happens.

However, it appears that the merge_registers() pass accidentally
takes care of this for some weird reasons. But since this pass has
been disabled for RadeonSI and Nouveau, the renumber_registers()
pass which is called *after*, can't do its job correctly.

This is because it assumes that no dead code is present. But if
there is still a dead assignment, it might re-use the TEMP
register id incorrectly and emits wrong code.

This patches eliminates all dead assignments by tracking
all temporary register reads like what the renumber_registers()
actually does. The best solution would be to rewrite that DCE
pass entirely but it's more work.

This should fix Unigine Heaven on RadeonSI and Nouveau.

shader-db results with RadeonSI:

47109 shaders in 29632 tests
Totals:
SGPRS: 1919548 -> 1919572 (0.00 %)
VGPRS: 1139205 -> 1139209 (0.00 %)
Spilled SGPRs: 1865 -> 1867 (0.11 %)
Spilled VGPRs: 65 -> 65 (0.00 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
Code Size: 60083036 -> 60083244 (0.00 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 431200 -> 431200 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

It's still interesting to disable the merge_registers() pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36 +++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 0f8688a41c..01b5a4dc98 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
glsl_to_tgsi_instruction *, this->next_temp * 4);
    int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
+   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
    int level = 0;
    int removed = 0;

+   for (int i = 0; i < this->next_temp; i++)
+      first_reads[i] = -1;

As discussed offline, it would be good to fix renumber_registers to work without this assumption.

I will have a look at this approach.


If you want to go ahead with this patch anyway / on top of this: the value of first_reads doesn't seem to matter, so why not make it an array of bool instead?

An array of bool is better.


Cheers,
Nicolai


+
foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions) {
       assert(inst->dst[0].file != PROGRAM_TEMPORARY
              || inst->dst[0].index < this->next_temp);
@@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
                src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);

                for (int c = 0; c < 4; c++) {
-                  if (src_chans & (1 << c))
+                  if (src_chans & (1 << c)) {
                      writes[4 * inst->src[i].index + c] = NULL;
+
+                     if (first_reads[inst->src[i].index] == -1)
+                        first_reads[inst->src[i].index] = level;
+                  }
                }
             }
          }
@@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
src_chans |= 1 << GET_SWZ(inst->tex_offsets[i].swizzle, 3);

                for (int c = 0; c < 4; c++) {
-                  if (src_chans & (1 << c))
+                  if (src_chans & (1 << c)) {
                      writes[4 * inst->tex_offsets[i].index + c] = NULL;
+
+                     if (first_reads[inst->tex_offsets[i].index] == -1)
+                        first_reads[inst->tex_offsets[i].index] = level;
+                  }
                }
             }
          }
@@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
     * the writemask of other instructions with dead channels.
     */
foreach_in_list_safe(glsl_to_tgsi_instruction, inst, &this->instructions) {
-      if (!inst->dead_mask || !inst->dst[0].writemask)
+      bool dead_inst = false;
+
+      if (!inst->dst[0].writemask)
          continue;
+
       /* No amount of dead masks should remove memory stores */
       if (inst->info->is_store)
          continue;

-      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
+      if (!inst->dead_mask) {
+         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
+             first_reads[inst->dst[0].index] == -1) {
+            dead_inst = true;
+         }
+      } else {
+         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
+            dead_inst = true;
+      }
+
+      if (dead_inst) {
          inst->remove();
          delete inst;
          removed++;
-      } else {
+      } else if (inst->dead_mask) {
          if (glsl_base_type_is_64bit(inst->dst[0].type)) {
             if (inst->dead_mask == WRITEMASK_XY ||
                 inst->dead_mask == WRITEMASK_ZW)
@@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
    }

    ralloc_free(write_level);
+   ralloc_free(first_reads);
    ralloc_free(writes);

    return removed;



_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to