On 04/15/2013 11:22 PM, Martin Andersson wrote:
There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
LOOP_STARTxxx for nested loops may put the branch stack into a state
such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
by replacing the ALU_PUSH_BEFORE with a PUSH + ALU

Fixes piglit tests EXT_transform_feedback/order*

v2: Use existing loop count and improve comment
---
  src/gallium/drivers/r600/r600_shader.c | 17 ++++++++++++++---
  1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 6dbca50..f4398fd 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5490,7 +5490,7 @@ static int tgsi_opdst(struct r600_shader_ctx *ctx)
        return 0;
  }

-static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode)
+static int emit_logic_pred(struct r600_shader_ctx *ctx, int opcode, int 
alu_type)
  {
        struct r600_bytecode_alu alu;
        int r;
@@ -5510,7 +5510,7 @@ static int emit_logic_pred(struct r600_shader_ctx *ctx, 
int opcode)

        alu.last = 1;

-       r = r600_bytecode_add_alu_type(ctx->bc, &alu, CF_OP_ALU_PUSH_BEFORE);
+       r = r600_bytecode_add_alu_type(ctx->bc, &alu, alu_type);
        if (r)
                return r;
        return 0;
@@ -5730,7 +5730,18 @@ static void break_loop_on_flag(struct r600_shader_ctx 
*ctx, unsigned fc_sp)

  static int tgsi_if(struct r600_shader_ctx *ctx)
  {
-       emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT);
+       int alu_type = CF_OP_ALU_PUSH_BEFORE;
+
+       /* There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
+        * LOOP_STARTxxx for nested loops may put the branch stack into a state
+        * such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
+        * by replacing the ALU_PUSH_BEFORE with a PUSH + ALU */
+       if (ctx->bc->chip_class == CAYMAN && ctx->bc->stack.loop > 1) {
+               r600_bytecode_add_cfinst(ctx->bc, CF_OP_PUSH);

Oh, it seems I overlooked potential issue here: jump address for PUSH is not set properly, so I guess there will be GPU lockups in case of a jump. Ideally we could set it to jump over the whole IF-ENDIF block if there are no active threads, but I think it's a rare case, so simplest fix is to avoid computation of the address and set jump address for PUSH to the next instruction, like this:

                ctx->bc->cf_last->cf_addr = ctx->bc->cf_last->id + 2;

We can improve it later but anyway ALU_PUSH_BEFORE never jumped at all so I think at least we won't have any serious performance regressions.

Everything else looks ok, so I think I'll commit your patch with this change soon if there are no objections.

Vadim

+               alu_type = CF_OP_ALU;
+       }
+
+       emit_logic_pred(ctx, ALU_OP2_PRED_SETNE_INT, alu_type);

        r600_bytecode_add_cfinst(ctx->bc, CF_OP_JUMP);



_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to