In PR83920, I encountered an nvptx bug where live predicate variables
were clobbered before their values were broadcast. Apparently, the JIT
in certain versions of the CUDA driver would generate wrong code for
shfl broadcasts, and nvptx_single works around that by initializing the
predicate register to zero at the start of the block. The attached patch
teaches nvptx_single not to apply that workaround if the predicate
register is live on entry to the block.
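
To illustrate the failure mode, here is a minimal toy model in plain C.
It is not GCC code: live_set, reg_live_p and value_at_broadcast are
hypothetical names, and the 64-bit mask is only a stand-in for the
DF_LIVE_IN bitmap. It assumes the block does not redefine the predicate
before the broadcast, which is the case where the clobber matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a DF_LIVE_IN bitmap: bit N set means
   register N is live on entry to the block.  */
typedef uint64_t live_set;

static bool
reg_live_p (live_set live, unsigned regno)
{
  assert (regno < 64);
  return (live >> regno) & 1;
}

/* Return the predicate value the broadcast would see when the block
   itself does not redefine the predicate.  INCOMING is the value on
   entry; GUARDED selects whether the zero-initialization is skipped
   for registers that are live on entry.  */
static bool
value_at_broadcast (live_set live_in, unsigned pvar_regno,
		    bool incoming, bool guarded)
{
  bool pvar = incoming;

  /* The workaround: initialize the predicate to zero at block entry.
     Unguarded, it always runs; guarded, only for dead registers.  */
  if (!guarded || !reg_live_p (live_in, pvar_regno))
    pvar = false;

  return pvar;
}
```

With register 5 live on entry and holding true, the unguarded variant
broadcasts false (the live value is lost), while the guarded variant
broadcasts true. For a register that is not live on entry, both variants
still zero-initialize it, so the JIT workaround is kept in that case.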

Tom, does this patch look sane to you? I'm not sure if it defeats the
purpose of your original patch. Regardless, the live predicate registers
shouldn't be clobbered before they are used.

Unfortunately, I cannot reproduce the runtime failure with the gemm example
in the PR, so I didn't include it in the patch. However, this patch does
fix the failure with da-1.c in og7. This patch does not cause any
regressions.

Is it OK for trunk?

Thanks,
Cesar
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 55c7e3c..698c574 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3957,6 +3957,7 @@ bb_first_real_insn (basic_block bb)
 static void
 nvptx_single (unsigned mask, basic_block from, basic_block to)
 {
+  bitmap live = DF_LIVE_IN (from);
   rtx_insn *head = BB_HEAD (from);
   rtx_insn *tail = BB_END (to);
   unsigned skip_mask = mask;
@@ -4126,8 +4127,9 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	     There is nothing in the PTX spec to suggest that this is wrong, or
 	     to explain why the extra initialization is needed.  So, we classify
 	     it as a JIT bug, and the extra initialization as workaround.  */
-	  emit_insn_before (gen_movbi (pvar, const0_rtx),
-			    bb_first_real_insn (from));
+	  if (!bitmap_bit_p (live, REGNO (pvar)))
+	    emit_insn_before (gen_movbi (pvar, const0_rtx),
+			      bb_first_real_insn (from));
 #endif
 	  emit_insn_before (nvptx_gen_vcast (pvar), tail);
 	}
