On 29/05/2024 20:35, Jay Cornwall wrote:
A wavefront may deallocate its VGPRs at the end of a program while
waiting for memory transactions to complete. If it subsequently
receives a context save exception it will be unable to save,
since this requires VGPRs. In this case the trap handler should
terminate the wavefront.

Fixes intermittent VM faults under context switching load.

V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters

Hi Jay,

Thanks for the V2.

FYI,as far as I can see, the .h part of the patch does not seem to apply directly on current amd-staging-drm-next, but I guess we just have a different bases.


Signed-off-by: Jay Cornwall <jay.cornw...@amd.com>
Cc: Lancelot Six <lancelot....@amd.com>
---
  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 695 +++++++++---------
  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm    |  17 +
  2 files changed, 366 insertions(+), 346 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
index 18e012e04493..ac3702b8e3c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
@@ -97,6 +97,7 @@ var S_STATUS_HALT_MASK                                = 
SQ_WAVE_STATE_PRIV_HALT_MASK
  var S_SAVE_PC_HI_TRAP_ID_MASK                 = 0xF0000000
  #endif
+var SQ_WAVE_STATUS_NO_VGPRS_SHIFT = 24
  var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT          = 12
  var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE           = 9
  var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE          = 8
@@ -451,6 +452,22 @@ L_EXIT_TRAP:
        s_rfe_b64       [ttmp0, ttmp1]
L_SAVE:
+       // If VGPRs have been deallocated then terminate the wavefront.
+       // It has no remaining program to run and cannot save without VGPRs.
+#if ASIC_FAMILY == CHIP_PLUM_BONITO
+       s_bitcmp1_b32   s_save_status, SQ_WAVE_STATUS_NO_VGPRS_SHIFT
+       s_cbranch_scc0  L_HAVE_VGPRS
+       s_endpgm
+L_HAVE_VGPRS:
+#endif
+#if ASIC_FAMILY >= CHIP_GFX12

This is mostly cosmetic, but you could use

    #elif ASIC_FAMILY >= CHIP_GFX12

instead on #endif + #if. Those 2 blocks are not independent, they achieve the same goal for different configurations.

I do not mind if you prefer to keep this as it is. Either way, the change look good to me:

Reviewed-by: Lancelot Six <lancelot....@amd.com>

Best,
Lancelot.

+       s_getreg_b32    s_save_tmp, hwreg(HW_REG_WAVE_STATUS)
+       s_bitcmp1_b32   s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT
+       s_cbranch_scc0  L_HAVE_VGPRS
+       s_endpgm
+L_HAVE_VGPRS:
+#endif
+
        s_and_b32       s_save_pc_hi, s_save_pc_hi, 0x0000ffff                  
//pc[47:32]
        s_mov_b32       s_save_tmp, 0
        s_setreg_b32    hwreg(S_TRAPSTS_HWREG, S_TRAPSTS_SAVE_CONTEXT_SHIFT, 
1), s_save_tmp     //clear saveCtx bit

Reply via email to