On Tue, Dec 17, 2024 at 04:27:57PM -0800, Jessica Zhang wrote:
> From: Abhinav Kumar <[email protected]>
> 
> There is no recovery mechanism in place yet to recover from mmu
> faults for DPU. We can only prevent the faults by making sure there
> is no misconfiguration.
> 
> Rate-limit the snapshot capture for mmu faults to once per
> msm_atomic_commit_tail() as that should be sufficient to capture
> the snapshot for debugging otherwise there will be a lot of DPU
> snapshots getting captured for the same fault which is redundant
> and also might affect capturing even one snapshot accurately.
> 
> Signed-off-by: Abhinav Kumar <[email protected]>
> Signed-off-by: Jessica Zhang <[email protected]>
> ---
>  drivers/gpu/drm/msm/msm_atomic.c | 2 ++
>  drivers/gpu/drm/msm/msm_kms.c    | 5 ++++-
>  drivers/gpu/drm/msm/msm_kms.h    | 3 +++
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c 
> b/drivers/gpu/drm/msm/msm_atomic.c
> index 
> 9c45d641b5212c11078ab38c13a519663d85e10a..9ad7eeb14d4336abd9d8a8eb1382bdddce80508a
>  100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -228,6 +228,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state 
> *state)
>       if (kms->funcs->prepare_commit)
>               kms->funcs->prepare_commit(kms, state);
>  
> +     kms->fault_snapshot_capture = 0;
> +

- Please move it before the prepare_commit().
- You are accessing the same variable from different threads / cores.
  There should be some kind of a sync barrier.

>       /*
>        * Push atomic updates down to hardware:
>        */
> diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
> index 
> 78830e446355f77154fa21a5d107635bc88ba3ed..3327caf396d4fc905dc127f09515559c12666dc8
>  100644
> --- a/drivers/gpu/drm/msm/msm_kms.c
> +++ b/drivers/gpu/drm/msm/msm_kms.c
> @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned 
> long iova, int flags, void
>  {
>       struct msm_kms *kms = arg;
>  
> -     msm_disp_snapshot_state(kms->dev);
> +     if (!kms->fault_snapshot_capture) {
> +             msm_disp_snapshot_state(kms->dev);
> +             kms->fault_snapshot_capture++;
> +     }
>  
>       return -ENOSYS;
>  }
> diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
> index 
> e60162744c669773b6e5aef824a173647626ab4e..3ac089e26e14b824567f3cd2c62f82a1b9ea9878
>  100644
> --- a/drivers/gpu/drm/msm/msm_kms.h
> +++ b/drivers/gpu/drm/msm/msm_kms.h
> @@ -128,6 +128,9 @@ struct msm_kms {
>       int irq;
>       bool irq_requested;
>  
> +     /* rate limit the snapshot capture to once per attach */
> +     int fault_snapshot_capture;
> +
>       /* mapper-id used to request GEM buffer mapped for scanout: */
>       struct msm_gem_address_space *aspace;
>  
> 
> -- 
> 2.34.1
> 

-- 
With best wishes
Dmitry

Reply via email to