Hi Christian,

Can you help answer the questions below? Thanks in advance.
-----Original Message-----
From: Koenig, Christian <christian.koe...@amd.com> 
Sent: April 22, 2020 19:03
To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>; Kuehling, 
Felix <felix.kuehl...@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

Am 22.04.20 um 11:29 schrieb Yintian Tao:
> According to the current KIQ register access method, there will be a
> race condition when using the KIQ to read a register if multiple
> clients want to read at the same time, as in the example below:
> 1. client-A starts to read REG-0 through the KIQ
> 2. client-A polls seqno-0
> 3. client-B starts to read REG-1 through the KIQ
> 4. client-B polls seqno-1
> 5. the KIQ completes these two read operations
> 6. client-A reads the register from the wb buffer and gets the REG-1 value
>
> And if multiple clients frequently write registers through the KIQ,
> the KIQ ring buffer may be overwritten.
>
> Therefore, allocate a fixed number of wb slots for rreg use and limit
> the number of submissions based on the KIQ ring_size in order to
> prevent the overwrite problem.
>
> v2: directly use amdgpu_device_wb_get() for each read instead
>      of reserving a fixed number of slots.
>      If there is not enough KIQ ring buffer space or no free rreg slot,
>      directly print an error log and return instead of busy waiting.

I would split that into three patches. One for each problem we have here:

1. Fix kgd_hiq_mqd_load() and maybe other occasions to use spin_lock_irqsave().
[yttao]: Do you mean that we need to use spin_lock_irqsave() in functions 
like kgd_hiq_mqd_load()?

2. Prevent overrun of the KIQ ring. Please drop the approach with the atomic 
here. Instead just add an amdgpu_fence_wait_polling() call into
amdgpu_fence_emit_polling() as I discussed with Monk.
[yttao]: Sorry, I don't fully understand your idea for 
amdgpu_fence_wait_polling(). Can you give more details about it? Thanks in 
advance.

"That is actually only a problem because the KIQ uses polling waits.

See amdgpu_fence_emit() waits for the oldest possible fence to be signaled 
before emitting a new one.

I suggest that we do the same in amdgpu_fence_emit_polling(). A one liner like 
the following should be enough:

amdgpu_fence_wait_polling(ring, seq - ring->fence_drv.num_fences_mask, 
timeout);"
[yttao]: num_fences_mask is not used for KIQ fence polling; it is only 
effective in the dma_fence architecture.
                If I understand correctly, do you want the prototype code below? 
If the prototype code is wrong, can you help give one sample? Thanks in advance.

int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s)
{
        uint32_t seq;

        if (!s)
                return -EINVAL;

+       amdgpu_fence_wait_polling(ring, seq, timeout);
        seq = ++ring->fence_drv.sync_seq;
        amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
                               seq, 0);

        *s = seq;

        return 0;
}




3. Use amdgpu_device_wb_get() each time we need to submit a read.
[yttao]: yes, I will do it.

Regards,
Christian.

>
> Signed-off-by: Yintian Tao <yt...@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  8 +-
>   .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    | 13 ++-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 83 +++++++++++++++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |  5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        |  8 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |  8 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 35 +++++---
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        | 13 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         | 13 ++-
>   12 files changed, 167 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 4e1d4cfe7a9f..1157c1a0b888 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -526,7 +526,7 @@ static inline void amdgpu_set_ib_value(struct 
> amdgpu_cs_parser *p,
>   /*
>    * Writeback
>    */
> -#define AMDGPU_MAX_WB 128    /* Reserve at most 128 WB slots for 
> amdgpu-owned rings. */
> +#define AMDGPU_MAX_WB 256    /* Reserve at most 256 WB slots for 
> amdgpu-owned rings. */
>   
>   struct amdgpu_wb {
>       struct amdgpu_bo        *wb_obj;
> @@ -1028,6 +1028,12 @@ bool amdgpu_device_has_dc_support(struct 
> amdgpu_device *adev);
>   
>   int emu_soc_asic_init(struct amdgpu_device *adev);
>   
> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read,
> +                     unsigned long *flags);
> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq, unsigned long *flags);
> +
> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs); 
> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs);
>   /*
>    * Registers read & write functions.
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 691c89705bcd..a65d6a1abc04 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -309,9 +309,11 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void 
> *mqd,
>                           uint32_t doorbell_off)
>   {
>       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +     struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
>       struct v10_compute_mqd *m;
>       uint32_t mec, pipe;
> +     unsigned long flags = 0;
>       int r;
>   
>       m = get_mqd(mqd);
> @@ -324,13 +326,19 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void 
> *mqd,
>       pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>                mec, pipe, queue_id);
>   
> -     spin_lock(&adev->gfx.kiq.ring_lock);
> +     r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +     if (r) {
> +             pr_err("failed to lock kiq\n");
> +             goto out_unlock;
> +     }
> +
>       r = amdgpu_ring_alloc(kiq_ring, 7);
>       if (r) {
>               pr_err("Failed to alloc KIQ (%d).\n", r);
>               goto out_unlock;
>       }
>   
> +     amdgpu_gfx_kiq_consume(kiq, NULL);
>       amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
>       amdgpu_ring_write(kiq_ring,
>                         PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* Queue_Sel */
> @@ -350,8 +358,9 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>       amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
>       amdgpu_ring_commit(kiq_ring);
>   
> +     amdgpu_gfx_kiq_restore(kiq, NULL);
>   out_unlock:
> -     spin_unlock(&adev->gfx.kiq.ring_lock);
> +     amdgpu_gfx_kiq_unlock(&adev->gfx.kiq, &flags);
>       release_queue(kgd);
>   
>       return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index df841c2ac5e7..4435bd716edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -307,9 +307,11 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void 
> *mqd,
>                           uint32_t doorbell_off)
>   {
>       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +     struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
>       struct v9_mqd *m;
>       uint32_t mec, pipe;
> +     unsigned long flags = 0;
>       int r;
>   
>       m = get_mqd(mqd);
> @@ -322,13 +324,19 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void 
> *mqd,
>       pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>                mec, pipe, queue_id);
>   
> -     spin_lock(&adev->gfx.kiq.ring_lock);
> +     r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +     if (r) {
> +             pr_err("failed to lock kiq\n");
> +             goto out_unlock;
> +     }
> +
>       r = amdgpu_ring_alloc(kiq_ring, 7);
>       if (r) {
>               pr_err("Failed to alloc KIQ (%d).\n", r);
>               goto out_unlock;
>       }
>   
> +     amdgpu_gfx_kiq_consume(kiq, NULL);
>       amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
>       amdgpu_ring_write(kiq_ring,
>                         PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* Queue_Sel */
> @@ -348,8 +356,9 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void *mqd,
>       amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
>       amdgpu_ring_commit(kiq_ring);
>   
> +     amdgpu_gfx_kiq_restore(kiq, NULL);
>   out_unlock:
> -     spin_unlock(&adev->gfx.kiq.ring_lock);
> +     amdgpu_gfx_kiq_unlock(&adev->gfx.kiq, &flags);
>       release_queue(kgd);
>   
>       return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index ea576b4260a4..c0dc7f1849c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -295,6 +295,43 @@ static int amdgpu_gfx_kiq_acquire(struct amdgpu_device 
> *adev,
>       return -EINVAL;
>   }
>   
> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read,
> +                     unsigned long *flags)
> +{
> +     struct amdgpu_wb *wb = &(kiq->ring.adev)->wb;
> +
> +     spin_lock_irqsave(&kiq->ring_lock, *flags);
> +     if ((atomic64_read(&kiq->submit_avail) > 0) &&
> +         (read ? find_first_zero_bit(wb->used, wb->num_wb) <
> +          wb->num_wb : 1)) {
> +             return 0;
> +     } else {
> +             spin_unlock_irqrestore(&kiq->ring_lock, *flags);
> +             pr_err("critical! too many kiq accessors\n");
> +             return -EDEADLK;
> +     }
> +}
> +
> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq, unsigned long *flags)
> +{
> +     spin_unlock_irqrestore(&kiq->ring_lock, *flags);
> +}
> +
> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs)
> +{
> +     atomic64_dec(&kiq->submit_avail);
> +     if (offs)
> +             amdgpu_device_wb_get(kiq->ring.adev, offs);
> +}
> +
> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs)
> +{
> +     atomic64_inc(&kiq->submit_avail);
> +     if (offs)
> +             amdgpu_device_wb_free(kiq->ring.adev, *offs);
> +}
> +
>   int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>                            struct amdgpu_ring *ring,
>                            struct amdgpu_irq_src *irq)
> @@ -304,10 +341,6 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device 
> *adev,
>   
>       spin_lock_init(&kiq->ring_lock);
>   
> -     r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs);
> -     if (r)
> -             return r;
> -
>       ring->adev = NULL;
>       ring->ring_obj = NULL;
>       ring->use_doorbell = true;
> @@ -325,13 +358,15 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>                            AMDGPU_RING_PRIO_DEFAULT);
>       if (r)
>               dev_warn(adev->dev, "(%d) failed to init kiq ring\n", r);
> +     else
> +             atomic64_set(&kiq->submit_avail, ring->ring_size / 4 /
> +                          (ring->funcs->align_mask + 1));
>   
>       return r;
>   }
>   
>   void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring)
>   {
> -     amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs);
>       amdgpu_ring_fini(ring);
>   }
>   
> @@ -671,19 +706,25 @@ int amdgpu_gfx_cp_ecc_error_irq(struct amdgpu_device 
> *adev,
>   uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>   {
>       signed long r, cnt = 0;
> -     unsigned long flags;
> -     uint32_t seq;
> +     unsigned long flags = 0;
> +     uint32_t seq, reg_val_offs = 0, value = 0;
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *ring = &kiq->ring;
>   
>       BUG_ON(!ring->funcs->emit_rreg);
>   
> -     spin_lock_irqsave(&kiq->ring_lock, flags);
> +     r = amdgpu_gfx_kiq_lock(kiq, true, &flags);
> +     if (r) {
> +             pr_err("failed to lock kiq\n");
> +             goto kiq_read_exit;
> +     }
> +
> +     amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
>       amdgpu_ring_alloc(ring, 32);
> -     amdgpu_ring_emit_rreg(ring, reg);
> +     amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
>       amdgpu_fence_emit_polling(ring, &seq);
>       amdgpu_ring_commit(ring);
> -     spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +     amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>       r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -707,9 +748,14 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, 
> uint32_t reg)
>       if (cnt > MAX_KIQ_REG_TRY)
>               goto failed_kiq_read;
>   
> -     return adev->wb.wb[kiq->reg_val_offs];
> +     mb();
> +     value = adev->wb.wb[reg_val_offs];
> +     amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +     return value;
>   
>   failed_kiq_read:
> +     amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +kiq_read_exit:
>       pr_err("failed to read reg:%x\n", reg);
>       return ~0;
>   }
> @@ -717,19 +763,25 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, 
> uint32_t reg)
>   void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>   {
>       signed long r, cnt = 0;
> -     unsigned long flags;
> +     unsigned long flags = 0;
>       uint32_t seq;
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *ring = &kiq->ring;
>   
>       BUG_ON(!ring->funcs->emit_wreg);
>   
> -     spin_lock_irqsave(&kiq->ring_lock, flags);
> +     r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +     if (r) {
> +             pr_err("failed to lock kiq\n");
> +             goto kiq_write_exit;
> +     }
> +
> +     amdgpu_gfx_kiq_consume(kiq, NULL);
>       amdgpu_ring_alloc(ring, 32);
>       amdgpu_ring_emit_wreg(ring, reg, v);
>       amdgpu_fence_emit_polling(ring, &seq);
>       amdgpu_ring_commit(ring);
> -     spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +     amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>       r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -754,8 +806,11 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, 
> uint32_t reg, uint32_t v)
>       if (cnt > MAX_KIQ_REG_TRY)
>               goto failed_kiq_write;
>   
> +     amdgpu_gfx_kiq_restore(kiq, NULL);
>       return;
>   
>   failed_kiq_write:
> +     amdgpu_gfx_kiq_restore(kiq, NULL);
> +kiq_write_exit:
>       pr_err("failed to write reg:%x\n", reg);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 634746829024..ff7597ca6cad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -96,6 +96,7 @@ struct kiq_pm4_funcs {
>       int invalidate_tlbs_size;
>   };
>   
> +#define MAX_KIQ_RREG_NUM 64
>   struct amdgpu_kiq {
>       u64                     eop_gpu_addr;
>       struct amdgpu_bo        *eop_obj;
> @@ -103,7 +104,7 @@ struct amdgpu_kiq {
>       struct amdgpu_ring      ring;
>       struct amdgpu_irq_src   irq;
>       const struct kiq_pm4_funcs *pmf;
> -     uint32_t                        reg_val_offs;
> +     atomic64_t              submit_avail;
>   };
>   
>   /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index f61664ee4940..137d3d2b46e8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs {
>       void (*end_use)(struct amdgpu_ring *ring);
>       void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>       void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> -     void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
> +     void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
> +                       uint32_t reg_val_offs);
>       void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>       void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>                             uint32_t val, uint32_t mask);
> @@ -265,7 +266,7 @@ struct amdgpu_ring {
>   #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
>   #define amdgpu_ring_emit_switch_buffer(r) 
> (r)->funcs->emit_switch_buffer((r))
>   #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
> -#define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
>   #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>   #define amdgpu_ring_emit_reg_wait(r, d, v, m) 
> (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>   #define amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m) (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 8c10084f44ef..a4c3f284691c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -56,13 +56,19 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct 
> amdgpu_device *adev,
>       unsigned long flags;
>       uint32_t seq;
>   
> -     spin_lock_irqsave(&kiq->ring_lock, flags);
> +     r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +     if (r) {
> +             pr_err("failed to lock kiq\n");
> +             goto failed_exit;
> +     }
> +
> +     amdgpu_gfx_kiq_consume(kiq, NULL);
>       amdgpu_ring_alloc(ring, 32);
>       amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>                                           ref, mask);
>       amdgpu_fence_emit_polling(ring, &seq);
>       amdgpu_ring_commit(ring);
> -     spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +     amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>       r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -80,9 +86,12 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct 
> amdgpu_device *adev,
>       if (cnt > MAX_KIQ_REG_TRY)
>               goto failed_kiq;
>   
> +     amdgpu_gfx_kiq_restore(kiq, NULL);
>       return;
>   
>   failed_kiq:
> +     amdgpu_gfx_kiq_restore(kiq, NULL);
> +failed_exit:
>       pr_err("failed to write reg %x wait reg %x\n", reg0, reg1);
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0a03e2ad5d95..7853dbc3c8bd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -7594,10 +7594,10 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct 
> amdgpu_ring *ring, bool start,
>       amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1));
>   }
>   
> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, 
> uint32_t reg)
> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> +                                  uint32_t reg_val_offs)
>   {
>       struct amdgpu_device *adev = ring->adev;
> -     struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>       amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>       amdgpu_ring_write(ring, 0 |     /* src: register*/
> @@ -7606,9 +7606,9 @@ static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring 
> *ring, uint32_t reg)
>       amdgpu_ring_write(ring, reg);
>       amdgpu_ring_write(ring, 0);
>       amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>       amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>   }
>   
>   static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index fc6c2f2bc76c..bdbd92d4fe45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -6383,10 +6383,10 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct 
> amdgpu_ring *ring, unsigne
>               ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
>   }
>   
> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, 
> uint32_t reg)
> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> +                                 uint32_t reg_val_offs)
>   {
>       struct amdgpu_device *adev = ring->adev;
> -     struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>       amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>       amdgpu_ring_write(ring, 0 |     /* src: register*/
> @@ -6395,9 +6395,9 @@ static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring 
> *ring, uint32_t reg)
>       amdgpu_ring_write(ring, reg);
>       amdgpu_ring_write(ring, 0);
>       amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>       amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>   }
>   
>   static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 84fcf842316d..293219452c0f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -4042,14 +4042,21 @@ static int gfx_v9_0_soft_reset(void *handle)
>   static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
>   {
>       signed long r, cnt = 0;
> -     unsigned long flags;
> -     uint32_t seq;
> +     unsigned long flags = 0;
> +     uint32_t seq, reg_val_offs;
> +     uint64_t value = 0;
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *ring = &kiq->ring;
>   
>       BUG_ON(!ring->funcs->emit_rreg);
>   
> -     spin_lock_irqsave(&kiq->ring_lock, flags);
> +     r = amdgpu_gfx_kiq_lock(kiq, true, &flags);
> +     if (r) {
> +             pr_err("failed to lock kiq\n");
> +             goto failed_kiq_exit;
> +     }
> +
> +     amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
>       amdgpu_ring_alloc(ring, 32);
>       amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>       amdgpu_ring_write(ring, 9 |     /* src: register*/
> @@ -4059,12 +4066,12 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct 
> amdgpu_device *adev)
>       amdgpu_ring_write(ring, 0);
>       amdgpu_ring_write(ring, 0);
>       amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>       amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>       amdgpu_fence_emit_polling(ring, &seq);
>       amdgpu_ring_commit(ring);
> -     spin_unlock_irqrestore(&kiq->ring_lock, flags);
> +     amdgpu_gfx_kiq_unlock(kiq, &flags);
>   
>       r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   
> @@ -4088,10 +4095,14 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct 
> amdgpu_device *adev)
>       if (cnt > MAX_KIQ_REG_TRY)
>               goto failed_kiq_read;
>   
> -     return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
> -             (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL;
> +     value = (uint64_t)adev->wb.wb[reg_val_offs] |
> +             (uint64_t)adev->wb.wb[reg_val_offs + 1 ] << 32ULL;
>   
> +     amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +     return value;
>   failed_kiq_read:
> +     amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
> +failed_kiq_exit:
>       pr_err("failed to read gpu clock\n");
>       return ~0;
>   }
> @@ -5482,10 +5493,10 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct 
> amdgpu_ring *ring, unsigne
>               ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
>   }
>   
> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, 
> uint32_t reg)
> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
> +                                 uint32_t reg_val_offs)
>   {
>       struct amdgpu_device *adev = ring->adev;
> -     struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>       amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>       amdgpu_ring_write(ring, 0 |     /* src: register*/
> @@ -5494,9 +5505,9 @@ static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring 
> *ring, uint32_t reg)
>       amdgpu_ring_write(ring, reg);
>       amdgpu_ring_write(ring, 0);
>       amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>       amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
> -                             kiq->reg_val_offs * 4));
> +                             reg_val_offs * 4));
>   }
>   
>   static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 30b75d79efdb..ab960b052c0d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -415,6 +415,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
>   {
>       int vmid, i;
>       signed long r;
> +     unsigned long flags = 0;
>       uint32_t seq;
>       uint16_t queried_pasid;
>       bool ret;
> @@ -422,20 +423,28 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   
>       if (amdgpu_emu_mode == 0 && ring->sched.ready) {
> -             spin_lock(&adev->gfx.kiq.ring_lock);
> +             r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +             if (r) {
> +                     pr_err("failed to lock kiq\n");
> +                     return -ETIME;
> +             }
> +
>               /* 2 dwords flush + 8 dwords fence */
>               amdgpu_ring_alloc(ring, kiq->pmf->invalidate_tlbs_size + 8);
> +             amdgpu_gfx_kiq_consume(kiq, NULL);
>               kiq->pmf->kiq_invalidate_tlbs(ring,
>                                       pasid, flush_type, all_hub);
>               amdgpu_fence_emit_polling(ring, &seq);
>               amdgpu_ring_commit(ring);
> -             spin_unlock(&adev->gfx.kiq.ring_lock);
> +             amdgpu_gfx_kiq_unlock(kiq, &flags);
>               r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>               if (r < 1) {
>                       DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> +                     amdgpu_gfx_kiq_restore(kiq, NULL);
>                       return -ETIME;
>               }
>   
> +             amdgpu_gfx_kiq_restore(kiq, NULL);
>               return 0;
>       }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index fecdbc471983..d14fd56a959e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -589,6 +589,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
>   {
>       int vmid, i;
>       signed long r;
> +     unsigned long flags = 0;
>       uint32_t seq;
>       uint16_t queried_pasid;
>       bool ret;
> @@ -613,7 +614,13 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
>               if (vega20_xgmi_wa)
>                       ndw += kiq->pmf->invalidate_tlbs_size;
>   
> -             spin_lock(&adev->gfx.kiq.ring_lock);
> +             r = amdgpu_gfx_kiq_lock(kiq, false, &flags);
> +             if (r) {
> +                     pr_err("failed to lock kiq\n");
> +                     return -ETIME;
> +             }
> +
> +             amdgpu_gfx_kiq_consume(kiq, NULL);
>               /* 2 dwords flush + 8 dwords fence */
>               amdgpu_ring_alloc(ring, ndw);
>               if (vega20_xgmi_wa)
> @@ -623,13 +630,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
>                                       pasid, flush_type, all_hub);
>               amdgpu_fence_emit_polling(ring, &seq);
>               amdgpu_ring_commit(ring);
> -             spin_unlock(&adev->gfx.kiq.ring_lock);
> +             amdgpu_gfx_kiq_unlock(kiq, &flags);
>               r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>               if (r < 1) {
>                       DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> +                     amdgpu_gfx_kiq_restore(kiq, NULL);
>                       return -ETIME;
>               }
>   
> +             amdgpu_gfx_kiq_restore(kiq, NULL);
>               return 0;
>       }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
