Am 22.04.20 um 14:20 schrieb Tao, Yintian:
Hi  Christian


Please see inline comments.
-----Original Message-----
From: Koenig, Christian <christian.koe...@amd.com>
Sent: April 22, 2020 19:57
To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>; Kuehling, Felix 
<felix.kuehl...@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

Am 22.04.20 um 13:49 schrieb Tao, Yintian:
Hi  Christian


Can you help answer the questions below? Thanks in advance.
-----Original Message-----
From: Koenig, Christian <christian.koe...@amd.com>
Sent: April 22, 2020 19:03
To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>;
Kuehling, Felix <felix.kuehl...@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

Am 22.04.20 um 11:29 schrieb Yintian Tao:
With the current KIQ register access method, there is a race condition
when multiple clients use KIQ to read registers at the same time, as in
the example below:
1. client-A starts to read REG-0 through KIQ
2. client-A polls seqno-0
3. client-B starts to read REG-1 through KIQ
4. client-B polls seqno-1
5. the KIQ completes both read operations
6. client-A reads the register value from the wb buffer and
      gets the REG-1 value
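(Illustration only, not driver code: a minimal sketch of the shared
writeback slot the example above describes; kiq_queue_read() and
kiq_poll_seqno() are hypothetical helpers standing in for the real KIQ
submission path.)

/* One writeback slot shared by every reader -- the root of the race. */
extern uint32_t shared_wb_slot;

/* Hypothetical helpers for queuing a KIQ read and polling its seqno. */
extern void kiq_queue_read(uint32_t reg, uint32_t *result);
extern void kiq_poll_seqno(uint32_t seqno);

static uint32_t kiq_rreg_racy(uint32_t reg, uint32_t seqno)
{
	kiq_queue_read(reg, &shared_wb_slot);	/* result will land here    */
	kiq_poll_seqno(seqno);			/* wait for own submission  */
	return shared_wb_slot;			/* may meanwhile hold the   */
						/* value of another read    */
}

/*
 * client-A: kiq_rreg_racy(REG-0, seqno-0)
 * client-B: kiq_rreg_racy(REG-1, seqno-1)
 * KIQ     : executes both reads, the REG-1 value overwrites the REG-0 value
 * client-A: returns shared_wb_slot, i.e. the REG-1 value
 */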

Furthermore, if multiple clients frequently write registers through KIQ,
the KIQ ring buffer may be overwritten.

Therefore, allocate a fixed number of wb slots for rreg use and limit the
number of submissions according to the KIQ ring_size in order to prevent
the overwrite problem.

v2: directly use amdgpu_device_wb_get() for each read instead
       of reserving a fixed number of slots.
       If there is not enough KIQ ring buffer or no free rreg slot,
       print an error and return instead of busy waiting
I would split that into three patches. One for each problem we have here:

1. Fix kgd_hiq_mqd_load() and maybe other occasions to use spin_lock_irqsave().
[yttao]: Do you mean that we need to use spin_lock_irqsave() for functions 
such as kgd_hiq_mqd_load()?
Yes, I strongly think so.

See, when you have one spin lock you need to either always lock it with 
IRQs disabled or never.

In other words, for a given lock we always need to use either spin_lock() or 
spin_lock_irqsave(), but never mix them on the same lock.

The only exception to this rule is when you take multiple locks, e.g.
you can do:

spin_lock_irqsave(&a, flags);
spin_lock(&b);
spin_lock(&c);
....
spin_unlock(&c);
spin_unlock(&b);
spin_unlock_irqrestore(&a, flags);

Here you don't need to use spin_lock_irqsave for b and c. But we rarely have 
that case in the code.
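(Illustration of the rule, assuming the lock can also be taken from a
context that runs with IRQs disabled, such as an interrupt handler; this
is not actual driver code.)

spin_lock(&lock);	/* process context, IRQs still enabled            */
			/* <-- IRQ fires on this CPU                      */
			/*     the handler calls spin_lock(&lock) and     */
			/*     spins forever on the lock this CPU         */
			/*     already holds: deadlock                    */
spin_unlock(&lock);

/* With spin_lock_irqsave(&lock, flags) the IRQ cannot fire on this CPU
 * while the lock is held, so the scenario above cannot happen. */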
[yttao]: thanks, I got it. I will submit another patch for it.

2. Prevent the overrun of the KIQ. Please drop the approach with the
atomic here. Instead just add an amdgpu_fence_wait_polling() into
amdgpu_fence_emit_polling() as I discussed with Monk.
[yttao]: Sorry, I don't fully understand your original idea about 
amdgpu_fence_wait_polling(). Can you give more details about it? Thanks in 
advance.

"That is actually only a problem because the KIQ uses polling waits.

See amdgpu_fence_emit() waits for the oldest possible fence to be signaled 
before emitting a new one.

I suggest that we do the same in amdgpu_fence_emit_polling(). A one liner like 
the following should be enough:

amdgpu_fence_wait_polling(ring, seq - ring->fence_drv.num_fences_mask, 
timeout);"
[yttao]: there is no usage of num_fences_mask in KIQ fence polling; 
num_fences_mask is only effective in the dma_fence architecture.
		If I understand correctly, do you want the prototype code below? 
If the prototype code is wrong, can you help give one example? Thanks in advance.

int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s)
{
	uint32_t seq;

	if (!s)
		return -EINVAL;

+	amdgpu_fence_wait_polling(ring, seq, timeout);
	seq = ++ring->fence_drv.sync_seq;
Your understanding sounds more or less correct. The code should look something 
like this:

seq = ++ring->fence_drv.sync_seq;
amdgpu_fence_wait_polling(ring, seq - number_of_allowed_submissions_to_the_kiq,
			  timeout);
[yttao]: Do we need to wait right at the beginning, as shown below? Otherwise, 
amdgpu_ring_emit_wreg() may overwrite the KIQ ring buffer.

There should always be room for at least one more submission.

As long as we always submit a fence which checks the free room, this should be fine.

Regards,
Christian.

+	amdgpu_fence_wait_polling(ring, seq - number_of_allowed_submissions_to_the_kiq,
+				  timeout);
	spin_lock_irqsave(&kiq->ring_lock, flags);
	amdgpu_ring_alloc(ring, 32);
	amdgpu_ring_emit_wreg(ring, reg, v);
	amdgpu_fence_emit_polling(ring, &seq); /* wait */
	amdgpu_ring_commit(ring);
	spin_unlock_irqrestore(&kiq->ring_lock, flags);

I just used num_fences_mask as number_of_allowed_submissions_to_the_kiq
because it is probably a good value to start with.

But you could pass that as a parameter as well if you think that makes more sense.

	amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
			       seq, 0);

	*s = seq;

	return 0;
}
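(For reference, a sketch that assembles the suggestion above into the whole
function. The extra timeout parameter and the return-value check are
assumptions here, not part of the original prototype;
number_of_allowed_submissions_to_the_kiq stays a placeholder, and
amdgpu_fence_wait_polling() is assumed to return the remaining timeout,
i.e. 0 when it timed out.)

int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
			      uint32_t timeout)
{
	uint32_t seq;
	signed long r;

	if (!s)
		return -EINVAL;

	seq = ++ring->fence_drv.sync_seq;
	/* Block until the KIQ has room again, i.e. until the submission
	 * that is number_of_allowed_submissions_to_the_kiq places behind
	 * us has signaled. */
	r = amdgpu_fence_wait_polling(ring,
				      seq - number_of_allowed_submissions_to_the_kiq,
				      timeout);
	if (r < 1)
		return -ETIMEDOUT;

	amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
			       seq, 0);

	*s = seq;

	return 0;
}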




3. Use amdgpu_device_wb_get() each time we need to submit a read.
[yttao]: yes, I will do it.
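(A sketch of what point 3 could look like in the read path; only
amdgpu_device_wb_get()/amdgpu_device_wb_free() and the adev->wb.wb[] access
are existing driver interfaces, the rest of the plumbing is elided and
assumed here.)

uint32_t reg_val_offs = 0, value;

/* Allocate a fresh writeback slot for this read so that concurrent
 * readers can no longer clobber each other's result. */
if (amdgpu_device_wb_get(adev, &reg_val_offs)) {
	DRM_ERROR("no free wb slot for KIQ register read\n");
	return ~0;
}

/* ... emit the KIQ read targeting wb slot reg_val_offs, emit a polling
 * fence, commit the ring and poll for the fence as before ... */

value = adev->wb.wb[reg_val_offs];
amdgpu_device_wb_free(adev, reg_val_offs);
return value;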
Thanks,
Christian.

Regards,
Christian.


