amdgpu: Optimization of critical section

axie Tue, 13 Jun 2017 10:19:12 -0700

On 2017-06-13 06:23 AM, Christian König wrote:

Am 13.06.2017 um 08:29 schrieb axie:
On 2017-06-12 07:00 PM, Christian König wrote:
Am 12.06.2017 um 22:31 schrieb Alex Xie:
This patch is to move a loop of unref BOs and
several memory free function calls out of
critical sections.
Signed-off-by: Alex Xie <alexbin....@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.cb/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index a664987..02c138f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -75,9 +75,12 @@ static void amdgpu_bo_list_destroy(structamdgpu_fpriv *fpriv, int id)
          /* Another user may have a reference to this list still */
          mutex_lock(&list->lock);
          mutex_unlock(&list->lock);
+        mutex_unlock(&fpriv->bo_list_lock);
          amdgpu_bo_list_free(list);
      }
-    mutex_unlock(&fpriv->bo_list_lock);
+    else {
+        mutex_unlock(&fpriv->bo_list_lock);
+    }
You could move the unlock of bo_list_lock even before the if.

But since you pointed it out there is quite a bug in this function:
/* Another user may have a reference to this liststill */
                mutex_lock(&list->lock);
                mutex_unlock(&list->lock);
Not sure if that is still up to date, but that use case used to beillegal with mutexes.
As I understand this piece of code, these mutex_lock and mutex_unlockpair are used to make sure all other tasks have finishedaccess of the bo list. Another side of this story is in functionamdgpu_bo_list_get. These two piece of codes together make sure
we can safely destroy bo list.
Yeah, the idea behind the code is correct. But using mutexes in thatway is illegal, see here https://lwn.net/Articles/575460/.
I'm not sure if that is still up to date, but in ancient times youneeded to avoid patterns like this:
mutex_unlock(&obj->lock);
kfree(obj);
Anyway I suggest we just replace the whole bo_list handling with andRCU and refcount based implementation. That should avoid the wholelocking for the read only code path.
Regards,
Christian.

The article may be still generally applicable. But it is not applicablefor amdgpu_bo_list.c.

amdgpu_bo_list.c has two extra steps to avoid similar issue.

1. In function amdgpu_bo_list_destroy,
...

down_write(&fpriv->bo_list_lock); //Note that the bo_list_lock ismutex or write lock here.

    list = idr_remove(&fpriv->bo_list_handles, id);
...

With the above statements, there is no new task(user) can access the bolist.

2. With the above statments, there is no task currently waiting for thebo list lock after function amdgpu_bo_list_destroy holds bo_list_lock.Why? In function amdgpu_bo_list_get, other task need to holdbo_list_lock mutex or read lock in order to wait for result->lock.

amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id)
{
  ...
    down_read(&fpriv->bo_list_lock);
...
        mutex_lock(&result->lock);
...
    up_read(&fpriv->bo_list_lock);
}

So when amdgpu_bo_list_destroy function is holding bo_list_lock. Allother tasks are not waiting for result->lock.

3.

After the above 2 points, we know that when destroy function is holdingbo_list_lock, there can only be zero/one task is holding

list->lock. There is no other scenario.

The following piece of code wait for possible existing bo listtask(user) finishes accessing the bo list:

static void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id)
{
...
    down_write(&fpriv->bo_list_lock);
...
        /* Another user may have a reference to this list still */

mutex_lock(&list->lock); //// WAITing for the exisingtask/user unlock, after this waiting, we can safely destroy and free themutex. [Alex Bin]

        mutex_unlock(&list->lock);

        up_write(&fpriv->bo_list_lock);
        amdgpu_bo_list_free(list);
...

NOTE: When we are holding bo_list_lock, there is only one possiblescenario for list->lock left, that is,

in function amdgpu_bo_list_put, to unlock list->lock.


PLEASE give me example to prove if I miss anything.

Thanks for all other comments.

RCU lock and kref might be a cleaner solution than rw_semaphore. Butthat requires bigger change to current implementation.

I would have a look at that later.

Regards,
Alex Bin Xie


Otherwise we can easily simplify these lockings.
Let me give an example here.

In function amdgpu_bo_list_get, assuming we change the code like this:
...
    down_read(&fpriv->bo_list_lock);
    result = idr_find(&fpriv->bo_list_handles, id);
    up_read(&fpriv->bo_list_lock);
    /**Line 1. Task A was scheduled away from CPU**/
    if (result)
        mutex_lock(&result->lock);
...

In function amdgpu_bo_list_destroy, assuming we change the code likethis:

...
    down_write(&fpriv->bo_list_lock);
    list = idr_remove(&fpriv->bo_list_handles, id);
    up_write(&fpriv->bo_list_lock);
    if (list) {
        /* Another user may have a reference to this list still */
        mutex_lock(&list->lock);
        mutex_unlock(&list->lock);
        amdgpu_bo_list_free(list);
    }
}

When task A is running in function amdgpu_bo_list_get in line 1, CPUscheduler takes CPU away from task A.Then Task B run function amdgpu_bo_list_destroy. Task B can run allthe way to destroy and free mutex.Later Task A is back to run. The mutex result->lock has beendestroyed by task B. Now task A try to lock a mutex

which has been destroyed and freed.


Christian.

  }
    static int amdgpu_bo_list_set(struct amdgpu_device *adev,


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/4] drm/amdgpu: Optimization of critical section

Reply via email to