Hi Prike,

No, that can lead to massive problems in a real OOM situation and is not something we can do here.

Christian.
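
One way to read this objection is that an unconditional retry on -ENOMEM only ends once memory becomes available, which presumably never happens in a real OOM situation. The following is a minimal userspace sketch of that behaviour, not kernel code: fake_validate(), struct fake_mem and pin_with_retry() are hypothetical names made up for illustration, and the guard parameter exists only so that the demonstration terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a validate call. It fails with -ENOMEM
 * until more than free_after calls have been made. A negative
 * free_after models a real OOM where memory never becomes available.
 */
struct fake_mem {
	long calls;
	long free_after;
};

static int fake_validate(struct fake_mem *m)
{
	m->calls++;
	if (m->free_after >= 0 && m->calls > m->free_after)
		return 0;
	return -ENOMEM;
}

/*
 * Retry-on-ENOMEM pattern with an artificial guard.
 * Returns true if the loop ended on its own (validate stopped
 * returning -ENOMEM), false if only the guard stopped it.
 */
static bool pin_with_retry(struct fake_mem *m, long guard)
{
	int r;

	assert(guard > 0);
	do {
		r = fake_validate(m);
		if (r != -ENOMEM)
			return true;
	} while (m->calls < guard);

	return false;
}
```

With a transient shortage (free_after = 3) the loop ends on the fourth call. With free_after = -1 only the guard stops it, and without such a guard the caller would spin for as long as the shortage lasts.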

On 15.05.19 at 04:00, Liang, Prike wrote:

Hi Christian,

I just wonder whether, when we encounter an ENOMEM error while pinning amdgpu BOs, we can retry the validation as shown below.

With the following simple patch, the Abaqus pinning issue is no longer observed.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 11cbf63..72a32f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
                 bo->placements[i].lpfn = lpfn;
                 bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
         }
-
+retry:
         r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
         if (unlikely(r)) {
-               dev_err(adev->dev, "%p pin failed\n", bo);
-               goto error;
+                if (r == -ENOMEM){
+                        goto retry;
+                } else {
+                       dev_err(adev->dev, "%p pin failed\n", bo);
+                       goto error;
+                }
         }
         bo->pin_count = 1;

Thanks,

Prike

*From:* Marek Olšák <mar...@gmail.com>
*Sent:* Wednesday, May 15, 2019 3:33 AM
*To:* Christian König <ckoenig.leichtzumer...@gmail.com>
*Cc:* Zhou, David(ChunMing) <david1.z...@amd.com>; Liang, Prike <prike.li...@amd.com>; dri-devel <dri-de...@lists.freedesktop.org>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
*Subject:* Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times:

AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears

Marek

On Tue, May 14, 2019 at 8:31 AM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

    This avoids OOM situations when we have lots of threads
    submitting at the same time.

    Signed-off-by: Christian König <christian.koe...@amd.com>
    ---
     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
    index fff558cf385b..f9240a94217b 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
    @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
            }

            r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
    -                                  &duplicates, true);
    +                                  &duplicates, false);
            if (unlikely(r != 0)) {
                    if (r != -ERESTARTSYS)
                            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
    -- 
    2.17.1

    _______________________________________________
    amd-gfx mailing list
    amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
    https://lists.freedesktop.org/mailman/listinfo/amd-gfx

