Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-30 Thread Sharma, Deepak
Did find way to reproduce issue constantly. After applying David's patch 
"0001-drm-amdgpu-fix-signaled-fence-isn-t-handled" with minor change


-static struct dma_fence *drm_syncobj_get_stub_fence(void)
+struct dma_fence *drm_syncobj_get_stub_fence(void)


was able to avoid kernel panic due to NULL pointer dereference. So seems it is 
resolving the problem which I was trying to address.

Can you please put revision of this patch?



Though only one time I got reboot with dmesg..


[  213.714324] RIP: 0010:reservation_object_add_shared_fence+0x22e/0x245
[  213.714331] RSP: 0018:a03900b5ba80 EFLAGS: 00010246
[  213.714338] RAX: 0004 RBX: a03900b5bba8 RCX: 969421139d00
[  213.714343] RDX:  RSI: 9693639662e0 RDI: 9694203db230
[  213.714349] RBP: a03900b5bad0 R08: 002a R09: 969363966280
[  213.714354] R10: 000180800079 R11: a3637f66 R12: 96941ea5cf00
[  213.714360] R13:  R14: 9693639662e0 R15: 9694203db230
[  213.714366] FS:  786009e5f700() GS:96942ec0() 
knlGS:
[  213.714372] CS:  0010 DS:  ES:  CR0: 80050033
[  213.714377] CR2: 7860023a4000 CR3: 00011ea8a000 CR4: 001406f0
[  213.714382] Call Trace:
[  213.714397]  ttm_eu_fence_buffer_objects+0x53/0x89
[  213.714407]  amdgpu_cs_ioctl+0x194f/0x196c
[  213.714420]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  213.714427]  drm_ioctl_kernel+0x98/0xac
[  213.714435]  drm_ioctl+0x235/0x357
[  213.714443]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  213.714450]  ? __switch_to_asm+0x34/0x70
[  213.714456]  ? __switch_to_asm+0x40/0x70
[  213.714462]  ? __switch_to_asm+0x34/0x70
[  213.714468]  ? __switch_to_asm+0x40/0x70
[  213.714476]  amdgpu_drm_ioctl+0x46/0x78
[  213.714500]  vfs_ioctl+0x1b/0x30
[  213.714507]  do_vfs_ioctl+0x492/0x6c1
[  213.714530]  SyS_ioctl+0x52/0x77
[  213.714537]  do_syscall_64+0x6d/0x81
[  213.714544]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  213.714551] RIP: 0033:0x78600c9c6967
[  213.714556] RSP: 002b:786009e5ec08 EFLAGS: 0246 ORIG_RAX: 
0010
[  213.714563] RAX: ffda RBX: c0186444 RCX: 78600c9c6967
[  213.714568] RDX: 786009e5ec60 RSI: c0186444 RDI: 000d
[  213.714573] RBP: 786009e5ec40 R08: 786009e5ed00 R09: 786009e5ec50
[  213.714578] R10:  R11: 0246 R12: 000d
[  213.714583] R13: 3ba2573ff128 R14:  R15: 786009e5ec60



Its worth to mention that I am not using latest kernel .relevant git log


71a0556255de125b7e3fc0dc6171fb31fab2b9ad drm/syncobj: Don't leak fences when 
WAIT_FOR_SUBMIT is set

4034318d2afac8f7f43457a4b480b877a94ec651  drm/syncobj: Stop reusing the same 
struct file for all syncobj -> fd


So have not taken any recent changes  from 
https://patchwork.kernel.org/patch/10575275/ series. only backported  attached 
two patches . I need to check if something is missing that can cause issue.


Thanks,

Deepak



From: Sharma, Deepak
Sent: Tuesday, November 27, 2018 3:12 PM
To: Zhou, David(ChunMing); Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer 
dereference


Thanks David ,

I did backport the drm patch to my kernel  after that I am not getting -ENOMEM 
from amdgpu_cs_ioctl while running tests  so have not been able to test patch 
to handle signaled fence. As this issue is hard to reproduce , will give some 
more try.

But yes the problem is there and need to handle when fence is null that your 
patch seems to handle it correctly.

-Deepak

From: Zhou, David(ChunMing)
Sent: Monday, November 26, 2018 6:40 PM
To: Sharma, Deepak; Zhou, David(ChunMing); Koenig, Christian; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer 
dereference

Yeah, you need another drm patch as well when you apply my patch. Attached.

-David


On 2018年11月27日 08:40, Sharma, Deepak wrote:
>
> On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Monday, November 26, 2018 5:23 PM
>>> To: Sharma, Deepak ; Zhou, David(ChunMing)
>>> ; Koenig, Christian ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
>>> dereference
>>>
>>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
>>>> 在 2018/11/24 2:10, Koenig, Christian 写道:
>>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>>>>>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>>>>>>> 在 2018/11/22 19:25, Christian König 写道:

RE: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-27 Thread Sharma, Deepak


-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Tuesday, November 27, 2018 1:52 AM
To: Sharma, Deepak ; Zhou, David(ChunMing) 
; Koenig, Christian ; Sharma, 
Deepak ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer 
dereference

Am 27.11.18 um 01:40 schrieb Deepak Sharma:
>
> On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Monday, November 26, 2018 5:23 PM
>>> To: Sharma, Deepak ; Zhou, David(ChunMing)
>>> ; Koenig, Christian ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
>>> dereference
>>>
>>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
>>>> 在 2018/11/24 2:10, Koenig, Christian 写道:
>>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>>>>>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>>>>>>> 在 2018/11/22 19:25, Christian König 写道:
>>>>>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>>>>>>>>>> when returned fence is not valid mostly due to userspace ignored
>>>>>>>>>> previous error causes NULL pointer dereference.
>>>>>>>>> Again, this is clearly incorrect. The my other mails on the
>>>>>>>>> earlier patch.
>>>>>>>> Sorry for I didn't get your history, but looks from the patch
>>>>>>>> itself, it is still a valid patch, isn't it?
>>>>>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
>>>>>>> when the fence is already signaled.
>>>>>>>
>>>>>>> So this patch could totally break userspace because it changes the
>>>>>>> behavior when we try to sync to an already signaled fence.
>>>>>> Ah, I got your meaning, how about attached patch?
>>>>> Yeah something like this, but I would just give the
>>>>> DRM_SYNCOBJ_CREATE_SIGNALED instead.
>>>>>
>>>>> I mean that's what this flag is good for isn't it?
>>>> Yeah, I give a flag initally when creating patch, but as you know, there 
>>>> is a
>>> swich case not be able to use that flag:
>>>>     case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
>>>>     fd = get_unused_fd_flags(O_CLOEXEC);
>>>>     if (fd < 0) {
>>>>     dma_fence_put(fence);
>>>>     return fd;
>>>>     }
>>>>
>>>>     sync_file = sync_file_create(fence);
>>>>     dma_fence_put(fence);
>>>>     if (!sync_file) {
>>>>     put_unused_fd(fd);
>>>>     return -ENOMEM;
>>>>     }
>>>>
>>>>     fd_install(fd, sync_file->file);
>>>>     info->out.handle = fd;
>>>>     return 0;
>>>>
>>>> So I change to stub fence instead.
>>> Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
>>> fence.
>>>
>>> We should then probably move the stub fence function into
>>> dma_fence_stub.c under drivers/dma-buf to keep the stuff together.
>> Yes, you wrap it to review first with your stub fence, we can do it 
>> separately first.
>>
>> -David
>>>> -David
>>>>
>>>> I have not applied this patch.
>>>> The issue was trying to address is when amdgpu_cs_ioctl() failed due to
>>> low memory (ENOMEM) but userspace chose to proceed and called
>>> amdgpu_cs_fence_to_handle_ioctl().
>>>> In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
>>>> NULL pointer dereference, this patch was to avoid that and system panic
>>> But I understand now that its a valid case retuning NULL if fence was 
>>> already
>>> signaled but need to handle case so avoid kernel panic. Seems David patch
>>> should fix this, I will test it tomorrow.
>>>
>>> Mhm, but don't we bail out with an error if we ask for a failed command
>>> submission? If not that sounds like a bug as well.
>>>
>>> Christian.
>>>
> Where do we do that?
> I see error
> [drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) fai

Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-27 Thread Sharma, Deepak
Thanks David ,

I did backport the drm patch to my kernel  after that I am not getting -ENOMEM 
from amdgpu_cs_ioctl while running tests  so have not been able to test patch 
to handle signaled fence. As this issue is hard to reproduce , will give some 
more try.

But yes the problem is there and need to handle when fence is null that your 
patch seems to handle it correctly.

-Deepak

From: Zhou, David(ChunMing)
Sent: Monday, November 26, 2018 6:40 PM
To: Sharma, Deepak; Zhou, David(ChunMing); Koenig, Christian; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer 
dereference

Yeah, you need another drm patch as well when you apply my patch. Attached.

-David


On 2018年11月27日 08:40, Sharma, Deepak wrote:
>
> On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Monday, November 26, 2018 5:23 PM
>>> To: Sharma, Deepak ; Zhou, David(ChunMing)
>>> ; Koenig, Christian ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
>>> dereference
>>>
>>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
>>>> 在 2018/11/24 2:10, Koenig, Christian 写道:
>>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>>>>>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>>>>>>> 在 2018/11/22 19:25, Christian König 写道:
>>>>>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>>>>>>>>>> when returned fence is not valid mostly due to userspace ignored
>>>>>>>>>> previous error causes NULL pointer dereference.
>>>>>>>>> Again, this is clearly incorrect. The my other mails on the
>>>>>>>>> earlier patch.
>>>>>>>> Sorry for I didn't get your history, but looks from the patch
>>>>>>>> itself, it is still a valid patch, isn't it?
>>>>>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
>>>>>>> when the fence is already signaled.
>>>>>>>
>>>>>>> So this patch could totally break userspace because it changes the
>>>>>>> behavior when we try to sync to an already signaled fence.
>>>>>> Ah, I got your meaning, how about attached patch?
>>>>> Yeah something like this, but I would just give the
>>>>> DRM_SYNCOBJ_CREATE_SIGNALED instead.
>>>>>
>>>>> I mean that's what this flag is good for isn't it?
>>>> Yeah, I give a flag initally when creating patch, but as you know, there 
>>>> is a
>>> swich case not be able to use that flag:
>>>> case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
>>>> fd = get_unused_fd_flags(O_CLOEXEC);
>>>> if (fd < 0) {
>>>> dma_fence_put(fence);
>>>> return fd;
>>>> }
>>>>
>>>> sync_file = sync_file_create(fence);
>>>> dma_fence_put(fence);
>>>> if (!sync_file) {
>>>> put_unused_fd(fd);
>>>> return -ENOMEM;
>>>> }
>>>>
>>>> fd_install(fd, sync_file->file);
>>>> info->out.handle = fd;
>>>> return 0;
>>>>
>>>> So I change to stub fence instead.
>>> Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
>>> fence.
>>>
>>> We should then probably move the stub fence function into
>>> dma_fence_stub.c under drivers/dma-buf to keep the stuff together.
>> Yes, you wrap it to review first with your stub fence, we can do it 
>> separately first.
>>
>> -David
>>>> -David
>>>>
>>>> I have not applied this patch.
>>>> The issue was trying to address is when amdgpu_cs_ioctl() failed due to
>>> low memory (ENOMEM) but userspace chose to proceed and called
>>> amdgpu_cs_fence_to_handle_ioctl().
>>>> In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
>>>> NULL pointer dereference, this patch was to avoid that and system panic
>>> But I understand now that its a valid case retuning NULL if fence was 
>>> already
>>> signaled

Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-27 Thread Christian König

Am 27.11.18 um 01:40 schrieb Deepak Sharma:


On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:



-Original Message-
From: Christian König 
Sent: Monday, November 26, 2018 5:23 PM
To: Sharma, Deepak ; Zhou, David(ChunMing)
; Koenig, Christian ;
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
dereference

Am 26.11.18 um 02:59 schrieb Sharma, Deepak:

在 2018/11/24 2:10, Koenig, Christian 写道:

Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):

在 2018/11/23 21:30, Koenig, Christian 写道:

Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):

在 2018/11/22 19:25, Christian König 写道:

Am 22.11.18 um 07:56 schrieb Sharma, Deepak:

when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.

Again, this is clearly incorrect. The my other mails on the
earlier patch.

Sorry for I didn't get your history, but looks from the patch
itself, it is still a valid patch, isn't it?

No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
when the fence is already signaled.

So this patch could totally break userspace because it changes the
behavior when we try to sync to an already signaled fence.

Ah, I got your meaning, how about attached patch?

Yeah something like this, but I would just give the
DRM_SYNCOBJ_CREATE_SIGNALED instead.

I mean that's what this flag is good for isn't it?

Yeah, I give a flag initally when creating patch, but as you know, there is a

swich case not be able to use that flag:

    case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
    fd = get_unused_fd_flags(O_CLOEXEC);
    if (fd < 0) {
    dma_fence_put(fence);
    return fd;
    }

    sync_file = sync_file_create(fence);
    dma_fence_put(fence);
    if (!sync_file) {
    put_unused_fd(fd);
    return -ENOMEM;
    }

    fd_install(fd, sync_file->file);
    info->out.handle = fd;
    return 0;

So I change to stub fence instead.

Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
fence.

We should then probably move the stub fence function into
dma_fence_stub.c under drivers/dma-buf to keep the stuff together.

Yes, you wrap it to review first with your stub fence, we can do it separately 
first.

-David

-David

I have not applied this patch.
The issue was trying to address is when amdgpu_cs_ioctl() failed due to

low memory (ENOMEM) but userspace chose to proceed and called
amdgpu_cs_fence_to_handle_ioctl().

In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
NULL pointer dereference, this patch was to avoid that and system panic

But I understand now that its a valid case retuning NULL if fence was already
signaled but need to handle case so avoid kernel panic. Seems David patch
should fix this, I will test it tomorrow.

Mhm, but don't we bail out with an error if we ask for a failed command
submission? If not that sounds like a bug as well.

Christian.


Where do we do that?
I see error
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
BUG: unable to handle kernel NULL pointer dereference at 0008
Did some more debugging,dma_fence_is_array() is causing NULL pointer
dereference call through sync_file_ioctl.

Also I think changes in David patch cant be applied on
amd-staging-drm-next, which all patches I should take to get it correctly?


Mhm, what you say here actually doesn't make much sense.

When the sequence number is invalid because the submission failed 
amdgpu_ctx_get_fence() returns an error:

    if (seq >= centity->sequence) {
    spin_unlock(>ring_lock);
    return ERR_PTR(-EINVAL);
    }


This error is then handled in amdgpu_cs_fence_to_handle_ioctl():

    fence = amdgpu_cs_get_fence(adev, filp, >in.fence);
    if (IS_ERR(fence))
    return PTR_ERR(fence);


So the error handling for this is actually in place.

The only thing that could go wrong is that userspace sends a sequence 
number which is to small.


The correct solution is to either set the flag as I suggested and make 
sync_file_create() capable of handling a NULL fence.


Or we make the stub fence global like David suggested.

Regards,
Christian.




-Deepak

Christian.


-David

If that patch was applied then please revert it immediately.

Christian.


-David

If you have already pushed the patch then please revert.

Christian.


Signed-off-by: Deepak Sharma 
---
     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
     1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +14

Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-26 Thread zhoucm1

Yeah, you need another drm patch as well when you apply my patch. Attached.

-David


On 2018年11月27日 08:40, Sharma, Deepak wrote:


On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:



-Original Message-
From: Christian König 
Sent: Monday, November 26, 2018 5:23 PM
To: Sharma, Deepak ; Zhou, David(ChunMing)
; Koenig, Christian ;
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
dereference

Am 26.11.18 um 02:59 schrieb Sharma, Deepak:

在 2018/11/24 2:10, Koenig, Christian 写道:

Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):

在 2018/11/23 21:30, Koenig, Christian 写道:

Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):

在 2018/11/22 19:25, Christian König 写道:

Am 22.11.18 um 07:56 schrieb Sharma, Deepak:

when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.

Again, this is clearly incorrect. The my other mails on the
earlier patch.

Sorry for I didn't get your history, but looks from the patch
itself, it is still a valid patch, isn't it?

No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
when the fence is already signaled.

So this patch could totally break userspace because it changes the
behavior when we try to sync to an already signaled fence.

Ah, I got your meaning, how about attached patch?

Yeah something like this, but I would just give the
DRM_SYNCOBJ_CREATE_SIGNALED instead.

I mean that's what this flag is good for isn't it?

Yeah, I give a flag initally when creating patch, but as you know, there is a

swich case not be able to use that flag:

    case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
    fd = get_unused_fd_flags(O_CLOEXEC);
    if (fd < 0) {
    dma_fence_put(fence);
    return fd;
    }

    sync_file = sync_file_create(fence);
    dma_fence_put(fence);
    if (!sync_file) {
    put_unused_fd(fd);
    return -ENOMEM;
    }

    fd_install(fd, sync_file->file);
    info->out.handle = fd;
    return 0;

So I change to stub fence instead.

Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
fence.

We should then probably move the stub fence function into
dma_fence_stub.c under drivers/dma-buf to keep the stuff together.

Yes, you wrap it to review first with your stub fence, we can do it separately 
first.

-David

-David

I have not applied this patch.
The issue was trying to address is when amdgpu_cs_ioctl() failed due to

low memory (ENOMEM) but userspace chose to proceed and called
amdgpu_cs_fence_to_handle_ioctl().

In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
NULL pointer dereference, this patch was to avoid that and system panic

But I understand now that its a valid case retuning NULL if fence was already
signaled but need to handle case so avoid kernel panic. Seems David patch
should fix this, I will test it tomorrow.

Mhm, but don't we bail out with an error if we ask for a failed command
submission? If not that sounds like a bug as well.

Christian.


Where do we do that?
I see error
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
BUG: unable to handle kernel NULL pointer dereference at 0008
Did some more debugging,dma_fence_is_array() is causing NULL pointer
dereference call through sync_file_ioctl.

Also I think changes in David patch cant be applied on
amd-staging-drm-next, which all patches I should take to get it correctly?


-Deepak

Christian.


-David

If that patch was applied then please revert it immediately.

Christian.


-David

If you have already pushed the patch then please revert.

Christian.


Signed-off-by: Deepak Sharma 
---
     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
     1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence
*amdgpu_cs_get_fence(struct amdgpu_device *adev,
       fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
     amdgpu_ctx_put(ctx);
+    if(!fence)
+    return ERR_PTR(-EINVAL);
       return fence;
     }

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Received: from SN1PR12MB0509.namprd12.prod.outlook.com (2603:10b6:a02:a8::31)
 by BY1PR12MB0502.namprd12.prod.outlook.com with HTTPS via
 BYAPR03CA0018.NAMPRD03.PROD.OUTLOOK.COM; Thu, 15 Nov 2018 11:12:57 +
Received: from MWHPR12CA0057.namprd12.prod.outlook.com (2603:10b6:300:103::19)
 by SN1PR12MB0509.namprd12.prod.outlook.com (2a01:111:e

Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-26 Thread Deepak Sharma


On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
> 
> 
>> -Original Message-
>> From: Christian König 
>> Sent: Monday, November 26, 2018 5:23 PM
>> To: Sharma, Deepak ; Zhou, David(ChunMing)
>> ; Koenig, Christian ;
>> amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
>> dereference
>>
>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
>>>
>>> 在 2018/11/24 2:10, Koenig, Christian 写道:
>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>>>>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>>>>>> 在 2018/11/22 19:25, Christian König 写道:
>>>>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>>>>>>>>> when returned fence is not valid mostly due to userspace ignored
>>>>>>>>> previous error causes NULL pointer dereference.
>>>>>>>> Again, this is clearly incorrect. The my other mails on the
>>>>>>>> earlier patch.
>>>>>>> Sorry for I didn't get your history, but looks from the patch
>>>>>>> itself, it is still a valid patch, isn't it?
>>>>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
>>>>>> when the fence is already signaled.
>>>>>>
>>>>>> So this patch could totally break userspace because it changes the
>>>>>> behavior when we try to sync to an already signaled fence.
>>>>> Ah, I got your meaning, how about attached patch?
>>>> Yeah something like this, but I would just give the
>>>> DRM_SYNCOBJ_CREATE_SIGNALED instead.
>>>>
>>>> I mean that's what this flag is good for isn't it?
>>> Yeah, I give a flag initally when creating patch, but as you know, there is 
>>> a
>> swich case not be able to use that flag:
>>>
>>>    case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
>>>    fd = get_unused_fd_flags(O_CLOEXEC);
>>>    if (fd < 0) {
>>>    dma_fence_put(fence);
>>>    return fd;
>>>    }
>>>
>>>    sync_file = sync_file_create(fence);
>>>    dma_fence_put(fence);
>>>    if (!sync_file) {
>>>    put_unused_fd(fd);
>>>    return -ENOMEM;
>>>    }
>>>
>>>    fd_install(fd, sync_file->file);
>>>    info->out.handle = fd;
>>>    return 0;
>>>
>>> So I change to stub fence instead.
>>
>> Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
>> fence.
>>
>> We should then probably move the stub fence function into
>> dma_fence_stub.c under drivers/dma-buf to keep the stuff together.
> 
> Yes, you wrap it to review first with your stub fence, we can do it 
> separately first.
> 
> -David
>>
>>>
>>> -David
>>>
>>> I have not applied this patch.
>>> The issue was trying to address is when amdgpu_cs_ioctl() failed due to
>> low memory (ENOMEM) but userspace chose to proceed and called
>> amdgpu_cs_fence_to_handle_ioctl().
>>> In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
>>> NULL pointer dereference, this patch was to avoid that and system panic
>> But I understand now that its a valid case retuning NULL if fence was already
>> signaled but need to handle case so avoid kernel panic. Seems David patch
>> should fix this, I will test it tomorrow.
>>
>> Mhm, but don't we bail out with an error if we ask for a failed command
>> submission? If not that sounds like a bug as well.
>>
>> Christian.
>>
Where do we do that?
I see error
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
BUG: unable to handle kernel NULL pointer dereference at 0008
Did some more debugging,dma_fence_is_array() is causing NULL pointer 
dereference call through sync_file_ioctl.

Also I think changes in David patch cant be applied on 
amd-staging-drm-next, which all patches I should take to get it correctly?

>>>
>>> -Deepak
>>>> Christian.
>>>>
>>>>> -David
>>>>>> If that patch was applied then please revert it immediately.
>>>>>>
>>>>>> Christian.
>>>>>&g

RE: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-26 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Monday, November 26, 2018 5:23 PM
> To: Sharma, Deepak ; Zhou, David(ChunMing)
> ; Koenig, Christian ;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
> dereference
> 
> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
> >
> > 在 2018/11/24 2:10, Koenig, Christian 写道:
> >> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
> >>> 在 2018/11/23 21:30, Koenig, Christian 写道:
> >>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
> >>>>> 在 2018/11/22 19:25, Christian König 写道:
> >>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
> >>>>>>> when returned fence is not valid mostly due to userspace ignored
> >>>>>>> previous error causes NULL pointer dereference.
> >>>>>> Again, this is clearly incorrect. The my other mails on the
> >>>>>> earlier patch.
> >>>>> Sorry for I didn't get your history, but looks from the patch
> >>>>> itself, it is still a valid patch, isn't it?
> >>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
> >>>> when the fence is already signaled.
> >>>>
> >>>> So this patch could totally break userspace because it changes the
> >>>> behavior when we try to sync to an already signaled fence.
> >>> Ah, I got your meaning, how about attached patch?
> >> Yeah something like this, but I would just give the
> >> DRM_SYNCOBJ_CREATE_SIGNALED instead.
> >>
> >> I mean that's what this flag is good for isn't it?
> > Yeah, I give a flag initally when creating patch, but as you know, there is 
> > a
> swich case not be able to use that flag:
> >
> >       case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
> >       fd = get_unused_fd_flags(O_CLOEXEC);
> >       if (fd < 0) {
> >       dma_fence_put(fence);
> >       return fd;
> >       }
> >
> >       sync_file = sync_file_create(fence);
> >       dma_fence_put(fence);
> >       if (!sync_file) {
> >       put_unused_fd(fd);
> >       return -ENOMEM;
> >       }
> >
> >       fd_install(fd, sync_file->file);
> >       info->out.handle = fd;
> >       return 0;
> >
> > So I change to stub fence instead.
> 
> Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
> fence.
> 
> We should then probably move the stub fence function into
> dma_fence_stub.c under drivers/dma-buf to keep the stuff together.

Yes, you wrap it to review first with your stub fence, we can do it separately 
first.

-David 
> 
> >
> > -David
> >
> > I have not applied this patch.
> > The issue was trying to address is when amdgpu_cs_ioctl() failed due to
> low memory (ENOMEM) but userspace chose to proceed and called
> amdgpu_cs_fence_to_handle_ioctl().
> > In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
> > NULL pointer dereference, this patch was to avoid that and system panic
> But I understand now that its a valid case retuning NULL if fence was already
> signaled but need to handle case so avoid kernel panic. Seems David patch
> should fix this, I will test it tomorrow.
> 
> Mhm, but don't we bail out with an error if we ask for a failed command
> submission? If not that sounds like a bug as well.
> 
> Christian.
> 
> >
> > -Deepak
> >> Christian.
> >>
> >>> -David
> >>>> If that patch was applied then please revert it immediately.
> >>>>
> >>>> Christian.
> >>>>
> >>>>> -David
> >>>>>> If you have already pushed the patch then please revert.
> >>>>>>
> >>>>>> Christian.
> >>>>>>
> >>>>>>> Signed-off-by: Deepak Sharma 
> >>>>>>> ---
> >>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
> >>>>>>>    1 file changed, 2 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> index 024dfbd87f11..14166cd8a12f 100644
> >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> @@ -1403,6 +1403,8 @@ static struct dma_fence
> >>>>>>> *amdgpu_cs_get_fence(struct amdgpu_device *adev,
> >>>>>>>      fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
> >>>>>>>    amdgpu_ctx_put(ctx);
> >>>>>>> +    if(!fence)
> >>>>>>> +    return ERR_PTR(-EINVAL);
> >>>>>>>      return fence;
> >>>>>>>    }
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-26 Thread Christian König

Am 26.11.18 um 02:59 schrieb Sharma, Deepak:


在 2018/11/24 2:10, Koenig, Christian 写道:

Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):

在 2018/11/23 21:30, Koenig, Christian 写道:

Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):

在 2018/11/22 19:25, Christian König 写道:

Am 22.11.18 um 07:56 schrieb Sharma, Deepak:

when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.

Again, this is clearly incorrect. The my other mails on the
earlier patch.

Sorry for I didn't get your history, but looks from the patch
itself, it is still a valid patch, isn't it?

No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
when the fence is already signaled.

So this patch could totally break userspace because it changes the
behavior when we try to sync to an already signaled fence.

Ah, I got your meaning, how about attached patch?

Yeah something like this, but I would just give the
DRM_SYNCOBJ_CREATE_SIGNALED instead.

I mean that's what this flag is good for isn't it?

Yeah, I give a flag initally when creating patch, but as you know, there is a 
swich case not be able to use that flag:

      case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
      fd = get_unused_fd_flags(O_CLOEXEC);
      if (fd < 0) {
      dma_fence_put(fence);
      return fd;
      }

      sync_file = sync_file_create(fence);
      dma_fence_put(fence);
      if (!sync_file) {
      put_unused_fd(fd);
      return -ENOMEM;
      }

      fd_install(fd, sync_file->file);
      info->out.handle = fd;
      return 0;

So I change to stub fence instead.


Yeah, I've missed that case. Not sure if the sync_file can deal with a 
NULL fence.


We should then probably move the stub fence function into 
dma_fence_stub.c under drivers/dma-buf to keep the stuff together.




-David

I have not applied this patch.
The issue was trying to address is when amdgpu_cs_ioctl() failed due to low 
memory (ENOMEM) but userspace chose to proceed and called 
amdgpu_cs_fence_to_handle_ioctl().
In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing NULL 
pointer dereference, this patch was to avoid that and system panic
But I understand now that its a valid case retuning NULL if fence was already 
signaled but need to handle case so avoid kernel panic. Seems David patch 
should fix this, I will test it tomorrow.


Mhm, but don't we bail out with an error if we ask for a failed command 
submission? If not that sounds like a bug as well.


Christian.



-Deepak

Christian.


-David

If that patch was applied then please revert it immediately.

Christian.


-David

If you have already pushed the patch then please revert.

Christian.


Signed-off-by: Deepak Sharma 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence
*amdgpu_cs_get_fence(struct amdgpu_device *adev,
     fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
   amdgpu_ctx_put(ctx);
+    if(!fence)
+    return ERR_PTR(-EINVAL);
     return fence;
   }

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-25 Thread Sharma, Deepak


在 2018/11/24 2:10, Koenig, Christian 写道:
> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
 在 2018/11/22 19:25, Christian König 写道:
> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>> when returned fence is not valid mostly due to userspace ignored 
>> previous error causes NULL pointer dereference.
> Again, this is clearly incorrect. The my other mails on the 
> earlier patch.
 Sorry for I didn't get your history, but looks from the patch 
 itself, it is still a valid patch, isn't it?
>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL 
>>> when the fence is already signaled.
>>>
>>> So this patch could totally break userspace because it changes the 
>>> behavior when we try to sync to an already signaled fence.
>> Ah, I got your meaning, how about attached patch?
> Yeah something like this, but I would just give the 
> DRM_SYNCOBJ_CREATE_SIGNALED instead.
>
> I mean that's what this flag is good for isn't it?
Yeah, I give a flag initally when creating patch, but as you know, there is a 
swich case not be able to use that flag:

     case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
     fd = get_unused_fd_flags(O_CLOEXEC);
     if (fd < 0) {
     dma_fence_put(fence);
     return fd;
     }

     sync_file = sync_file_create(fence);
     dma_fence_put(fence);
     if (!sync_file) {
     put_unused_fd(fd);
     return -ENOMEM;
     }

     fd_install(fd, sync_file->file);
     info->out.handle = fd;
     return 0;

So I change to stub fence instead.

-David

I have not applied this patch.
The issue was trying to address is when amdgpu_cs_ioctl() failed due to low 
memory (ENOMEM) but userspace chose to proceed and called 
amdgpu_cs_fence_to_handle_ioctl().
In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing NULL 
pointer dereference, this patch was to avoid that and system panic
But I understand now that its a valid case retuning NULL if fence was already 
signaled but need to handle case so avoid kernel panic. Seems David patch 
should fix this, I will test it tomorrow. 

-Deepak
>
> Christian.
>
>> -David
>>> If that patch was applied then please revert it immediately.
>>>
>>> Christian.
>>>
 -David
> If you have already pushed the patch then please revert.
>
> Christian.
>
>> Signed-off-by: Deepak Sharma 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 024dfbd87f11..14166cd8a12f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -1403,6 +1403,8 @@ static struct dma_fence 
>> *amdgpu_cs_get_fence(struct amdgpu_device *adev,
>>     fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
>>   amdgpu_ctx_put(ctx);
>> +    if(!fence)
>> +    return ERR_PTR(-EINVAL);
>>     return fence;
>>   }

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-24 Thread Chunming Zhou


在 2018/11/24 2:10, Koenig, Christian 写道:
> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
 在 2018/11/22 19:25, Christian König 写道:
> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>> when returned fence is not valid mostly due to userspace ignored
>> previous error causes NULL pointer dereference.
> Again, this is clearly incorrect. The my other mails on the earlier
> patch.
 Sorry for I didn't get your history, but looks from the patch itself, it
 is still a valid patch, isn't it?
>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL when
>>> the fence is already signaled.
>>>
>>> So this patch could totally break userspace because it changes the
>>> behavior when we try to sync to an already signaled fence.
>> Ah, I got your meaning, how about attached patch?
> Yeah something like this, but I would just give the
> DRM_SYNCOBJ_CREATE_SIGNALED instead.
>
> I mean that's what this flag is good for isn't it?
Yeah, I give a flag initally when creating patch, but as you know, there 
is a swich case not be able to use that flag:

     case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
     fd = get_unused_fd_flags(O_CLOEXEC);
     if (fd < 0) {
     dma_fence_put(fence);
     return fd;
     }

     sync_file = sync_file_create(fence);
     dma_fence_put(fence);
     if (!sync_file) {
     put_unused_fd(fd);
     return -ENOMEM;
     }

     fd_install(fd, sync_file->file);
     info->out.handle = fd;
     return 0;

So I change to stub fence instead.

-David
>
> Christian.
>
>> -David
>>> If that patch was applied then please revert it immediately.
>>>
>>> Christian.
>>>
 -David
> If you have already pushed the patch then please revert.
>
> Christian.
>
>> Signed-off-by: Deepak Sharma 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 024dfbd87f11..14166cd8a12f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -1403,6 +1403,8 @@ static struct dma_fence
>> *amdgpu_cs_get_fence(struct amdgpu_device *adev,
>>     fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
>>   amdgpu_ctx_put(ctx);
>> +    if(!fence)
>> +    return ERR_PTR(-EINVAL);
>>     return fence;
>>   }

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-23 Thread Koenig, Christian
Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>
> 在 2018/11/23 21:30, Koenig, Christian 写道:
>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>> 在 2018/11/22 19:25, Christian König 写道:
 Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
> when returned fence is not valid mostly due to userspace ignored
> previous error causes NULL pointer dereference.
 Again, this is clearly incorrect. The my other mails on the earlier
 patch.
>>> Sorry for I didn't get your history, but looks from the patch itself, it
>>> is still a valid patch, isn't it?
>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL when
>> the fence is already signaled.
>>
>> So this patch could totally break userspace because it changes the
>> behavior when we try to sync to an already signaled fence.
> Ah, I got your meaning, how about attached patch?

Yeah something like this, but I would just give the 
DRM_SYNCOBJ_CREATE_SIGNALED instead.

I mean that's what this flag is good for isn't it?

Christian.

>
> -David
>> If that patch was applied then please revert it immediately.
>>
>> Christian.
>>
>>> -David
 If you have already pushed the patch then please revert.

 Christian.

> Signed-off-by: Deepak Sharma 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 024dfbd87f11..14166cd8a12f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1403,6 +1403,8 @@ static struct dma_fence
> *amdgpu_cs_get_fence(struct amdgpu_device *adev,
>    fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
>  amdgpu_ctx_put(ctx);
> +    if(!fence)
> +    return ERR_PTR(-EINVAL);
>    return fence;
>  }

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-23 Thread Chunming Zhou


在 2018/11/23 21:30, Koenig, Christian 写道:
> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>> 在 2018/11/22 19:25, Christian König 写道:
>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
 when returned fence is not valid mostly due to userspace ignored
 previous error causes NULL pointer dereference.
>>> Again, this is clearly incorrect. The my other mails on the earlier
>>> patch.
>> Sorry for I didn't get your history, but looks from the patch itself, it
>> is still a valid patch, isn't it?
> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL when
> the fence is already signaled.
>
> So this patch could totally break userspace because it changes the
> behavior when we try to sync to an already signaled fence.
Ah, I got your meaning, how about attached patch?

-David
>
> If that patch was applied then please revert it immediately.
>
> Christian.
>
>> -David
>>> If you have already pushed the patch then please revert.
>>>
>>> Christian.
>>>
 Signed-off-by: Deepak Sharma 
 ---
     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
     1 file changed, 2 insertions(+)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
 index 024dfbd87f11..14166cd8a12f 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
 @@ -1403,6 +1403,8 @@ static struct dma_fence
 *amdgpu_cs_get_fence(struct amdgpu_device *adev,
       fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
     amdgpu_ctx_put(ctx);
 +    if(!fence)
 +    return ERR_PTR(-EINVAL);
       return fence;
     }

From 3640a18c31e7b786129286615fcdf397e1142451 Mon Sep 17 00:00:00 2001
From: Chunming Zhou 
Date: Fri, 23 Nov 2018 22:05:19 +0800
Subject: [PATCH] drm/amdgpu: fix signaled fence isn't handled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Chunming Zhou 
Cc: Sharma, Deepak 
CC: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +++
 drivers/gpu/drm/drm_syncobj.c  | 1 +
 include/drm/drm_syncobj.h  | 1 +
 3 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 6a823b58b3b8..e960f9864e9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1505,6 +1505,9 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device 
*dev, void *data,
if (IS_ERR(fence))
return PTR_ERR(fence);
 
+/* that means fence was signaled */
+   if (!fence)
+   fence = drm_syncobj_get_stub_fence();
switch (info->in.what) {
case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ:
r = drm_syncobj_create(, 0, fence);
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5f2df10e51c3..e5621f80d501 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -97,6 +97,7 @@ struct dma_fence *drm_syncobj_get_stub_fence(void)
 
return dma_fence_get(_fence);
 }
+EXPORT_SYMBOL(drm_syncobj_get_stub_fence);
 /**
  * drm_syncobj_find - lookup and reference a sync object.
  * @file_private: drm file private pointer
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 29244cbcd05e..93e9e9b159ab 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -147,5 +147,6 @@ int drm_syncobj_get_handle(struct drm_file *file_private,
 int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd);
 int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, u64 flags,
 struct dma_fence **fence);
+struct dma_fence *drm_syncobj_get_stub_fence(void);
 
 #endif
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-23 Thread Koenig, Christian
Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>
> 在 2018/11/22 19:25, Christian König 写道:
>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>>> when returned fence is not valid mostly due to userspace ignored
>>> previous error causes NULL pointer dereference.
>> Again, this is clearly incorrect. The my other mails on the earlier
>> patch.
> Sorry for I didn't get your history, but looks from the patch itself, it
> is still a valid patch, isn't it?

No, the semantic of amdgpu_ctx_get_fence() is that we return NULL when 
the fence is already signaled.

So this patch could totally break userspace because it changes the 
behavior when we try to sync to an already signaled fence.

If that patch was applied then please revert it immediately.

Christian.

>
> -David
>> If you have already pushed the patch then please revert.
>>
>> Christian.
>>
>>> Signed-off-by: Deepak Sharma 
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 024dfbd87f11..14166cd8a12f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1403,6 +1403,8 @@ static struct dma_fence
>>> *amdgpu_cs_get_fence(struct amdgpu_device *adev,
>>>      fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
>>>    amdgpu_ctx_put(ctx);
>>> +    if(!fence)
>>> +    return ERR_PTR(-EINVAL);
>>>      return fence;
>>>    }

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-22 Thread Christian König

Am 22.11.18 um 07:56 schrieb Sharma, Deepak:

when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.


Again, this is clearly incorrect. The my other mails on the earlier patch.

If you have already pushed the patch then please revert.

Christian.



Signed-off-by: Deepak Sharma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence *amdgpu_cs_get_fence(struct 
amdgpu_device *adev,
  
  	fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);

amdgpu_ctx_put(ctx);
+   if(!fence)
+   return ERR_PTR(-EINVAL);
  
  	return fence;

  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-21 Thread zhoucm1



On 2018年11月22日 14:56, Sharma, Deepak wrote:

when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.

Signed-off-by: Deepak Sharma 

Reviewed-by: Chunming Zhou 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence *amdgpu_cs_get_fence(struct 
amdgpu_device *adev,
  
  	fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);

amdgpu_ctx_put(ctx);
+   if(!fence)
+   return ERR_PTR(-EINVAL);
  
  	return fence;

  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-21 Thread Sharma, Deepak
when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.

Signed-off-by: Deepak Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence *amdgpu_cs_get_fence(struct 
amdgpu_device *adev,
 
fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
amdgpu_ctx_put(ctx);
+   if(!fence)
+   return ERR_PTR(-EINVAL);
 
return fence;
 }
-- 
2.15.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx