Re: [PATCH] drm/amdkfd: Fix an illegal memory access
On 2023/2/22 3:17, Christophe JAILLET wrote:
> On 21/02/2023 at 17:26, Felix Kuehling wrote:
>>
>> On 2023-02-21 06:35, qu.huang-fxuvxftifdnyg1zeobx...@public.gmane.org wrote:
>>> From: Qu Huang
>>>
>>> In kfd_wait_on_events(), the kfd_event_waiter structures are allocated
>>> by alloc_event_waiters(), but their event field is not initialized.
>>> When copy_from_user() fails in kfd_wait_on_events(), the error path
>>> releases the previously allocated waiter structures. Because
>>> free_waiters() then reads the uninitialized event field, this results
>>> in an illegal memory access and a system crash. Here is the crash log:
>>>
>>> localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
>>> localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
>>> localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
>>> localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
>>> localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
>>> localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
>>> localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
>>> localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
>>> localhost kernel: CS: 0010 DS: ES: CR0: 80050033
>>> localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
>>> localhost kernel: Call Trace:
>>> localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
>>> localhost kernel: remove_wait_queue+0x12/0x50
>>> localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
>>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
>>> localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
>>> localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
>>> localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
>>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
>>> localhost kernel: __x64_sys_ioctl+0x8e/0xd0
>>> localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
>>> localhost kernel: do_syscall_64+0x33/0x80
>>> localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> localhost kernel: RIP: 0033:0x152a4dff68d7
>>>
>>> Signed-off-by: Qu Huang
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> index 729d26d..e5faaad 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>>>  	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>>>  		init_wait(&event_waiters[i].wait);
>>>  		event_waiters[i].activated = false;
>>> +		event_waiters[i].event = NULL;
>>
>> Thank you for catching this. We're often lazy about initializing things
>> to NULL or 0 because most of our data structures are allocated with
>> kzalloc or similar. I'm not sure why we're not doing this here. If we
>> allocated event_waiters with kcalloc, we could also remove the
>> initialization of activated. I think that would be the cleaner and
>> safer solution.
>
> Hi,
>
> I think that the '(event_waiters) &&' in the 'for' can also be removed.
> 'event_waiters' is already NULL-tested a few lines above.
>
>
> Just my 2c.
>
> CJ
>

Thanks for the suggestions, Felix and CJ. I have resubmitted the patch as
v2; please review it:
https://lore.kernel.org/all/ea5b997309825b21e406f9bad2ce8...@linux.dev/

Regards,
Qu

>>
>> Regards,
>> Felix
>>
>>
>>>  	}
>>>
>>>  	return event_waiters;
>>> --
>>> 1.8.3.1
>>
>
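The v2 patch is only linked above, not quoted, so the following is not it. It is a minimal userspace sketch of what Felix's and CJ's suggestions amount to: allocate the waiter array with a zeroing allocator so that `event` starts as NULL and `activated` as false, and drop the redundant `(event_waiters) &&` test because the pointer has already been NULL-checked. Assumptions: `calloc()` stands in for `kcalloc()`, and `struct fake_wait`, `init_wait_stub()`, and the reduced `struct kfd_event_waiter` layout are hypothetical simplifications of the kernel types.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's wait queue entry type. */
struct fake_wait {
	int initialized;
};

/* Hypothetical, simplified model of struct kfd_event_waiter. */
struct kfd_event_waiter {
	struct fake_wait wait;
	void *event;    /* NULL until the waiter is attached to an event */
	bool activated;
};

/* Stand-in for init_wait(): only records that the entry was set up. */
static void init_wait_stub(struct fake_wait *w)
{
	w->initialized = 1;
}

static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
{
	struct kfd_event_waiter *event_waiters;
	uint32_t i;

	/*
	 * calloc() (modelling kcalloc()) zeroes the whole array. On mainstream
	 * platforms an all-bits-zero pointer is NULL, so every entry starts
	 * with event == NULL and activated == false without explicit stores.
	 */
	event_waiters = calloc(num_events, sizeof(*event_waiters));
	if (!event_waiters)
		return NULL;

	/* event_waiters is known to be non-NULL here, so the loop condition
	 * no longer needs the '(event_waiters) &&' test. */
	for (i = 0; i < num_events; i++)
		init_wait_stub(&event_waiters[i].wait);

	return event_waiters;
}
```

Zeroing at allocation time also means any field added to the structure later starts in a known state, which is the "safer" part of Felix's argument compared with adding one more explicit `= NULL` store.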
Re: [PATCH] drm/amdkfd: Fix an illegal memory access
On 21/02/2023 at 17:26, Felix Kuehling wrote:
>
> On 2023-02-21 06:35, qu.huang-fxuvxftifdnyg1zeobx...@public.gmane.org wrote:
>> From: Qu Huang
>>
>> In kfd_wait_on_events(), the kfd_event_waiter structures are allocated
>> by alloc_event_waiters(), but their event field is not initialized.
>> When copy_from_user() fails in kfd_wait_on_events(), the error path
>> releases the previously allocated waiter structures. Because
>> free_waiters() then reads the uninitialized event field, this results
>> in an illegal memory access and a system crash. Here is the crash log:
>>
>> localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
>> localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
>> localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
>> localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
>> localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
>> localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
>> localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
>> localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
>> localhost kernel: CS: 0010 DS: ES: CR0: 80050033
>> localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
>> localhost kernel: Call Trace:
>> localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
>> localhost kernel: remove_wait_queue+0x12/0x50
>> localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
>> localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
>> localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
>> localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
>> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
>> localhost kernel: __x64_sys_ioctl+0x8e/0xd0
>> localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
>> localhost kernel: do_syscall_64+0x33/0x80
>> localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> localhost kernel: RIP: 0033:0x152a4dff68d7
>>
>> Signed-off-by: Qu Huang
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> index 729d26d..e5faaad 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>>  	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>>  		init_wait(&event_waiters[i].wait);
>>  		event_waiters[i].activated = false;
>> +		event_waiters[i].event = NULL;
>
> Thank you for catching this. We're often lazy about initializing things
> to NULL or 0 because most of our data structures are allocated with
> kzalloc or similar. I'm not sure why we're not doing this here. If we
> allocated event_waiters with kcalloc, we could also remove the
> initialization of activated. I think that would be the cleaner and
> safer solution.

Hi,

I think that the '(event_waiters) &&' in the 'for' can also be removed.
'event_waiters' is already NULL-tested a few lines above.


Just my 2c.

CJ

>
> Regards,
> Felix
>
>
>>  	}
>>
>>  	return event_waiters;
>> --
>> 1.8.3.1
Re: [PATCH] drm/amdkfd: Fix an illegal memory access
On 2023-02-21 06:35, qu.hu...@linux.dev wrote:
> From: Qu Huang
>
> In kfd_wait_on_events(), the kfd_event_waiter structures are allocated
> by alloc_event_waiters(), but their event field is not initialized.
> When copy_from_user() fails in kfd_wait_on_events(), the error path
> releases the previously allocated waiter structures. Because
> free_waiters() then reads the uninitialized event field, this results
> in an illegal memory access and a system crash. Here is the crash log:
>
> localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
> localhost kernel: RSP: 0018:aa53c362bd60 EFLAGS: 00010082
> localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0282 RCX: 002c
> localhost kernel: RDX: 9e855eeacb80 RSI: 279c RDI: e7088f6a21d0
> localhost kernel: RBP: e7088f6a21d0 R08: 002c R09: aa53c362be64
> localhost kernel: R10: aa53c362bbd8 R11: 0001 R12: 0002
> localhost kernel: R13: 9e7ead15d600 R14: R15: 9e7ead15d698
> localhost kernel: FS: 152a3d111700() GS:9e855ee8() knlGS:
> localhost kernel: CS: 0010 DS: ES: CR0: 80050033
> localhost kernel: CR2: 15293810 CR3: 00044d7a4000 CR4: 003506e0
> localhost kernel: Call Trace:
> localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
> localhost kernel: remove_wait_queue+0x12/0x50
> localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
> localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
> localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
> localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
> localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
> localhost kernel: __x64_sys_ioctl+0x8e/0xd0
> localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
> localhost kernel: do_syscall_64+0x33/0x80
> localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> localhost kernel: RIP: 0033:0x152a4dff68d7
>
> Signed-off-by: Qu Huang
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 729d26d..e5faaad 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -787,6 +787,7 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>  	for (i = 0; (event_waiters) && (i < num_events) ; i++) {
>  		init_wait(&event_waiters[i].wait);
>  		event_waiters[i].activated = false;
> +		event_waiters[i].event = NULL;

Thank you for catching this. We're often lazy about initializing things
to NULL or 0 because most of our data structures are allocated with
kzalloc or similar. I'm not sure why we're not doing this here. If we
allocated event_waiters with kcalloc, we could also remove the
initialization of activated. I think that would be the cleaner and
safer solution.

Regards,
  Felix


>  	}
>
>  	return event_waiters;
> --
> 1.8.3.1
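The failure described in the commit message can be modelled in userspace. This sketch assumes, as the crash inside remove_wait_queue() suggests, that kfd_wait_on_events() attaches waiters to events one at a time while copying event data from user space, and that free_waiters() visits all num_events entries but only detaches entries whose event pointer is non-NULL. The names mirror the kernel ones, but `struct fake_event`, `copy_event_from_user_stub()`, `wait_on_events_model()`, and the detach counters are hypothetical simplifications. With a zeroed array, waiters that were never attached keep `event == NULL` and cleanup skips them; with a non-zeroing allocator such as `malloc()`, those entries would hold indeterminate values that the cleanup loop would dereference, which is undefined behaviour consistent with the reported crash.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified event: counts how often cleanup detached it. */
struct fake_event {
	int detach_count;
};

/* Hypothetical, reduced waiter: only the field that matters here. */
struct kfd_event_waiter {
	struct fake_event *event; /* NULL until the waiter is attached */
};

/* Stand-in for copy_from_user(): returns nonzero ("bytes not copied")
 * at the index where the simulated user copy fails. */
static unsigned long copy_event_from_user_stub(uint32_t i, uint32_t fail_at)
{
	return i == fail_at ? 1 : 0;
}

/* Cleanup modelled on free_waiters(): every entry is visited, but only
 * attached waiters (event != NULL) are detached. Returns the count. */
static uint32_t free_waiters(uint32_t num_events,
			     struct kfd_event_waiter *waiters)
{
	uint32_t detached = 0;

	for (uint32_t i = 0; i < num_events; i++) {
		if (waiters[i].event) {
			/* models remove_wait_queue() on the event's queue */
			waiters[i].event->detach_count++;
			detached++;
		}
	}
	free(waiters);
	return detached;
}

/* Error path of a kfd_wait_on_events()-like function: attach waiters one
 * by one, stop when the simulated copy fails, then clean up everything. */
static uint32_t wait_on_events_model(struct fake_event *events,
				     uint32_t num_events, uint32_t fail_at)
{
	/* Zeroed allocation: unattached waiters keep event == NULL. */
	struct kfd_event_waiter *waiters = calloc(num_events, sizeof(*waiters));

	if (!waiters)
		return 0;

	for (uint32_t i = 0; i < num_events; i++) {
		if (copy_event_from_user_stub(i, fail_at))
			break; /* models the -EFAULT error exit */
		waiters[i].event = &events[i];
	}
	return free_waiters(num_events, waiters);
}
```

For example, with five events and the copy failing at index 2, only the first two waiters are attached, so cleanup detaches exactly those two and skips the three that were never initialized.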