Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Wed, May 01, 2024 at 10:57:38AM -0500, Mike Christie wrote:
> On 5/1/24 2:50 AM, Hillf Danton wrote:
> > On Wed, 1 May 2024 02:01:20 -0400 Michael S. Tsirkin
> >>
> >> and then it failed testing.
> >>
> > So did my patch [1] but then the reason was spotted [2,3]
> >
> > [1] https://lore.kernel.org/lkml/20240430110209.4310-1-hdan...@sina.com/
> > [2] https://lore.kernel.org/lkml/20240430225005.4368-1-hdan...@sina.com/
> > [3] https://lore.kernel.org/lkml/a7f8470617589...@google.com/
>
> Just to make sure I understand the conclusion.
>
> Edward's patch that just swaps the order of the calls:
>
> https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/
>
> fixes the UAF. I tested the same in my setup. However, when you guys tested it
> with syzbot, it also triggered a softirq/RCU warning.
>
> The softirq/RCU part of the issue is fixed with this commit:
>
> https://lore.kernel.org/all/20240427102808.29356-1-qiang.zhang1...@gmail.com/
>
> commit 1dd1eff161bd55968d3d46bc36def62d71fb4785
> Author: Zqiang
> Date:   Sat Apr 27 18:28:08 2024 +0800
>
>     softirq: Fix suspicious RCU usage in __do_softirq()
>
> The problem was that I was testing with -next master, which has that patch.
> It looks like you guys were testing against bb7a2467e6be, which didn't have
> the patch, and so that's why you guys still hit the softirq/RCU issue. Later,
> when you added that patch to your patch, it worked with syzbot.
>
> So is it safe to assume that the softirq/RCU patch above will be upstream
> when the vhost changes go in, or is there a tag I need to add to my patches?

That patch is upstream now. I rebased and asked syzbot to test
https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/
on top. If that passes I will squash.
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Wed, May 01, 2024 at 10:57:38AM -0500, Mike Christie wrote:
> On 5/1/24 2:50 AM, Hillf Danton wrote:
> > On Wed, 1 May 2024 02:01:20 -0400 Michael S. Tsirkin
> >>
> >> and then it failed testing.
> >>
> > So did my patch [1] but then the reason was spotted [2,3]
> >
> > [1] https://lore.kernel.org/lkml/20240430110209.4310-1-hdan...@sina.com/
> > [2] https://lore.kernel.org/lkml/20240430225005.4368-1-hdan...@sina.com/
> > [3] https://lore.kernel.org/lkml/a7f8470617589...@google.com/
>
> Just to make sure I understand the conclusion.
>
> Edward's patch that just swaps the order of the calls:
>
> https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/
>
> fixes the UAF. I tested the same in my setup. However, when you guys tested it
> with syzbot, it also triggered a softirq/RCU warning.
>
> The softirq/RCU part of the issue is fixed with this commit:
>
> https://lore.kernel.org/all/20240427102808.29356-1-qiang.zhang1...@gmail.com/
>
> commit 1dd1eff161bd55968d3d46bc36def62d71fb4785
> Author: Zqiang
> Date:   Sat Apr 27 18:28:08 2024 +0800
>
>     softirq: Fix suspicious RCU usage in __do_softirq()
>
> The problem was that I was testing with -next master, which has that patch.
> It looks like you guys were testing against bb7a2467e6be, which didn't have
> the patch, and so that's why you guys still hit the softirq/RCU issue. Later,
> when you added that patch to your patch, it worked with syzbot.
>
> So is it safe to assume that the softirq/RCU patch above will be upstream
> when the vhost changes go in, or is there a tag I need to add to my patches?

Two points:

- I do not want bisect broken. If you depend on this patch, then either I
  pick it too before your patch, or we defer until
  1dd1eff161bd55968d3d46bc36def62d71fb4785 is merged. You can also ask for
  that patch to be merged in this cycle.

- Do not assume - pls push somewhere a hash based on vhost that syzbot can
  test and confirm all is well.

Thanks!
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On 5/1/24 2:50 AM, Hillf Danton wrote:
> On Wed, 1 May 2024 02:01:20 -0400 Michael S. Tsirkin
>>
>> and then it failed testing.
>>
> So did my patch [1] but then the reason was spotted [2,3]
>
> [1] https://lore.kernel.org/lkml/20240430110209.4310-1-hdan...@sina.com/
> [2] https://lore.kernel.org/lkml/20240430225005.4368-1-hdan...@sina.com/
> [3] https://lore.kernel.org/lkml/a7f8470617589...@google.com/

Just to make sure I understand the conclusion.

Edward's patch that just swaps the order of the calls:

https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/

fixes the UAF. I tested the same in my setup. However, when you guys tested it
with syzbot, it also triggered a softirq/RCU warning.

The softirq/RCU part of the issue is fixed with this commit:

https://lore.kernel.org/all/20240427102808.29356-1-qiang.zhang1...@gmail.com/

commit 1dd1eff161bd55968d3d46bc36def62d71fb4785
Author: Zqiang
Date:   Sat Apr 27 18:28:08 2024 +0800

    softirq: Fix suspicious RCU usage in __do_softirq()

The problem was that I was testing with -next master, which has that patch.
It looks like you guys were testing against bb7a2467e6be, which didn't have
the patch, and so that's why you guys still hit the softirq/RCU issue. Later,
when you added that patch to your patch, it worked with syzbot.

So is it safe to assume that the softirq/RCU patch above will be upstream
when the vhost changes go in, or is there a tag I need to add to my patches?
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Wed, 1 May 2024 02:01:20 -0400 Michael S. Tsirkin
>
> and then it failed testing.
>
So did my patch [1] but then the reason was spotted [2,3]

[1] https://lore.kernel.org/lkml/20240430110209.4310-1-hdan...@sina.com/
[2] https://lore.kernel.org/lkml/20240430225005.4368-1-hdan...@sina.com/
[3] https://lore.kernel.org/lkml/a7f8470617589...@google.com/
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Wed, May 01, 2024 at 08:15:44AM +0800, Hillf Danton wrote:
> On Tue, Apr 30, 2024 at 11:23:04AM -0500, Mike Christie wrote:
> > On 4/30/24 8:05 AM, Edward Adam Davis wrote:
> > >  static int vhost_task_fn(void *data)
> > >  {
> > >  	struct vhost_task *vtsk = data;
> > > @@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
> > >  			schedule();
> > >  	}
> > >
> > > -	mutex_lock(&vtsk->exit_mutex);
> > > +	mutex_lock(&exit_mutex);
> > >  	/*
> > >  	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
> > >  	 * When the vhost layer has called vhost_task_stop it's already stopped
> > > @@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
> > >  		vtsk->handle_sigkill(vtsk->data);
> > >  	}
> > >  	complete(&vtsk->exited);
> > > -	mutex_unlock(&vtsk->exit_mutex);
> > > +	mutex_unlock(&exit_mutex);
> > >
> >
> > Edward, thanks for the patch. I think though I just needed to swap the
> > order of the calls above.
> >
> > Instead of:
> >
> > complete(&vtsk->exited);
> > mutex_unlock(&vtsk->exit_mutex);
> >
> > it should have been:
> >
> > mutex_unlock(&vtsk->exit_mutex);
> > complete(&vtsk->exited);
>
> JFYI Edward did it [1]
>
> [1] https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/

and then it failed testing.

> > If my analysis is correct, then Michael do you want me to resubmit a
> > patch on top of your vhost branch or resubmit the entire patchset?
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Tue, Apr 30, 2024 at 08:01:11PM -0500, Mike Christie wrote:
> On 4/30/24 7:15 PM, Hillf Danton wrote:
> > On Tue, Apr 30, 2024 at 11:23:04AM -0500, Mike Christie wrote:
> >> On 4/30/24 8:05 AM, Edward Adam Davis wrote:
> >>>  static int vhost_task_fn(void *data)
> >>>  {
> >>>  	struct vhost_task *vtsk = data;
> >>> @@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
> >>>  			schedule();
> >>>  	}
> >>>
> >>> -	mutex_lock(&vtsk->exit_mutex);
> >>> +	mutex_lock(&exit_mutex);
> >>>  	/*
> >>>  	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
> >>>  	 * When the vhost layer has called vhost_task_stop it's already stopped
> >>> @@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
> >>>  		vtsk->handle_sigkill(vtsk->data);
> >>>  	}
> >>>  	complete(&vtsk->exited);
> >>> -	mutex_unlock(&vtsk->exit_mutex);
> >>> +	mutex_unlock(&exit_mutex);
> >>>
> >>
> >> Edward, thanks for the patch. I think though I just needed to swap the
> >> order of the calls above.
> >>
> >> Instead of:
> >>
> >> complete(&vtsk->exited);
> >> mutex_unlock(&vtsk->exit_mutex);
> >>
> >> it should have been:
> >>
> >> mutex_unlock(&vtsk->exit_mutex);
> >> complete(&vtsk->exited);
> >
> > JFYI Edward did it [1]
> >
> > [1] https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/
>
> Thanks.
>
> I tested the code with that change and it no longer triggers the UAF.

Weird, but syzbot said that yes, it triggers.
Compare dcc0ca06174e6...@google.com, which tests the order

	mutex_unlock(&vtsk->exit_mutex);
	complete(&vtsk->exited);

that you like and says it triggers, with 97bda90617521...@google.com,
which says it does not trigger.

Whatever you do, please send it to syzbot in the original thread, and
then when you post, please include the syzbot report.

Given this gets confusing, I'm fine with just a fixup patch, and a note
in the commit log where I should squash it.

> I've fixed up the original patch that had the bug and am going to
> resubmit the patchset like how Michael requested.
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On 4/30/24 7:15 PM, Hillf Danton wrote:
> On Tue, Apr 30, 2024 at 11:23:04AM -0500, Mike Christie wrote:
>> On 4/30/24 8:05 AM, Edward Adam Davis wrote:
>>>  static int vhost_task_fn(void *data)
>>>  {
>>>  	struct vhost_task *vtsk = data;
>>> @@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
>>>  			schedule();
>>>  	}
>>>
>>> -	mutex_lock(&vtsk->exit_mutex);
>>> +	mutex_lock(&exit_mutex);
>>>  	/*
>>>  	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
>>>  	 * When the vhost layer has called vhost_task_stop it's already stopped
>>> @@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
>>>  		vtsk->handle_sigkill(vtsk->data);
>>>  	}
>>>  	complete(&vtsk->exited);
>>> -	mutex_unlock(&vtsk->exit_mutex);
>>> +	mutex_unlock(&exit_mutex);
>>>
>>
>> Edward, thanks for the patch. I think though I just needed to swap the
>> order of the calls above.
>>
>> Instead of:
>>
>> complete(&vtsk->exited);
>> mutex_unlock(&vtsk->exit_mutex);
>>
>> it should have been:
>>
>> mutex_unlock(&vtsk->exit_mutex);
>> complete(&vtsk->exited);
>
> JFYI Edward did it [1]
>
> [1] https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/

Thanks.

I tested the code with that change and it no longer triggers the UAF.

I've fixed up the original patch that had the bug and am going to
resubmit the patchset like how Michael requested.
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Tue, Apr 30, 2024 at 11:23:04AM -0500, Mike Christie wrote:
> On 4/30/24 8:05 AM, Edward Adam Davis wrote:
> >  static int vhost_task_fn(void *data)
> >  {
> >  	struct vhost_task *vtsk = data;
> > @@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
> >  			schedule();
> >  	}
> >
> > -	mutex_lock(&vtsk->exit_mutex);
> > +	mutex_lock(&exit_mutex);
> >  	/*
> >  	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
> >  	 * When the vhost layer has called vhost_task_stop it's already stopped
> > @@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
> >  		vtsk->handle_sigkill(vtsk->data);
> >  	}
> >  	complete(&vtsk->exited);
> > -	mutex_unlock(&vtsk->exit_mutex);
> > +	mutex_unlock(&exit_mutex);
> >
>
> Edward, thanks for the patch. I think though I just needed to swap the
> order of the calls above.
>
> Instead of:
>
> complete(&vtsk->exited);
> mutex_unlock(&vtsk->exit_mutex);
>
> it should have been:
>
> mutex_unlock(&vtsk->exit_mutex);
> complete(&vtsk->exited);

JFYI Edward did it [1]

[1] https://lore.kernel.org/lkml/tencent_546da49414e876eebecf2c78d26d242ee...@qq.com/

> If my analysis is correct, then Michael do you want me to resubmit a
> patch on top of your vhost branch or resubmit the entire patchset?
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On Tue, Apr 30, 2024 at 11:23:04AM -0500, Mike Christie wrote:
> On 4/30/24 8:05 AM, Edward Adam Davis wrote:
> >  static int vhost_task_fn(void *data)
> >  {
> >  	struct vhost_task *vtsk = data;
> > @@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
> >  			schedule();
> >  	}
> >
> > -	mutex_lock(&vtsk->exit_mutex);
> > +	mutex_lock(&exit_mutex);
> >  	/*
> >  	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
> >  	 * When the vhost layer has called vhost_task_stop it's already stopped
> > @@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
> >  		vtsk->handle_sigkill(vtsk->data);
> >  	}
> >  	complete(&vtsk->exited);
> > -	mutex_unlock(&vtsk->exit_mutex);
> > +	mutex_unlock(&exit_mutex);
> >
>
> Edward, thanks for the patch. I think though I just needed to swap the
> order of the calls above.
>
> Instead of:
>
> complete(&vtsk->exited);
> mutex_unlock(&vtsk->exit_mutex);
>
> it should have been:
>
> mutex_unlock(&vtsk->exit_mutex);
> complete(&vtsk->exited);
>
> If my analysis is correct, then Michael do you want me to resubmit a
> patch on top of your vhost branch or resubmit the entire patchset?

Resubmit all please.

--
MST
Re: [PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
On 4/30/24 8:05 AM, Edward Adam Davis wrote:
>  static int vhost_task_fn(void *data)
>  {
>  	struct vhost_task *vtsk = data;
> @@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
>  			schedule();
>  	}
>
> -	mutex_lock(&vtsk->exit_mutex);
> +	mutex_lock(&exit_mutex);
>  	/*
>  	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
>  	 * When the vhost layer has called vhost_task_stop it's already stopped
> @@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
>  		vtsk->handle_sigkill(vtsk->data);
>  	}
>  	complete(&vtsk->exited);
> -	mutex_unlock(&vtsk->exit_mutex);
> +	mutex_unlock(&exit_mutex);
>

Edward, thanks for the patch. I think though I just needed to swap the
order of the calls above.

Instead of:

complete(&vtsk->exited);
mutex_unlock(&vtsk->exit_mutex);

it should have been:

mutex_unlock(&vtsk->exit_mutex);
complete(&vtsk->exited);

If my analysis is correct, then Michael do you want me to resubmit a
patch on top of your vhost branch or resubmit the entire patchset?
[PATCH next] vhost_task: after freeing vhost_task it should not be accessed in vhost_task_fn
[syzbot reported]
BUG: KASAN: slab-use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: slab-use-after-free in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: slab-use-after-free in __mutex_unlock_slowpath+0xef/0x750 kernel/locking/mutex.c:921
Read of size 8 at addr 888023632880 by task vhost-5104/5105

CPU: 0 PID: 5105 Comm: vhost-5104 Not tainted 6.9.0-rc5-next-20240426-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
 instrument_atomic_read include/linux/instrumented.h:68 [inline]
 atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
 __mutex_unlock_slowpath+0xef/0x750 kernel/locking/mutex.c:921
 vhost_task_fn+0x3bc/0x3f0 kernel/vhost_task.c:65
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Allocated by task 5104:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
 __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
 kasan_kmalloc include/linux/kasan.h:211 [inline]
 kmalloc_trace_noprof+0x19c/0x2b0 mm/slub.c:4146
 kmalloc_noprof include/linux/slab.h:660 [inline]
 kzalloc_noprof include/linux/slab.h:778 [inline]
 vhost_task_create+0x149/0x300 kernel/vhost_task.c:134
 vhost_worker_create+0x17b/0x3f0 drivers/vhost/vhost.c:667
 vhost_dev_set_owner+0x563/0x940 drivers/vhost/vhost.c:945
 vhost_dev_ioctl+0xda/0xda0 drivers/vhost/vhost.c:2108
 vhost_vsock_dev_ioctl+0x2bb/0xfa0 drivers/vhost/vsock.c:875
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 5104:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2190 [inline]
 slab_free mm/slub.c:4430 [inline]
 kfree+0x149/0x350 mm/slub.c:4551
 vhost_worker_destroy drivers/vhost/vhost.c:629 [inline]
 vhost_workers_free drivers/vhost/vhost.c:648 [inline]
 vhost_dev_cleanup+0x9b0/0xba0 drivers/vhost/vhost.c:1051
 vhost_vsock_dev_release+0x3aa/0x410 drivers/vhost/vsock.c:751
 __fput+0x406/0x8b0 fs/file_table.c:422
 __do_sys_close fs/open.c:1555 [inline]
 __se_sys_close fs/open.c:1540 [inline]
 __x64_sys_close+0x7f/0x110 fs/open.c:1540
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[Fix]
Delete the member exit_mutex from struct vhost_task and replace it with a
global static mutex.

Fixes: a3df30984f4f ("vhost_task: Handle SIGKILL by flushing work and exiting")
Reported-by: syzbot+98edc2df894917b34...@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis
---
 kernel/vhost_task.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index 48c289947b99..375356499867 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -20,10 +20,10 @@ struct vhost_task {
 	struct completion exited;
 	unsigned long flags;
 	struct task_struct *task;
-	/* serialize SIGKILL and vhost_task_stop calls */
-	struct mutex exit_mutex;
 };

+static DEFINE_MUTEX(exit_mutex); //serialize SIGKILL and vhost_task_stop calls
+
 static int vhost_task_fn(void *data)
 {
 	struct vhost_task *vtsk = data;
@@ -51,7 +51,7 @@ static int vhost_task_fn(void *data)
 			schedule();
 	}

-	mutex_lock(&vtsk->exit_mutex);
+	mutex_lock(&exit_mutex);
 	/*
 	 * If a vhost_task_stop and SIGKILL race, we can ignore the SIGKILL.
 	 * When the vhost layer has called vhost_task_stop it's already stopped
@@ -62,7 +62,7 @@ static int vhost_task_fn(void *data)
 		vtsk->handle_sigkill(vtsk->data);
 	}
 	complete(&vtsk->exited);
-	mutex_unlock(&vtsk->exit_mutex);
+	mutex_unlock(&exit_mutex);

 	do_exit(0);
 }
@@ -88,12 +88,12 @@ EXPORT_SYMBOL_GPL(vhost_task_wake);
  */
 void vhost_task_stop(struct vhost_task *vtsk)
 {
-	mutex_lock(&vtsk->exit_mutex);
+	mutex_lock(&exit_mutex);
 	if (!test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)) {
 		se