Re: Re: Question on KASAN calltrace record in RT
From: Mike Galbraith
Sent: April 14, 2021 15:56
To: Zhang, Qiang; Dmitry Vyukov
Cc: Andrew Halaney; andreyk...@gmail.com; ryabinin@gmail.com; a...@linux-foundation.org; linux-kernel@vger.kernel.org; kasan-...@googlegroups.com
Subject: Re: Re: Question on KASAN calltrace record in RT

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Wed, 2021-04-14 at 07:29 +, Zhang, Qiang wrote:
> > If CONFIG_PREEMPT_RT is enabled but we are not preemptible, the prealloc
> > should be allowed
>
> No, you can't take an rtmutex when not preemptible.

Oh, I was confused. Thank you for your explanation.

> -Mike
Re: Question on KASAN calltrace record in RT
From: Mike Galbraith
Sent: April 14, 2021 12:00
To: Dmitry Vyukov; Zhang, Qiang
Cc: Andrew Halaney; andreyk...@gmail.com; ryabinin@gmail.com; a...@linux-foundation.org; linux-kernel@vger.kernel.org; kasan-...@googlegroups.com
Subject: Re: Question on KASAN calltrace record in RT

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Tue, 2021-04-13 at 17:29 +0200, Dmitry Vyukov wrote:
> On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang wrote:
> >
> > Hello everyone
> >
> > In an RT system, after Andrew's testing, we found the calltrace below.
> > In KASAN we record the callstack through stack_depot_save(), which may
> > call alloc_pages(); but on RT the spin_lock in alloc_pages() is replaced
> > with an rt_mutex, so if irqs are disabled before calling this function,
> > the following calltrace is triggered.
> >
> > Maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record the
> > callstack on RT systems.
> >
> > Is there a better solution?
>
> Hi Qiang,
>
> Adding 2 full stacks per heap object can increase memory usage too much.
> The stackdepot has a preallocation mechanism, I would start with
> adding interrupts check here:
> https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> and just not do preallocation in interrupt context. This will solve
> the problem, right?

Hm, this thing might actually be (sorta?) working, modulo one startup gripe.
The CRASH_DUMP inspired gripe I get with !RT appeared (and shut up when told
I don't care given kdump has worked just fine for ages:), but no more
might_sleep() gripeage.
CONFIG_KASAN_SHADOW_OFFSET=0xdc00
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=1
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_MODULE_TEST is not set
---
 lib/stackdepot.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
 static int depot_index;
 static int next_slab_inited;
 static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);

 static bool init_stack_slab(void **prealloc)
 {
@@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 	struct page *page = NULL;
 	void *prealloc = NULL;
 	unsigned long flags;
-	u32 hash;
+	u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();

If CONFIG_PREEMPT_RT is enabled but we are not preemptible, the prealloc
should be allowed; it should be changed like this:

may_prealloc = !(IS_ENABLED(CONFIG_PREEMPT_RT) && preemptible());

Thanks
Qiang

 	if (unlikely(nr_entries == 0) || stack_depot_disable)
 		goto fast_exit;
@@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 	 * The smp_load_acquire() here pairs with smp_store_release() to
 	 * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
 	 */
-	if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+	if (unlikely(!smp_load_acquire(&next_slab_inited) && may_prealloc)) {
 		/*
 		 * Zero out zone modifiers, as we don't have specific zone
 		 * requirements.
 		 * Keep the flags related to allocation in atomic contexts.
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 		prealloc = page_address(page);
 	}

-	spin_lock_irqsave(&depot_lock, flags);
+	raw_spin_lock_irqsave(&depot_lock, flags);

 	found = find_stack(*bucket, entries, nr_entries, hash);
 	if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 			WARN_ON(!init_stack_slab(&prealloc));
 	}

-	spin_unlock_irqrestore(&depot_lock, flags);
+	raw_spin_unlock_irqrestore(&depot_lock, flags);
 exit:
 	if (prealloc) {
 		/* Nobody used this memory, ok to free it. */

[0.692437] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:943
[0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[0.692442] Preemption disabled at:
[0.692443] [] on_each_cpu_cond_mask+0x30/0xb0
[0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.0.g2afefec-tip-rt #5
[0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[0.692456] Call Trace:
[0.692458]  ? on_each_cpu_cond_mask+0x30/0xb0
[0.692462]  dump_stack+0x8a/0xb5
[0.692467]  ___might_sleep.cold+0xfe/0x112
[0.692471]  rt_spin_lock+0x1c/0x60
[0.692475]  free_unref_page+0x117/0x3c0
[0.692481]  qlist_free_all+0x60/0xd0
[0.692485]  per_cpu_remove_cache+0x5b/0x70
[0.692488]  smp_call_function_many_cond+0x185/
Re: Question on KASAN calltrace record in RT
From: Dmitry Vyukov
Sent: April 13, 2021 23:29
To: Zhang, Qiang
Cc: Andrew Halaney; andreyk...@gmail.com; ryabinin@gmail.com; a...@linux-foundation.org; linux-kernel@vger.kernel.org; kasan-...@googlegroups.com
Subject: Re: Question on KASAN calltrace record in RT

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Tue, Apr 6, 2021 at 10:26 AM Zhang, Qiang wrote:
> >
> > Hello everyone
> >
> > In an RT system, after Andrew's testing, we found the calltrace below.
> > In KASAN we record the callstack through stack_depot_save(), which may
> > call alloc_pages(); but on RT the spin_lock in alloc_pages() is replaced
> > with an rt_mutex, so if irqs are disabled before calling this function,
> > the following calltrace is triggered.
> >
> > Maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record the
> > callstack on RT systems.
> >
> > Is there a better solution?
>
> Hi Qiang,
>
> Adding 2 full stacks per heap object can increase memory usage too much.
> The stackdepot has a preallocation mechanism, I would start with
> adding interrupts check here:
> https://elixir.bootlin.com/linux/v5.12-rc7/source/lib/stackdepot.c#L294
> and just not do preallocation in interrupt context. This will solve
> the problem, right?

That seems useful; however, there is the following situation: if a lot of
stack traces need to be saved from interrupt context, the memory that was
preallocated to hold stack traces can be depleted, and when a stack needs to
be saved in an interrupt again, no memory will be available.
Thanks
Qiang

> Thanks
> Qiang
>
> BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
> [ 14.522262] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 640, name: mount
> [ 14.522304] Call Trace:
> [ 14.522306] dump_stack+0x92/0xc1
> [ 14.522313] ___might_sleep.cold.99+0x1b0/0x1ef
> [ 14.522319] rt_spin_lock+0x3e/0xc0
> [ 14.522329] local_lock_acquire+0x52/0x3c0
> [ 14.522332] get_page_from_freelist+0x176c/0x3fd0
> [ 14.522543] __alloc_pages_nodemask+0x28f/0x7f0
> [ 14.522559] stack_depot_save+0x3a1/0x470
> [ 14.522564] kasan_save_stack+0x2f/0x40
> [ 14.523575] kasan_record_aux_stack+0xa3/0xb0
> [ 14.523580] insert_work+0x48/0x340
> [ 14.523589] __queue_work+0x430/0x1280
> [ 14.523595] mod_delayed_work_on+0x98/0xf0
> [ 14.523607] kblockd_mod_delayed_work_on+0x17/0x20
> [ 14.523611] blk_mq_run_hw_queue+0x151/0x2b0
> [ 14.523620] blk_mq_sched_insert_request+0x2ad/0x470
> [ 14.523633] blk_mq_submit_bio+0xd2a/0x2330
> [ 14.523675] submit_bio_noacct+0x8aa/0xfe0
> [ 14.523693] submit_bio+0xf0/0x550
> [ 14.523714] submit_bio_wait+0xfe/0x200
> [ 14.523724] xfs_rw_bdev+0x370/0x480 [xfs]
> [ 14.523831] xlog_do_io+0x155/0x320 [xfs]
> [ 14.524032] xlog_bread+0x23/0xb0 [xfs]
> [ 14.524133] xlog_find_head+0x131/0x8b0 [xfs]
> [ 14.524375] xlog_find_tail+0xc8/0x7b0 [xfs]
> [ 14.524828] xfs_log_mount+0x379/0x660 [xfs]
> [ 14.524927] xfs_mountfs+0xc93/0x1af0 [xfs]
> [ 14.525424] xfs_fs_fill_super+0x923/0x17f0 [xfs]
> [ 14.525522] get_tree_bdev+0x404/0x680
> [ 14.525622] vfs_get_tree+0x89/0x2d0
> [ 14.525628] path_mount+0xeb2/0x19d0
> [ 14.525648] do_mount+0xcb/0xf0
> [ 14.525665] __x64_sys_mount+0x162/0x1b0
> [ 14.525670] do_syscall_64+0x33/0x40
> [ 14.525674] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 14.525677] RIP: 0033:0x7fd6c15eaade
Question on KASAN calltrace record in RT
Hello everyone

In an RT system, after Andrew's testing, we found the calltrace below. In
KASAN we record the callstack through stack_depot_save(), which may call
alloc_pages(); but on RT the spin_lock in alloc_pages() is replaced with an
rt_mutex, so if irqs are disabled before calling this function, the
following calltrace is triggered.

Maybe add array[KASAN_STACK_DEPTH] in struct kasan_track to record the
callstack on RT systems.

Is there a better solution?

Thanks
Qiang

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
[ 14.522262] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 640, name: mount
[ 14.522304] Call Trace:
[ 14.522306] dump_stack+0x92/0xc1
[ 14.522313] ___might_sleep.cold.99+0x1b0/0x1ef
[ 14.522319] rt_spin_lock+0x3e/0xc0
[ 14.522329] local_lock_acquire+0x52/0x3c0
[ 14.522332] get_page_from_freelist+0x176c/0x3fd0
[ 14.522543] __alloc_pages_nodemask+0x28f/0x7f0
[ 14.522559] stack_depot_save+0x3a1/0x470
[ 14.522564] kasan_save_stack+0x2f/0x40
[ 14.523575] kasan_record_aux_stack+0xa3/0xb0
[ 14.523580] insert_work+0x48/0x340
[ 14.523589] __queue_work+0x430/0x1280
[ 14.523595] mod_delayed_work_on+0x98/0xf0
[ 14.523607] kblockd_mod_delayed_work_on+0x17/0x20
[ 14.523611] blk_mq_run_hw_queue+0x151/0x2b0
[ 14.523620] blk_mq_sched_insert_request+0x2ad/0x470
[ 14.523633] blk_mq_submit_bio+0xd2a/0x2330
[ 14.523675] submit_bio_noacct+0x8aa/0xfe0
[ 14.523693] submit_bio+0xf0/0x550
[ 14.523714] submit_bio_wait+0xfe/0x200
[ 14.523724] xfs_rw_bdev+0x370/0x480 [xfs]
[ 14.523831] xlog_do_io+0x155/0x320 [xfs]
[ 14.524032] xlog_bread+0x23/0xb0 [xfs]
[ 14.524133] xlog_find_head+0x131/0x8b0 [xfs]
[ 14.524375] xlog_find_tail+0xc8/0x7b0 [xfs]
[ 14.524828] xfs_log_mount+0x379/0x660 [xfs]
[ 14.524927] xfs_mountfs+0xc93/0x1af0 [xfs]
[ 14.525424] xfs_fs_fill_super+0x923/0x17f0 [xfs]
[ 14.525522] get_tree_bdev+0x404/0x680
[ 14.525622] vfs_get_tree+0x89/0x2d0
[ 14.525628] path_mount+0xeb2/0x19d0
[ 14.525648] do_mount+0xcb/0xf0
[ 14.525665] __x64_sys_mount+0x162/0x1b0
[ 14.525670] do_syscall_64+0x33/0x40
[ 14.525674] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 14.525677] RIP: 0033:0x7fd6c15eaade
Re: [PATCH v2] loop: call __loop_clr_fd() with lo_mutex locked to avoid autoclear race
From: Pavel Tatashin
Sent: March 27, 2021 5:41
To: Zhang, Qiang
Cc: Jens Axboe; linux-bl...@vger.kernel.org; LKML
Subject: Re: [PATCH v2] loop: call __loop_clr_fd() with lo_mutex locked to avoid autoclear race

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Fri, Mar 26, 2021 at 5:00 AM wrote:
>
> From: Zqiang
>
> lo->lo_refcnt = 0
>
> CPU0                                    CPU1
> lo_open()                               lo_open()
>  mutex_lock(&lo->lo_mutex)
>  atomic_inc(&lo->lo_refcnt)
>  lo_refcnt == 1
>  mutex_unlock(&lo->lo_mutex)
>                                          mutex_lock(&lo->lo_mutex)
>                                          atomic_inc(&lo->lo_refcnt)
>                                          lo_refcnt == 2
>                                          mutex_unlock(&lo->lo_mutex)
> loop_clr_fd()
>  mutex_lock(&lo->lo_mutex)
>  atomic_read(&lo->lo_refcnt) > 1
>  lo->lo_flags |= LO_FLAGS_AUTOCLEAR
>  mutex_unlock(&lo->lo_mutex)            lo_release()
>  return                                  mutex_lock(&lo->lo_mutex)
>                                          atomic_dec_return(&lo->lo_refcnt)
>                                          lo_refcnt == 1
>                                          mutex_unlock(&lo->lo_mutex)
>                                          return
>
> lo_release()
>  mutex_lock(&lo->lo_mutex)
>  atomic_dec_return(&lo->lo_refcnt)
>  lo_refcnt == 0
>  lo->lo_flags & LO_FLAGS_AUTOCLEAR == true
>  mutex_unlock(&lo->lo_mutex)            loop_control_ioctl()
>                                          case LOOP_CTL_REMOVE:
>                                           mutex_lock(&lo->lo_mutex)
>                                           atomic_read(&lo->lo_refcnt) == 0
>  __loop_clr_fd(lo, true)                 mutex_unlock(&lo->lo_mutex)
>   mutex_lock(&lo->lo_mutex)             loop_remove(lo)
>                                          mutex_destroy(&lo->lo_mutex)
>                                          kfree(lo)
>   ... data race
>
> When different tasks on two CPUs perform the above operations on the same
> lo device, a data race may occur. Do not drop lo->lo_mutex before calling
> __loop_clr_fd(), so the refcnt and LO_FLAGS_AUTOCLEAR check in lo_release()
> stay in sync.

> There is a race with the autoclear logic where a use-after-free may occur,
> as shown in the above scenario. Do not drop lo->lo_mutex before calling
> __loop_clr_fd(), so the refcnt and LO_FLAGS_AUTOCLEAR check in lo_release()
> stay in sync.

Hi Pasha,

This patch is incorrect; the lo->lo_state status check was ignored by me. In
lo_release() the lo_state is set to Lo_rundown, and when LOOP_CTL_REMOVE is
called, if lo->lo_state != Lo_unbound it returns directly and loop_remove()
is not called. I'm sorry to have misled you.
> > Reviewed-by: Pavel Tatashin
>
> Fixes: 6cc8e7430801 ("loop: scale loop device by introducing per device lock")
> Signed-off-by: Zqiang
> ---
> v1->v2:
>  Modify the title and commit message.
>
>  drivers/block/loop.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d58d68f3c7cd..5712f1698a66 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1201,7 +1201,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	bool partscan = false;
>  	int lo_number;
>
> -	mutex_lock(&lo->lo_mutex);
>  	if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
>  		err = -ENXIO;
>  		goto out_unlock;
> @@ -1257,7 +1256,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	lo_number = lo->lo_number;
>  	loop_unprepare_queue(lo);
>  out_unlock:
> -	mutex_unlock(&lo->lo_mutex);
>  	if (partscan) {
>  		/*
>  		 * bd_mutex has been held already in release path, so don't
> @@ -1288,12 +1286,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	 * protects us from all the other places trying to change the 'lo'
>  	 * device.
>  	 */
> -	mutex_lock(&lo->lo_mutex);
> +
>  	lo->lo_flags = 0;
>  	if (!part_shift)
>  		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
>  	lo->lo_state = Lo_unbound;
> -	mutex_unlock(&lo->lo_mutex);
>
>  	/*
>  	 * Need not hold lo_mutex to fput backing file. Calling fput holding
> @@ -1332,9 +1329,10 @@ static int loop_clr_fd(struct loop_device *lo)
>  		return 0;
>  	}
>
Re: [PATCH] loop: Fix use of unsafe lo->lo_mutex locks
From: Pavel Tatashin
Sent: March 25, 2021 21:09
To: Zhang, Qiang
Cc: Jens Axboe; linux-bl...@vger.kernel.org; LKML
Subject: Re: [PATCH] loop: Fix use of unsafe lo->lo_mutex locks

[Please note: This e-mail is from an EXTERNAL e-mail address]

> Hi Qiang,
>
> Thank you for root causing this issue. Did you encounter this issue or
> find it by inspection?
>
> I would change the title to what is actually being changed, something like:
>
> loop: call __loop_clr_fd() with lo_mutex locked to avoid autoclear race
>
> > ... kfree(lo)
> > UAF
> >
> > When different tasks on two CPUs perform the above operations on the same
> > lo device, a UAF may occur.
>
> Please also explain the fix:
>
> Do not drop lo->lo_mutex before calling __loop_clr_fd(), so the refcnt and
> LO_FLAGS_AUTOCLEAR check in lo_release() stay in sync.

Sorry Pasha, please ignore this; I sent a v2 patch.

In lo_release() we set lo->lo_state = Lo_rundown. In loop_control_ioctl(),
under LOOP_CTL_REMOVE, if (lo->lo_state != Lo_unbound) is true it returns
and loop_remove() is not called. I'm sorry to have misled you.
Thanks
Qiang

> Fixes: 6cc8e7430801 ("loop: scale loop device by introducing per device lock")
> Signed-off-by: Zqiang
> ---
>  drivers/block/loop.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d58d68f3c7cd..5712f1698a66 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1201,7 +1201,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	bool partscan = false;
>  	int lo_number;
>
> -	mutex_lock(&lo->lo_mutex);
>  	if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
>  		err = -ENXIO;
>  		goto out_unlock;
> @@ -1257,7 +1256,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	lo_number = lo->lo_number;
>  	loop_unprepare_queue(lo);
>  out_unlock:
> -	mutex_unlock(&lo->lo_mutex);
>  	if (partscan) {
>  		/*
>  		 * bd_mutex has been held already in release path, so don't
> @@ -1288,12 +1286,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	 * protects us from all the other places trying to change the 'lo'
>  	 * device.
>  	 */
> -	mutex_lock(&lo->lo_mutex);
> +
>  	lo->lo_flags = 0;
>  	if (!part_shift)
>  		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
>  	lo->lo_state = Lo_unbound;
> -	mutex_unlock(&lo->lo_mutex);
>
>  	/*
>  	 * Need not hold lo_mutex to fput backing file. Calling fput holding
> @@ -1332,9 +1329,10 @@ static int loop_clr_fd(struct loop_device *lo)
>  		return 0;
>  	}
>  	lo->lo_state = Lo_rundown;
> +	err = __loop_clr_fd(lo, false);
>  	mutex_unlock(&lo->lo_mutex);
>
> -	return __loop_clr_fd(lo, false);
> +	return err;
>  }
>
>  static int
> @@ -1916,13 +1914,12 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
>  	if (lo->lo_state != Lo_bound)
>  		goto out_unlock;
>  	lo->lo_state = Lo_rundown;
> -	mutex_unlock(&lo->lo_mutex);
>  	/*
>  	 * In autoclear mode, stop the loop thread
>  	 * and remove configuration after last close.
>  	 */
>  	__loop_clr_fd(lo, true);
> -	return;
> +	goto out_unlock;
>  } else if (lo->lo_state == Lo_bound) {
>  	/*
>  	 * Otherwise keep thread (if running) and config,
> --
> 2.17.1

> LGTM
> Reviewed-by: Pavel Tatashin
>
> Thank you,
> Pasha
Re: [PATCH] loop: Fix use of unsafe lo->lo_mutex locks
From: Pavel Tatashin
Sent: March 25, 2021 21:09
To: Zhang, Qiang
Cc: Jens Axboe; linux-bl...@vger.kernel.org; LKML
Subject: Re: [PATCH] loop: Fix use of unsafe lo->lo_mutex locks

[Please note: This e-mail is from an EXTERNAL e-mail address]

> Hi Qiang,
>
> Thank you for root causing this issue. Did you encounter this issue or
> find it by inspection?

Hi Pasha,

I found the problem during inspection.

> I would change the title to what is actually being changed, something like:
>
> loop: call __loop_clr_fd() with lo_mutex locked to avoid autoclear race

Agreed.

> > ... kfree(lo)
> > UAF
> >
> > When different tasks on two CPUs perform the above operations on the same
> > lo device, a UAF may occur.
>
> Please also explain the fix:
>
> Do not drop lo->lo_mutex before calling __loop_clr_fd(), so the refcnt and
> LO_FLAGS_AUTOCLEAR check in lo_release() stay in sync.

I will modify it and resend.

Thanks
Qiang

> Fixes: 6cc8e7430801 ("loop: scale loop device by introducing per device lock")
> Signed-off-by: Zqiang
> ---
>  drivers/block/loop.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d58d68f3c7cd..5712f1698a66 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1201,7 +1201,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	bool partscan = false;
>  	int lo_number;
>
> -	mutex_lock(&lo->lo_mutex);
>  	if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
>  		err = -ENXIO;
>  		goto out_unlock;
> @@ -1257,7 +1256,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	lo_number = lo->lo_number;
>  	loop_unprepare_queue(lo);
>  out_unlock:
> -	mutex_unlock(&lo->lo_mutex);
>  	if (partscan) {
>  		/*
>  		 * bd_mutex has been held already in release path, so don't
> @@ -1288,12 +1286,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  	 * protects us from all the other places trying to change the 'lo'
>  	 * device.
>  	 */
> -	mutex_lock(&lo->lo_mutex);
> +
>  	lo->lo_flags = 0;
>  	if (!part_shift)
>  		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
>  	lo->lo_state = Lo_unbound;
> -	mutex_unlock(&lo->lo_mutex);
>
>  	/*
>  	 * Need not hold lo_mutex to fput backing file. Calling fput holding
> @@ -1332,9 +1329,10 @@ static int loop_clr_fd(struct loop_device *lo)
>  		return 0;
>  	}
>  	lo->lo_state = Lo_rundown;
> +	err = __loop_clr_fd(lo, false);
>  	mutex_unlock(&lo->lo_mutex);
>
> -	return __loop_clr_fd(lo, false);
> +	return err;
>  }
>
>  static int
> @@ -1916,13 +1914,12 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
>  	if (lo->lo_state != Lo_bound)
>  		goto out_unlock;
>  	lo->lo_state = Lo_rundown;
> -	mutex_unlock(&lo->lo_mutex);
>  	/*
>  	 * In autoclear mode, stop the loop thread
>  	 * and remove configuration after last close.
>  	 */
>  	__loop_clr_fd(lo, true);
> -	return;
> +	goto out_unlock;
>  } else if (lo->lo_state == Lo_bound) {
>  	/*
>  	 * Otherwise keep thread (if running) and config,
> --
> 2.17.1

> LGTM
> Reviewed-by: Pavel Tatashin
>
> Thank you,
> Pasha
Re: [PATCH v2] bpf: Fix memory leak in copy_process()
Hello Alexei Starovoitov, Daniel Borkmann,

Please review this patch.

Thanks
Qiang

From: Zhang, Qiang
Sent: March 15, 2021 16:53
To: a...@kernel.org; dan...@iogearbox.net; and...@kernel.org
Cc: dvyu...@google.com; linux-kernel@vger.kernel.org; syzbot+44908bb56d2bfe56b...@syzkaller.appspotmail.com; b...@vger.kernel.org; Zhang, Qiang
Subject: [PATCH v2] bpf: Fix memory leak in copy_process()

From: Zqiang

syzbot reported a memleak as follows:

BUG: memory leak
unreferenced object 0x888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
    [] alloc_pid+0x66/0x560
    [] copy_process+0x1465/0x25e0
    [] kernel_clone+0xf3/0x670
    [] kernel_thread+0x61/0x80
    [] call_usermodehelper_exec_work
    [] call_usermodehelper_exec_work+0xc4/0x120
    [] process_one_work+0x2c9/0x600
    [] worker_thread+0x59/0x5d0
    [] kthread+0x178/0x1b0
    [] ret_from_fork+0x1f/0x30
unreferenced object 0x888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
    [] kmem_cache_zalloc
    [] __alloc_file+0x1f/0xf0
    [] alloc_empty_file+0x69/0x120
    [] alloc_file+0x33/0x1b0
    [] alloc_file_pseudo+0xb2/0x140
    [] create_pipe_files+0x138/0x2e0
    [] umd_setup+0x33/0x220
    [] call_usermodehelper_exec_async+0xb4/0x1b0
    [] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh and tgid need to
be released.

Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b...@syzkaller.appspotmail.com
Signed-off-by: Zqiang
---
v1->v2:
 Judge whether the pointer variable tgid is valid.
 kernel/bpf/preload/bpf_preload_kern.c | 24 ++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c
index 79c5772465f1..5009875f01d3 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "bpf_preload.h"

@@ -20,6 +21,14 @@ static struct bpf_preload_ops umd_ops = {
 	.owner = THIS_MODULE,
 };

+static void bpf_preload_umh_cleanup(struct umd_info *info)
+{
+	fput(info->pipe_to_umh);
+	fput(info->pipe_from_umh);
+	put_pid(info->tgid);
+	info->tgid = NULL;
+}
+
 static int preload(struct bpf_preload_info *obj)
 {
 	int magic = BPF_PRELOAD_START;
@@ -61,8 +70,10 @@ static int finish(void)
 	if (n != sizeof(magic))
 		return -EPIPE;
 	tgid = umd_ops.info.tgid;
-	wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
-	umd_ops.info.tgid = NULL;
+	if (tgid) {
+		wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+		bpf_preload_umh_cleanup(&umd_ops.info);
+	}
 	return 0;
 }

@@ -80,10 +91,15 @@ static int __init load_umd(void)

 static void __exit fini_umd(void)
 {
+	struct pid *tgid;
 	bpf_preload_ops = NULL;
 	/* kill UMD in case it's still there due to earlier error */
-	kill_pid(umd_ops.info.tgid, SIGKILL, 1);
-	umd_ops.info.tgid = NULL;
+	tgid = umd_ops.info.tgid;
+	if (tgid) {
+		kill_pid(tgid, SIGKILL, 1);
+		wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+		bpf_preload_umh_cleanup(&umd_ops.info);
+	}
 	umd_unload_blob(&umd_ops.info);
 }
 late_initcall(load_umd);
--
2.17.1
Re: [PATCH] ARM: Fix incorrect use of smp_processor_id() by syzbot report
From: Dmitry Vyukov
Sent: March 12, 2021 14:30
To: Zhang, Qiang
Cc: Russell King - ARM Linux; Andrew Morton; LKML; Linux ARM; syzkaller-bugs
Subject: Re: [PATCH] ARM: Fix incorrect use of smp_processor_id() by syzbot report

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Fri, Mar 12, 2021 at 5:13 AM wrote:
> >
> > From: Zqiang
> >
> > BUG: using smp_processor_id() in preemptible [] code:
> > syz-executor.0/15841
> > caller is debug_smp_processor_id+0x20/0x24
> > lib/smp_processor_id.c:64
> >
> > smp_processor_id() should be used in a code segment where preemption has
> > been disabled; otherwise, when preemption is enabled, the pointer it
> > yields is usually no longer useful since it may no longer point to the
> > per-cpu data of the current processor.
> >
> > Reported-by: syzbot
> > Fixes: f5fe12b1eaee ("ARM: spectre-v2: harden user aborts in kernel space")
> > Signed-off-by: Zqiang
> > ---
> >  arch/arm/include/asm/system_misc.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
> > index 66f6a3ae68d2..61916dc7d361 100644
> > --- a/arch/arm/include/asm/system_misc.h
> > +++ b/arch/arm/include/asm/system_misc.h
> > @@ -21,8 +21,10 @@ typedef void (*harden_branch_predictor_fn_t)(void);
> >  DECLARE_PER_CPU(harden_branch_predictor_fn_t, harden_branch_predictor_fn);
> >  static inline void harden_branch_predictor(void)
> >  {
> > +	preempt_disable();
> >  	harden_branch_predictor_fn_t fn = per_cpu(harden_branch_predictor_fn,
> >  						  smp_processor_id());
> > +	preempt_enable();
> >  	if (fn)
> >  		fn();
> >  }
>
> Hi Qiang,
>
> If the CPU can change here, what if it changes right after preempt_enable()?
> Disabling preemption just around reading the callback looks like a no-op.
> Shouldn't we disable preemption at least around reading and calling the
> callback?

Hi dvyukov,

Oh, I was confused; we should call preempt_enable() after calling the
callback function, to make sure the callback is called on the current
processor. Thank you for the reminder.
> On second look, fn seems to be const after init, so maybe we need to use
> raw_smp_processor_id() instead, with an explanatory comment?
Re: possible deadlock in io_poll_double_wake (2)
From: Zhang, Qiang
Sent: March 3, 2021 20:15
To: Jens Axboe; syzbot; asml.sile...@gmail.com; io-ur...@vger.kernel.org; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; syzkaller-b...@googlegroups.com; v...@zeniv.linux.org.uk
Subject: Re: possible deadlock in io_poll_double_wake (2)

From: Jens Axboe
Sent: March 3, 2021 1:20
To: syzbot; asml.sile...@gmail.com; io-ur...@vger.kernel.org; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; syzkaller-b...@googlegroups.com; v...@zeniv.linux.org.uk
Subject: Re: possible deadlock in io_poll_double_wake (2)

[Please note: This e-mail is from an EXTERNAL e-mail address]

On 2/28/21 9:18 PM, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> possible deadlock in io_poll_double_wake
>
> WARNING: possible recursive locking detected
> 5.11.0-syzkaller #0 Not tainted
>
> syz-executor.0/10241 is trying to acquire lock:
> 888012e09130 (>sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> 888012e09130 (>sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4921
>
> but task is already holding lock:
> 888013b00130 (>sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
>        CPU0
>        lock(>sleep);
>        lock(>sleep);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation

> Since the fix is in yet this keeps failing (and I didn't get it), I looked
> closer at this report. While the names of the locks are the same, they are
> really two different locks. So let's try this...

Hello Jens Axboe,

Sorry for the noise; please ignore this information. I provided the wrong
information before.

> I'm not very familiar with io_uring; before we start vfs_poll again, should
> we set 'poll->head = NULL'?
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 42b675939582..cae605c14510 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4824,7 +4824,7 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
>
>  	if (!req->result && !READ_ONCE(poll->canceled)) {
>  		struct poll_table_struct pt = { ._key = poll->events };
> -
> +		poll->head = NULL;
>  		req->result = vfs_poll(req->file, &pt) & poll->events;
>  	}

Thanks
Qiang

> #syz test: git://git.kernel.dk/linux-block syzbot-test
>
> --
> Jens Axboe
Re: possible deadlock in io_poll_double_wake (2)
From: Jens Axboe
Sent: March 3, 2021 1:20
To: syzbot; asml.sile...@gmail.com; io-ur...@vger.kernel.org; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; syzkaller-b...@googlegroups.com; v...@zeniv.linux.org.uk
Subject: Re: possible deadlock in io_poll_double_wake (2)

[Please note: This e-mail is from an EXTERNAL e-mail address]

On 2/28/21 9:18 PM, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> possible deadlock in io_poll_double_wake
>
> WARNING: possible recursive locking detected
> 5.11.0-syzkaller #0 Not tainted
>
> syz-executor.0/10241 is trying to acquire lock:
> 888012e09130 (>sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> 888012e09130 (>sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4921
>
> but task is already holding lock:
> 888013b00130 (>sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
>        CPU0
>        lock(>sleep);
>        lock(>sleep);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation

> Since the fix is in yet this keeps failing (and I didn't get it), I looked
> closer at this report. While the names of the locks are the same, they are
> really two different locks. So let's try this...

Hello Jens Axboe,

Sorry, I provided the wrong information before. I'm not very familiar with
io_uring; before we start vfs_poll again, should we set 'poll->head = NULL'?
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 42b675939582..cae605c14510 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4824,7 +4824,7 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)

 	if (!req->result && !READ_ONCE(poll->canceled)) {
 		struct poll_table_struct pt = { ._key = poll->events };
-
+		poll->head = NULL;
 		req->result = vfs_poll(req->file, &pt) & poll->events;
 	}

Thanks
Qiang

> #syz test: git://git.kernel.dk/linux-block syzbot-test
>
> --
> Jens Axboe
Re: [PATCH v2] workqueue: Move the position of debug_work_activate() in __queue_work()
Hello Tejun,

Please review this change.

Thanks
Qiang

> From: Zhang, Qiang
> Sent: February 18, 2021 11:17
> To: jiangshan...@gmail.com; t...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Subject: [PATCH v2] workqueue: Move the position of debug_work_activate() in __queue_work()
>
> From: Zqiang
>
> debug_work_activate() is called on the premise that the work can be
> inserted, because if the wq is in __WQ_DRAINING state, inserting the work
> may fail.
>
> Fixes: e41e704bc4f4 ("workqueue: improve destroy_workqueue() debuggability")
> Signed-off-by: Zqiang
> Reviewed-by: Lai Jiangshan
> ---
> v1->v2:
>  add Fixes tag.
>
>  kernel/workqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 0d150da252e8..21fb00b52def 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1412,7 +1412,6 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
>  	 */
>  	lockdep_assert_irqs_disabled();
>
> -	debug_work_activate(work);
>
>  	/* if draining, only works from the same workqueue are allowed */
>  	if (unlikely(wq->flags & __WQ_DRAINING) &&
> @@ -1494,6 +1493,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
>  		worklist = &pwq->delayed_works;
>  	}
>
> +	debug_work_activate(work);
>  	insert_work(pwq, work, worklist, work_flags);
>
>  out:
> --
> 2.25.1
Re: possible deadlock in io_poll_double_wake (2)
From: Jens Axboe
Sent: March 1, 2021 7:08
To: syzbot; asml.sile...@gmail.com; io-ur...@vger.kernel.org; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; syzkaller-b...@googlegroups.com; v...@zeniv.linux.org.uk
Subject: Re: possible deadlock in io_poll_double_wake (2)

[Please note: This e-mail is from an EXTERNAL e-mail address]

On 2/27/21 5:42 PM, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 5695e516 Merge tag 'io_uring-worker.v3-2021-02-25' of git:..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=114e3866d0
> kernel config: https://syzkaller.appspot.com/x/.config?x=8c76dad0946df1f3
> dashboard link: https://syzkaller.appspot.com/bug?extid=28abd693db9e92c160d8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=122ed9b6d0
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14d5a292d0
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+28abd693db9e92c16...@syzkaller.appspotmail.com
>
> WARNING: possible recursive locking detected
> 5.11.0-syzkaller #0 Not tainted
>
> swapper/1/0 is trying to acquire lock:
> 88801b2b1130 (>sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> 88801b2b1130 (>sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960
>
> but task is already holding lock:
> 88801b2b3130 (>sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
>        CPU0
>        lock(>sleep);
>        lock(>sleep);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by swapper/1/0:
> #0: 888147474908 (>lock){..-.}-{2:2}, at: _snd_pcm_stream_lock_irqsave+0x9f/0xd0 sound/core/pcm_native.c:170
> #1: 88801b2b3130 (>sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
>
> stack backtrace:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted
5.11.0-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > > __dump_stack lib/dump_stack.c:79 [inline] > dump_stack+0xfa/0x151 lib/dump_stack.c:120 > print_deadlock_bug kernel/locking/lockdep.c:2829 [inline] > check_deadlock kernel/locking/lockdep.c:2872 [inline] > validate_chain kernel/locking/lockdep.c:3661 [inline] > __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900 > lock_acquire kernel/locking/lockdep.c:5510 [inline] > lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475 > __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] > _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 > spin_lock include/linux/spinlock.h:354 [inline] > io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960 > __wake_up_common+0x147/0x650 kernel/sched/wait.c:108 > __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138 > snd_pcm_update_state+0x46a/0x540 sound/core/pcm_lib.c:203 > snd_pcm_update_hw_ptr0+0xa75/0x1a50 sound/core/pcm_lib.c:464 > snd_pcm_period_elapsed+0x160/0x250 sound/core/pcm_lib.c:1805 > dummy_hrtimer_callback+0x94/0x1b0 sound/drivers/dummy.c:378 > __run_hrtimer kernel/time/hrtimer.c:1519 [inline] > __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583 > hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1600 > __do_softirq+0x29b/0x9f6 kernel/softirq.c:345 > invoke_softirq kernel/softirq.c:221 [inline] > __irq_exit_rcu kernel/softirq.c:422 [inline] > irq_exit_rcu+0x134/0x200 kernel/softirq.c:434 > sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100 > > asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632 > RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] > RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline] > RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline] > RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] > RIP: 
0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516 > Code: dd 38 6e f8 84 db 75 ac e8 54 32 6e f8 e8 0f 1c 74 f8 e9 0c 00 00 00 e8 > 45 32 6e f8 0f 00 2d 4e 4a c5 00 e8 39 32 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 > 00 fa 31 ff 48 89 de e8 14 3a 6e f8 48 85 db > RSP: 0018:c9d47d18 EFLAGS: 0293 > RAX: RBX: RCX: > RDX: 8880115c3780 RSI: 89052537 RDI: > RBP: 888141127064 R08: 0001 R09: 0001 > R10: 81794168 R11: R12: 0001 > R13: 888141127000 R14: 888141127064 R15: 888143331804 > acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:647 > cpuidle_enter_state+0x1b1/0xc80
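The splat above is a lock-class recursion report rather than a proven deadlock: the PCM stream's runtime->sleep wait queue and io_uring's second poll wait queue are distinct wait_queue_head_t instances, but their embedded spinlocks share one lockdep class, so taking the second head's lock from inside the wakeup of the first trips the "possible recursive locking" check. A reduced sketch of the shape lockdep objects to (illustrative only; the field names and this function are hypothetical, not the actual io_uring code or fix):

```c
/*
 * Illustrative sketch only. Both wait queue heads embed a spinlock of
 * the same lockdep class; __wake_up_common_lock() already holds the
 * first head's lock when it invokes the wakeup callback below.
 */
static int double_wake_sketch(struct wait_queue_entry *wait,
			      unsigned mode, int sync, void *key)
{
	/* hypothetical: the second poll wait queue stashed in ->private */
	struct wait_queue_head *second = wait->private;

	/* first->lock is held by the caller at this point */
	spin_lock(&second->lock);	/* same class => lockdep recursion */
	/* ... wake/complete the second poll entry ... */
	spin_unlock(&second->lock);
	return 1;
}
```

Breaking the report requires either never nesting the two heads or annotating the inner acquisition as a distinct nesting level; which of these the eventual fix used is not shown in this thread.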
Re: [PATCH] workqueue: Remove rcu_read_lock/unlock() in workqueue_congested()
发件人: Paul E. McKenney 发送时间: 2021年2月18日 23:17 收件人: Lai Jiangshan 抄送: Zhang, Qiang; Tejun Heo; Tejun Heo; LKML 主题: Re: [PATCH] workqueue: Remove rcu_read_lock/unlock() in workqueue_congested() [Please note: This e-mail is from an EXTERNAL e-mail address] On Thu, Feb 18, 2021 at 11:04:00AM +0800, Lai Jiangshan wrote: > +CC Paul > > > On Wed, Feb 17, 2021 at 7:58 PM wrote: > > > > From: Zqiang > > > > The RCU read critical area already by preempt_disable/enable() > > (equivalent to rcu_read_lock_sched/unlock_sched()) mark, so remove > > rcu_read_lock/unlock(). > > I think we can leave it which acks like document, especially > workqueue_congested() is not performance crucial. Either way > is Ok for me. > >If the rcu_read_lock() is removed, should there be a comment saying >that >it interacts with synchronize_rcu()? Just in case one of the real-time >guys figures out a way to get the job done without disabling >preemption... > >Thanx, Paul > > If it needs to be changed, please also do the same for > rcu_read_lock() in wq_watchdog_timer_fn(). > > > And __queue_work() and try_to_grab_pending() also use local_irq_save() > and rcu_read_lock() at the same time, but I don't know will these > local_irq_save() be changed to raw_local_irq_save() in PREEMPT_RT. The local_irq_save function is not change in PREEMPT_RT system. 
Thanks Qiang > > > > > > Signed-off-by: Zqiang > > --- > > kernel/workqueue.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > > index 0d150da252e8..c599835ad6c3 100644 > > --- a/kernel/workqueue.c > > +++ b/kernel/workqueue.c > > @@ -4540,7 +4540,6 @@ bool workqueue_congested(int cpu, struct > > workqueue_struct *wq) > > struct pool_workqueue *pwq; > > bool ret; > > > > - rcu_read_lock(); > > preempt_disable(); > > > > if (cpu == WORK_CPU_UNBOUND) > > @@ -4553,7 +4552,6 @@ bool workqueue_congested(int cpu, struct > > workqueue_struct *wq) > > > > ret = !list_empty(&pwq->delayed_works); > > preempt_enable(); > > - rcu_read_unlock(); > > > > return ret; > > } > > -- > > 2.25.1 > >
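If the explicit rcu_read_lock() does go away, the comment Paul asks for could look something like this. A sketch of the patched function (the comment wording and the simplified pwq lookup are assumptions, not the actual commit):

```c
bool workqueue_congested(int cpu, struct workqueue_struct *wq)
{
	struct pool_workqueue *pwq;
	bool ret;

	/*
	 * preempt_disable() doubles as the RCU-sched read-side marker
	 * (equivalent to rcu_read_lock_sched()) that keeps the pwq from
	 * being freed under us; it pairs with synchronize_rcu() on the
	 * pwq release path.  Do not replace it with a mechanism that
	 * lacks an RCU read-side section (e.g. on PREEMPT_RT).
	 */
	preempt_disable();

	if (cpu == WORK_CPU_UNBOUND)
		cpu = smp_processor_id();

	/* lookup simplified; the real function distinguishes bound
	 * and unbound workqueues here */
	pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
	ret = !list_empty(&pwq->delayed_works);

	preempt_enable();
	return ret;
}
```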
Re: [PATCH] workqueue: Move the position of debug_work_activate() in __queue_work()
Hello Tejun Heo, Excuse me, do you have time to make some suggestions for this modification? Thanks Qiang From: Zhang, Qiang Sent: February 11, 2021 16:24 To: t...@kernel.org; jiangshan...@gmail.com Cc: linux-kernel@vger.kernel.org Subject: [PATCH] workqueue: Move the position of debug_work_activate() in __queue_work() From: Zqiang debug_work_activate() should only be called once it is known that the work can actually be inserted: if the wq is in __WQ_DRAINING state, inserting the work may fail. Signed-off-by: Zqiang --- kernel/workqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 0d150da252e8..21fb00b52def 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1412,7 +1412,6 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, */ lockdep_assert_irqs_disabled(); - debug_work_activate(work); /* if draining, only works from the same workqueue are allowed */ if (unlikely(wq->flags & __WQ_DRAINING) && @@ -1494,6 +1493,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, worklist = &pwq->delayed_works; } + debug_work_activate(work); insert_work(pwq, work, worklist, work_flags); out: -- 2.25.1
Re: Re: [PATCH v3] kvfree_rcu: Release page cache under memory pressure
发件人: Uladzislau Rezki 发送时间: 2021年2月4日 22:09 收件人: Zhang, Qiang 抄送: Uladzislau Rezki; paul...@kernel.org; j...@joelfernandes.org; r...@vger.kernel.org; linux-kernel@vger.kernel.org 主题: Re: 回复: [PATCH v3] kvfree_rcu: Release page cache under memory pressure [Please note: This e-mail is from an EXTERNAL e-mail address] > 发件人: Uladzislau Rezki > 发送时间: 2021年2月2日 3:57 > 收件人: Zhang, Qiang > 抄送: ure...@gmail.com; paul...@kernel.org; j...@joelfernandes.org; > r...@vger.kernel.org; linux-kernel@vger.kernel.org > 主题: Re: [PATCH v3] kvfree_rcu: Release page cache under memory pressure > > [Please note: This e-mail is from an EXTERNAL e-mail address] > > Hello, Zqiang. > > > From: Zqiang > > > > Add free per-cpu existing krcp's page cache operation, when > > the system is under memory pressure. > > > > Signed-off-by: Zqiang > > --- > > kernel/rcu/tree.c | 26 ++ > > 1 file changed, 26 insertions(+) > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index c1ae1e52f638..644b0f3c7b9f 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -3571,17 +3571,41 @@ void kvfree_call_rcu(struct rcu_head *head, > > rcu_callback_t func) > > } > > EXPORT_SYMBOL_GPL(kvfree_call_rcu); > > > > +static int free_krc_page_cache(struct kfree_rcu_cpu *krcp) > > +{ > > + unsigned long flags; > > + struct llist_node *page_list, *pos, *n; > > + int freed = 0; > > + > > + raw_spin_lock_irqsave(>lock, flags); > > + page_list = llist_del_all(>bkvcache); > > + krcp->nr_bkv_objs = 0; > > + raw_spin_unlock_irqrestore(>lock, flags); > > + > > + llist_for_each_safe(pos, n, page_list) { > > + free_page((unsigned long)pos); > > + freed++; > > + } > > + > > + return freed; > > +} > > + > > static unsigned long > > kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) > > { > > int cpu; > > unsigned long count = 0; > > + unsigned long flags; > > > > /* Snapshot count of all CPUs */ > > for_each_possible_cpu(cpu) { > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(, cpu); > > 
> > count += READ_ONCE(krcp->count); > > + > > + raw_spin_lock_irqsave(>lock, flags); > > + count += krcp->nr_bkv_objs; > > + raw_spin_unlock_irqrestore(>lock, flags); > > } > > > > return count; > > @@ -3598,6 +3622,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct > > shrink_control *sc) > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(, cpu); > > > > count = krcp->count; > > + count += free_krc_page_cache(krcp); > > + > > raw_spin_lock_irqsave(>lock, flags); > > if (krcp->monitor_todo) > > kfree_rcu_drain_unlock(krcp, flags); > > -- > > 2.17.1 > >> > >Thank you for your patch! > > > >I spent some time to see how the patch behaves under low memory condition. > >To simulate it, i used "rcuscale" tool with below parameters: > > > >../rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 10 > >--kconfig >CONFIG_NR_CPUS=64 \ > >--bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 > >>rcuscale.holdoff=20 rcuscale.kfree_loops=1 \ > >torture.disable_onoff_at_boot" --trust-make > > > >64 CPUs + 512 MB of memory. In general, my test system was running on edge > >hitting an out of memory sometimes, but could be considered as stable in > >regards to a test completion and taken time, so both were pretty solid. > > > >You can find a comparison on a plot, that can be downloaded following > >a link: wget > >>ftp://vps418301.ovh.net/incoming/release_page_cache_under_low_memory.png > > > >In short, i see that a patched version can lead to longer test completion, > >whereas the default variant is stable on almost all runs. After some analysis > >and further digging i came to conclusion that a shrinker > >free_krc_page_cache() > >concurs with run_page_cache_worker(krcp) running from kvfree_rcu() context. > > > >i.e. During the test a page shrinker is pretty active, because of low memory > >condition. Our callback drains it whereas kvfree_rcu() part refill it right > >away making kind of vicious circle. > > > >
Re: [PATCH] uprobes: Fix kasan UAF reported by syzbot
Hello peterz, The "rbtree, uprobes: Use rbtree helpers" change misses taking a reference on the looked-up uprobe; syzbot has been reporting the resulting use-after-free recently. Thanks Qiang From: Zhang, Qiang Sent: February 2, 2021 17:17 To: pet...@infradead.org; mi...@redhat.com; syzbot+2f6d683983e3905ad...@syzkaller.appspotmail.com Cc: o...@redhat.com; linux-kernel@vger.kernel.org Subject: [PATCH] uprobes: Fix kasan UAF reported by syzbot From: Zqiang Call Trace: __dump_stack [inline] dump_stack+0x107/0x163 print_address_description.constprop.0.cold+0x5b/0x2f8 __kasan_report [inline] kasan_report.cold+0x7c/0xd8 uprobe_cmp [inline] __uprobe_cmp [inline] rb_find_add [inline] __insert_uprobe [inline] insert_uprobe [inline] alloc_uprobe [inline] __uprobe_register+0x70f/0x850 .. __do_sys_perf_event_open+0x647/0x2e60 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 12710: kzalloc [inline] alloc_uprobe [inline] __uprobe_register+0x19c/0x850 trace_uprobe_enable [inline] trace_uprobe_register+0x443/0x880 ... __do_sys_perf_event_open+0x647/0x2e60 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 12710: kfree+0xe5/0x7b0 put_uprobe [inline] put_uprobe+0x13b/0x190 uprobe_apply+0xfc/0x130 uprobe_perf_open [inline] trace_uprobe_register+0x5c9/0x880 ...
__do_sys_perf_event_open+0x647/0x2e60 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix the reference count lost in __find_uprobe(): the lookup must take a reference on the uprobe it returns. Fixes: c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers") Reported-by: syzbot+1182ffb2063c5d087...@syzkaller.appspotmail.com Signed-off-by: Zqiang --- kernel/events/uprobes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 7e15b2efdd87..6addc9780319 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -661,7 +661,7 @@ static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset) struct rb_node *node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key); if (node) - return __node_2_uprobe(node); + return get_uprobe(__node_2_uprobe(node)); return NULL; } -- 2.17.1
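The one-line fix above makes the lookup take its own reference before handing the object out. The pattern, reduced to a hypothetical userspace sketch (the names are illustrative, not uprobe code):

```c
#include <stddef.h>
#include <stdlib.h>

struct obj {
	int refcount;
	long key;
	struct obj *next;
};

/* Take a reference; mirrors get_uprobe(). */
static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcount++;
	return o;
}

/* Drop a reference, freeing at zero; mirrors put_uprobe(). */
static void obj_put(struct obj *o)
{
	if (o && --o->refcount == 0)
		free(o);
}

/*
 * Correct lookup: bump the refcount while the object is still known to
 * be alive, so a concurrent obj_put() by another path cannot free it
 * out from under the caller (the use-after-free syzbot hit).
 */
static struct obj *find_and_get(struct obj *head, long key)
{
	for (struct obj *o = head; o; o = o->next)
		if (o->key == key)
			return obj_get(o);	/* not just "return o;" */
	return NULL;
}
```

Returning the bare pointer, as the pre-fix __find_uprobe() did, leaves the caller holding an object whose last reference anyone else may drop at any time.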
Re: [PATCH v3] kvfree_rcu: Release page cache under memory pressure
发件人: Uladzislau Rezki 发送时间: 2021年2月2日 3:57 收件人: Zhang, Qiang 抄送: ure...@gmail.com; paul...@kernel.org; j...@joelfernandes.org; r...@vger.kernel.org; linux-kernel@vger.kernel.org 主题: Re: [PATCH v3] kvfree_rcu: Release page cache under memory pressure [Please note: This e-mail is from an EXTERNAL e-mail address] Hello, Zqiang. > From: Zqiang > > Add free per-cpu existing krcp's page cache operation, when > the system is under memory pressure. > > Signed-off-by: Zqiang > --- > kernel/rcu/tree.c | 26 ++ > 1 file changed, 26 insertions(+) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index c1ae1e52f638..644b0f3c7b9f 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3571,17 +3571,41 @@ void kvfree_call_rcu(struct rcu_head *head, > rcu_callback_t func) > } > EXPORT_SYMBOL_GPL(kvfree_call_rcu); > > +static int free_krc_page_cache(struct kfree_rcu_cpu *krcp) > +{ > + unsigned long flags; > + struct llist_node *page_list, *pos, *n; > + int freed = 0; > + > + raw_spin_lock_irqsave(>lock, flags); > + page_list = llist_del_all(>bkvcache); > + krcp->nr_bkv_objs = 0; > + raw_spin_unlock_irqrestore(>lock, flags); > + > + llist_for_each_safe(pos, n, page_list) { > + free_page((unsigned long)pos); > + freed++; > + } > + > + return freed; > +} > + > static unsigned long > kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) > { > int cpu; > unsigned long count = 0; > + unsigned long flags; > > /* Snapshot count of all CPUs */ > for_each_possible_cpu(cpu) { > struct kfree_rcu_cpu *krcp = per_cpu_ptr(, cpu); > > count += READ_ONCE(krcp->count); > + > + raw_spin_lock_irqsave(>lock, flags); > + count += krcp->nr_bkv_objs; > + raw_spin_unlock_irqrestore(>lock, flags); > } > > return count; > @@ -3598,6 +3622,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct > shrink_control *sc) > struct kfree_rcu_cpu *krcp = per_cpu_ptr(, cpu); > > count = krcp->count; > + count += free_krc_page_cache(krcp); > + > raw_spin_lock_irqsave(>lock, 
flags); > if (krcp->monitor_todo) > kfree_rcu_drain_unlock(krcp, flags); > -- > 2.17.1 >> >Thank you for your patch! > >I spent some time to see how the patch behaves under low memory condition. >To simulate it, i used "rcuscale" tool with below parameters: > >../rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig >>CONFIG_NR_CPUS=64 \ >--bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 >>rcuscale.holdoff=20 rcuscale.kfree_loops=1 \ >torture.disable_onoff_at_boot" --trust-make > >64 CPUs + 512 MB of memory. In general, my test system was running on edge >hitting an out of memory sometimes, but could be considered as stable in >regards to a test completion and taken time, so both were pretty solid. > >You can find a comparison on a plot, that can be downloaded following >a link: wget >>ftp://vps418301.ovh.net/incoming/release_page_cache_under_low_memory.png > >In short, i see that a patched version can lead to longer test completion, >whereas the default variant is stable on almost all runs. After some analysis >and further digging i came to conclusion that a shrinker free_krc_page_cache() >concurs with run_page_cache_worker(krcp) running from kvfree_rcu() context. > >i.e. During the test a page shrinker is pretty active, because of low memory >condition. Our callback drains it whereas kvfree_rcu() part refill it right >away making kind of vicious circle. 
> >So, a run_page_cache_worker() should be backoff for some time when a system >runs into a low memory condition or high pressure: > >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >index 7077d73fcb53..446723b9646b 100644 >--- a/kernel/rcu/tree.c >+++ b/kernel/rcu/tree.c >@@ -3163,7 +3163,7 @@ struct kfree_rcu_cpu { >bool initialized; >int count; > >- struct work_struct page_cache_work; >+ struct delayed_work page_cache_work; >atomic_t work_in_progress; >struct hrtimer hrtimer; > >@@ -3419,7 +3419,7 @@ schedule_page_work_fn(struct hrtimer *t) >struct kfree_rcu_cpu *krcp = >container_of(t, struct kfree_rcu_cpu, hrtimer); > >- queue_work(system_highpri_wq, >page_cache_work); >+ queue_delayed_work(system_highpri_wq, &
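The diff above is cut off mid-hunk; the idea it sketches (this continuation is an assumption, not Vlad's exact patch) is to convert the refill work into a delayed_work and queue it with a back-off delay when the system has just been under enough pressure for the shrinker to drain the cache:

```c
/*
 * Hedged sketch completing the truncated diff above.  The
 * backoff_page_cache_fill field is hypothetical: something the
 * shrinker path would set to signal "do not refill right away".
 */
static void run_page_cache_worker(struct kfree_rcu_cpu *krcp)
{
	unsigned long delay = krcp->backoff_page_cache_fill ? HZ : 0;

	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
	    !atomic_xchg(&krcp->work_in_progress, 1))
		queue_delayed_work(system_highpri_wq,
				   &krcp->page_cache_work, delay);
}
```

This breaks the drain/refill cycle described above: under low memory the refill waits, so the pages the shrinker just returned to the buddy allocator are not immediately re-grabbed.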
Re: [PATCH v2] kvfree_rcu: Release page cache under memory pressure
发件人: Uladzislau Rezki 发送时间: 2021年1月29日 22:19 收件人: Zhang, Qiang 抄送: ure...@gmail.com; paul...@kernel.org; j...@joelfernandes.org; r...@vger.kernel.org; linux-kernel@vger.kernel.org 主题: Re: [PATCH v2] kvfree_rcu: Release page cache under memory pressure [Please note: This e-mail is from an EXTERNAL e-mail address] On Fri, Jan 29, 2021 at 04:04:42PM +0800, qiang.zh...@windriver.com wrote: > From: Zqiang > > Add free per-cpu existing krcp's page cache operation, when > the system is under memory pressure. > > Signed-off-by: Zqiang > --- > kernel/rcu/tree.c | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index c1ae1e52f638..ec098910d80b 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3571,17 +3571,40 @@ void kvfree_call_rcu(struct rcu_head *head, > rcu_callback_t func) > } > EXPORT_SYMBOL_GPL(kvfree_call_rcu); > > +static int free_krc_page_cache(struct kfree_rcu_cpu *krcp) > +{ > + unsigned long flags; > + struct kvfree_rcu_bulk_data *bnode; > + int i; > + > + for (i = 0; i < rcu_min_cached_objs; i++) { > + raw_spin_lock_irqsave(>lock, flags); >I am not sure why we should disable IRQs. I think it can be >avoided. Suppose in multi CPU system, the kfree_rcu_shrink_scan function is runing on CPU2, and we just traverse to CPU2, and then call free_krc_page_cache function, if not disable irq, a interrupt may be occurs on CPU2 after the CPU2 corresponds to krcp variable 's lock be acquired, if the interrupt or softirq handler function to call kvfree_rcu function, in this function , acquire CPU2 corresponds to krcp variable 's lock , will happen deadlock. Or in single CPU scenario. > + bnode = get_cached_bnode(krcp); > + raw_spin_unlock_irqrestore(>lock, flags); > + if (!bnode) > + break; > + free_page((unsigned long)bnode); > + } > + > + return i; > +} >Also i forgot to add in my previous comment to this path. Can we >access >to page cache once and then do the drain work? 
I mean if we had >100 objects >in the cache we would need to access to a krcp->lock 100 times. > >What about something like below: > > >static int free_krc_page_cache(struct kfree_rcu_cpu *krcp) >{ >struct llist_node *page_list, *pos, *n; >int freed = 0; > >raw_spin_lock(>lock); >page_list = llist_del_all(>bkvcache); >krcp->nr_bkv_objs = 0; >raw_spin_unlock(>lock); > >llist_for_each_safe(pos, n, page_list) { >free_page((unsigned long) pos); >freed++; >} > >return freed; >} > this change looks better. Thanks Qiang > + > static unsigned long > kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) > { > int cpu; > unsigned long count = 0; > + unsigned long flags; > > /* Snapshot count of all CPUs */ > for_each_possible_cpu(cpu) { > struct kfree_rcu_cpu *krcp = per_cpu_ptr(, cpu); > > count += READ_ONCE(krcp->count); > + > + raw_spin_lock_irqsave(>lock, flags); > + count += krcp->nr_bkv_objs; > + raw_spin_unlock_irqrestore(>lock, flags); >Should we disable irqs? > > return count; > @@ -3598,6 +3621,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct > shrink_control *sc) > struct kfree_rcu_cpu *krcp = per_cpu_ptr(, cpu); > > count = krcp->count; > + count += free_krc_page_cache(krcp); > + > raw_spin_lock_irqsave(>lock, flags); > if (krcp->monitor_todo) > kfree_rcu_drain_unlock(krcp, flags); > -- > 2.17.1 Thanks! -- Vlad Rezki
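Vlad's suggested free_krc_page_cache() detaches the whole cache in one lock acquisition and frees the pages outside the lock, instead of taking krcp->lock once per cached page. The detach-then-free pattern in a plain userspace sketch (names illustrative; in the kernel the detach is llist_del_all() under krcp->lock):

```c
#include <stdlib.h>

struct page_node {
	struct page_node *next;
};

/* Detach the entire list in O(1) so the lock is held only briefly. */
static struct page_node *detach_all(struct page_node **head)
{
	struct page_node *list = *head;

	*head = NULL;
	return list;
}

/* Free the detached nodes outside the lock, counting them. */
static int free_all(struct page_node *list)
{
	int freed = 0;

	while (list) {
		struct page_node *next = list->next;

		free(list);
		freed++;
		list = next;
	}
	return freed;
}
```

With 100 cached objects this touches the lock once instead of 100 times, which is exactly the objection raised above.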
Re: [PATCH] PM: remove PF_WQ_WORKER mask
发件人: Rafael J. Wysocki 发送时间: 2021年1月28日 2:16 收件人: Zhang, Qiang 抄送: Rafael Wysocki; Linux PM; Linux Kernel Mailing List 主题: Re: [PATCH] PM: remove PF_WQ_WORKER mask [Please note: This e-mail is from an EXTERNAL e-mail address] On Mon, Jan 25, 2021 at 5:01 AM wrote: > > From: Zqiang > > Due to kworker also is kernel thread, it's already included > PF_KTHREAD mask, so remove PF_WQ_WORKER mask. >So you are saying that all threads having PF_WQ_WORKER set must also >have PF_KTHREAD set, right? yes #define PF_KTHREAD 0x0020 #define PF_WQ_WORKER 0x0020 I tracing kwoker's task->flags as follows: comm kworker/1:0, cpu 1, task->flags 0x4208060, delayed 3, func intel_fbc_work_fn Thanks Qiang >That sounds correct, so I'm going to rewrite the changelog and apply >the patch as 5.12 material, thanks! > Signed-off-by: Zqiang > --- > kernel/power/process.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/power/process.c b/kernel/power/process.c > index 45b054b7b5ec..50cc63534486 100644 > --- a/kernel/power/process.c > +++ b/kernel/power/process.c > @@ -235,7 +235,7 @@ void thaw_kernel_threads(void) > > read_lock(_lock); > for_each_process_thread(g, p) { > - if (p->flags & (PF_KTHREAD | PF_WQ_WORKER)) > + if (p->flags & PF_KTHREAD) > __thaw_task(p); > } > read_unlock(_lock); > -- > 2.17.1 >
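The two #define values quoted in the reply above look garbled by the mail archive; in mainline headers of this era the masks are distinct bits (the values below are taken from include/linux/sched.h around v5.11 and should be double-checked against your tree). Checking the traced flags word 0x4208060 against them shows both bits set, consistent with the claim that a kworker always carries PF_KTHREAD:

```c
/* Values as in include/linux/sched.h around v5.11; verify per tree. */
#define PF_WQ_WORKER	0x00000020	/* task is a workqueue worker */
#define PF_KTHREAD	0x00200000	/* task is a kernel thread */

/* Returns 1 when all bits of mask are present in flags. */
static int has_flag(unsigned int flags, unsigned int mask)
{
	return (flags & mask) == mask;
}
```

For the kworker/1:0 trace quoted above, has_flag(0x4208060, PF_KTHREAD) and has_flag(0x4208060, PF_WQ_WORKER) both hold, so dropping PF_WQ_WORKER from the thaw test does not change which tasks are thawed.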
Re: Re: Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline
发件人: Uladzislau Rezki 发送时间: 2021年1月26日 22:07 收件人: Zhang, Qiang 抄送: Uladzislau Rezki; Paul E. McKenney; r...@vger.kernel.org; linux-kernel@vger.kernel.org 主题: Re: 回复: 回复: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline > > On Fri, Jan 22, 2021 at 01:44:36AM +, Zhang, Qiang wrote: > > > > > > > > 发件人: Uladzislau Rezki > > 发送时间: 2021年1月22日 4:26 > > 收件人: Zhang, Qiang > > 抄送: Paul E. McKenney; r...@vger.kernel.org; linux-kernel@vger.kernel.org; > > ure...@gmail.com > > 主题: Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline > > >Hello, Qiang, > > > > > On Thu, Jan 21, 2021 at 02:49:49PM +0800, qiang.zh...@windriver.com wrote: > > > > From: Zqiang > > > > > > > > If CPUs go offline, the corresponding krcp's page cache can > > > > not be use util the CPU come back online, or maybe the CPU > > > > will never go online again, this commit therefore free krcp's > > > > page cache when CPUs go offline. > > > > > > > > Signed-off-by: Zqiang > > > > > >Do you consider it as an issue? We have 5 pages per CPU, that is 20480 > > >bytes. > > > > > > > Hello Rezki > > > > In a multi CPUs system, more than one CPUs may be offline, there are more > > than 5 pages, and these offline CPUs may never go online again or in the > > process of CPUs online, there are errors, which lead to the failure of > > online, these scenarios will lead to the per-cpu krc page cache will never > > be released. > > > >Thanks for your answer. I was thinking more about if you knew some >platforms > >which suffer from such extra page usage when CPU goes offline. Any >issues > >your platforms or devices run into because of that. > > > >So i understand that if CPU goes offline the 5 pages associated with it >are > >unused until it goes online back. 
> > I agree with you, But I still want to talk about what I think > > My understanding is that when the CPU is offline, the pages is not > accessible, beacuse we don't know when this CPU will > go online again, so we best to return these page to the buddy system, > when the CPU goes online again, we can allocate page from the buddy > system to fill krcp's page cache. maybe you may think that this memory > is small and don't need to. > >BTW, we can release the caches via shrinker path instead, what is more makes >sense to me. We already have a callback, that frees pages when a page allocator >asks for it. I think in that case it would be fair to return it to the buddy >system. It happens under low memory condition I agree. it can be done in shrink callback, can release the currently existing per-cpu page cache. Thanks Qiang > or can be done manually to flush >system caches: > >echo 3 > /proc/sys/vm/drop_caches > >What do you think? > >-- >Vlad Rezki
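Releasing the caches via the shrinker path, as agreed above, means hanging the release off the existing count/scan callback pair discussed in the sibling kvfree_rcu threads. A minimal sketch of the registration shape (callback names follow the thread; the init glue is an assumption):

```c
/*
 * Sketch: the shrinker core calls count_objects for a cheap estimate
 * of reclaimable objects and scan_objects to actually free them;
 * the scan callback is where free_krc_page_cache() would run.
 */
static struct shrinker kfree_rcu_shrinker = {
	.count_objects	= kfree_rcu_shrink_count,
	.scan_objects	= kfree_rcu_shrink_scan,
	.batch		= 0,
	.seeks		= DEFAULT_SEEKS,
};

/* registered once during RCU init (exact location assumed) */
register_shrinker(&kfree_rcu_shrinker);
```

This also covers the offline-CPU concern naturally: kfree_rcu_shrink_scan() iterates all possible CPUs, so pages cached by a CPU that went offline are still drained under memory pressure, and `echo 3 > /proc/sys/vm/drop_caches` triggers the same path manually.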
Re: [PATCH] PM: remove PF_WQ_WORKER mask
From: Zhang, Qiang Sent: January 25, 2021 12:00 To: rafael.j.wyso...@intel.com Cc: linux...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: [PATCH] PM: remove PF_WQ_WORKER mask
From: Zqiang

Since a kworker is also a kernel thread, its flags already include the PF_KTHREAD mask, so the PF_WQ_WORKER mask can be removed.

Signed-off-by: Zqiang
---
 kernel/power/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index 45b054b7b5ec..50cc63534486 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -235,7 +235,7 @@ void thaw_kernel_threads(void)
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
-		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
+		if (p->flags & PF_KTHREAD)
 			__thaw_task(p);
 	}
 	read_unlock(&tasklist_lock);
--
2.17.1
Re: [PATCH] sched/core: add rcu_read_lock/unlock() protection
From: Zhang, Qiang Sent: January 26, 2021 16:29 To: valentin.schnei...@arm.com Cc: pet...@infradead.org; linux-kernel@vger.kernel.org Subject: [PATCH] sched/core: add rcu_read_lock/unlock() protection
>From: Zqiang
>Since for_each_process_thread() is an RCU read-side operation, rcu_read_lock/unlock() protection needs to be added.
Sorry to disturb you; I found this code is already inside an RCU critical section. Please ignore this change.
>Signed-off-by: Zqiang
>---
> kernel/sched/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>index 8c5481077c9c..c3f0103fdf53 100644
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -7738,6 +7738,7 @@ static void dump_rq_tasks(struct rq *rq, const char *loglvl)
>	lockdep_assert_held(&rq->lock);
>
>	printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
>+	rcu_read_lock();
>	for_each_process_thread(g, p) {
>		if (task_cpu(p) != cpu)
>			continue;
>@@ -7747,6 +7748,7 @@ static void dump_rq_tasks(struct rq *rq, const char *loglvl)
>
>		printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
>	}
>+	rcu_read_unlock();
> }
>
> int sched_cpu_dying(unsigned int cpu)
--
2.17.1
Re: Re: Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
From: Uladzislau Rezki Sent: January 25, 2021 21:49 To: Zhang, Qiang Cc: Uladzislau Rezki; LKML; RCU; Paul E . McKenney; Michael Ellerman; Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Oleksiy Avramchenko Subject: Re: Re: Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
> >Hello, Zhang.
> >
> > >From: Uladzislau Rezki (Sony)
> > >Sent: January 21, 2021 0:21
> > >To: LKML; RCU; Paul E . McKenney; Michael Ellerman
> > >Cc: Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Uladzislau Rezki; Oleksiy Avramchenko
> > >Subject: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
> > >
> > >Since the page is obtained in a fully preemptible context, dropping
> > >the lock can lead to migration onto another CPU. As a result the prev.
> > >bnode of that CPU may be underutilised, because a decision has been
> > >made for a CPU that had run out of free slots to store a pointer.
> > >
> > >migrate_disable/enable() are now independent of RT; use them in order
> > >to prevent any migration during a page request for the specific CPU it
> > >is requested for.
> > >
> > Hello Rezki
> >
> > The migrate_disable/enable() critical area is not allowed to block, under RT and non-RT.
> > There is such a description in preempt.h:
> >
> > * Notes on the implementation.
> > *
> > * The implementation is particularly tricky since existing code patterns
> > * dictate neither migrate_disable() nor migrate_enable() is allowed to block.
> > * This means that it cannot use cpus_read_lock() to serialize against hotplug,
> > * nor can it easily migrate itself into a pending affinity mask change on
> > * migrate_enable().
> >
> >How I interpret it is that migrate_enable()/migrate_disable() are not allowed to
> >use any blocking primitives, such as rwsems/mutexes/etc., in order to mark the
> >current context as non-migratable.
> >
> >void migrate_disable(void)
> >{
> >	struct task_struct *p = current;
> >
> >	if (p->migration_disabled) {
> >		p->migration_disabled++;
> >		return;
> >	}
> >
> >	preempt_disable();
> >	this_rq()->nr_pinned++;
> >	p->migration_disabled = 1;
> >	preempt_enable();
> >}
> >
> >It does nothing that prevents you from doing schedule() or even waiting for any
> >event (mutex slow-path behaviour), once the process is removed from the
> >run-queue, i.e. after migrate_disable() is invoked. Or do I miss something?
>
> Hello Rezki
>
> Sorry, there was something wrong with the previous description.
> There is the following scenario:
>
> migrate_disable() will increase this_rq()->nr_pinned; after that, if
> __get_free_page() blocks, and at that time the CPU goes offline,
> sched_cpu_wait_empty() will be called in the per-cpu "cpuhp/%d" task
> and will block.
>
>But after migrate_disable() is invoked, a CPU cannot be brought down.
>If there are pinned tasks, the "hotplug path" will be blocked on the
>balance_hotplug_wait() call.
>
> blocked:
> sched_cpu_wait_empty()
> {
>	struct rq *rq = this_rq();
>	rcuwait_wait_event(&rq->hotplug_wait,
>			   rq->nr_running == 1 && !rq_has_pinned_tasks(rq),
>			   TASK_UNINTERRUPTIBLE);
> }
>
>Exactly.
>
> wakeup:
> balance_push()
> {
>	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
>		if (!rq->nr_running && !rq_has_pinned_tasks(rq) &&
>		    rcuwait_active(&rq->hotplug_wait)) {
>			raw_spin_unlock(&rq->lock);
>			rcuwait_wake_up(&rq->hotplug_wait);
>			raw_spin_lock(&rq->lock);
>		}
>		return;
>	}
> }
>
> One of the conditions for this wakeup is "rq->nr_pinned == 0"; that is to say,
> if blocking happens between migrate_disable/enable(), it will delay the CPU
> going offline with a longer blocking time.
>
>Indeed, the hotplug time is affected.
For example, in case of waiting for a mutex to be released, an owner will wake up waiters. But this is expected.
> > I'm not sure that's a problem, and I didn't find possible sleep calls between migrate_disable/enable() in the kernel code.
Question on migrate_disable/enable()
Hello Peterz, tglx
I have some questions about migrate_disable/enable(). In the past, migrate_disable/enable() was replaced by preempt_disable/enable() on non-RT systems, and now migrate_disable/enable() has its own implementation. I want to know: is blocking allowed inside a migrate_disable/enable() critical area? If it is allowed, and there is blocking in a migrate_disable/enable() critical area while a CPU is going offline, the offline will take longer. Is this a normal phenomenon?
Thanks
Qiang
Re: Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
From: Uladzislau Rezki Sent: January 25, 2021 5:57 To: Zhang, Qiang Cc: Uladzislau Rezki (Sony); LKML; RCU; Paul E . McKenney; Michael Ellerman; Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Oleksiy Avramchenko Subject: Re: Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
>Hello, Zhang.
>
> >From: Uladzislau Rezki (Sony)
> >Sent: January 21, 2021 0:21
> >To: LKML; RCU; Paul E . McKenney; Michael Ellerman
> >Cc: Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Uladzislau Rezki; Oleksiy Avramchenko
> >Subject: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
> >
> >Since the page is obtained in a fully preemptible context, dropping
> >the lock can lead to migration onto another CPU. As a result the prev.
> >bnode of that CPU may be underutilised, because a decision has been
> >made for a CPU that had run out of free slots to store a pointer.
> >
> >migrate_disable/enable() are now independent of RT; use them in order
> >to prevent any migration during a page request for the specific CPU it
> >is requested for.
> >
> Hello Rezki
>
> The migrate_disable/enable() critical area is not allowed to block, under RT and non-RT.
> There is such a description in preempt.h:
>
> * Notes on the implementation.
> *
> * The implementation is particularly tricky since existing code patterns
> * dictate neither migrate_disable() nor migrate_enable() is allowed to block.
> * This means that it cannot use cpus_read_lock() to serialize against hotplug,
> * nor can it easily migrate itself into a pending affinity mask change on
> * migrate_enable().
>
>How I interpret it is that migrate_enable()/migrate_disable() are not allowed to
>use any blocking primitives, such as rwsems/mutexes/etc., in order to mark the
>current context as non-migratable.
>
>void migrate_disable(void)
>{
>	struct task_struct *p = current;
>
>	if (p->migration_disabled) {
>		p->migration_disabled++;
>		return;
>	}
>	preempt_disable();
>	this_rq()->nr_pinned++;
>	p->migration_disabled = 1;
>	preempt_enable();
>}
>
>It does nothing that prevents you from doing schedule() or even waiting for any
>event (mutex slow-path behaviour), once the process is removed from the
>run-queue, i.e. after migrate_disable() is invoked. Or do I miss something?
Hello Rezki
Sorry, there was something wrong with the previous description. There is the following scenario:
migrate_disable() will increase rq's nr_pinned; after that, if __get_free_page() blocks, and at that time the CPU goes offline, sched_cpu_wait_empty() will be called in the per-cpu "cpuhp/%d" task and will block.
sched_cpu_wait_empty()
{
	rcuwait_wait_event(&rq->hotplug_wait,
			   rq->nr_running == 1 && !rq_has_pinned_tasks(rq),
			   TASK_UNINTERRUPTIBLE);
}
>
> How about the following changes:
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e7a226abff0d..2aa19537ac7c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3488,12 +3488,10 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> 		(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> 		bnode = get_cached_bnode(*krcp);
> 		if (!bnode && can_alloc) {
> -			migrate_disable();
> 			krc_this_cpu_unlock(*krcp, *flags);
> 			bnode = (struct kvfree_rcu_bulk_data *)
> 				__get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC | __GFP_NOWARN);
> -			*krcp = krc_this_cpu_lock(flags);
> -			migrate_enable();
> +			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
>
Hm.. Taking the former lock can lead to a pointer leaking; I mean, the CPU associated with "krcp" might go offline during the page-request process, so queuing would occur on an off-lined CPU. Apart from that, acquiring the former lock still does not solve:
- CPU1 is in the process of a page allocation;
- CPU1 gets migrated to CPU2;
- another task running on CPU1 also allocates a page;
- both bnodes are added to the krcp associated with CPU1.
I agree that such a scenario will probably never happen, or I would say it can be considered a corner case. We can drop the:
[PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
and
Re: Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
From: Uladzislau Rezki Sent: January 25, 2021 5:57 To: Zhang, Qiang Cc: Uladzislau Rezki (Sony); LKML; RCU; Paul E . McKenney; Michael Ellerman; Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Oleksiy Avramchenko Subject: Re: Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
>Hello, Zhang.
>
> >From: Uladzislau Rezki (Sony)
> >Sent: January 21, 2021 0:21
> >To: LKML; RCU; Paul E . McKenney; Michael Ellerman
> >Cc: Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Uladzislau Rezki; Oleksiy Avramchenko
> >Subject: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
> >
> >Since the page is obtained in a fully preemptible context, dropping
> >the lock can lead to migration onto another CPU. As a result the prev.
> >bnode of that CPU may be underutilised, because a decision has been
> >made for a CPU that had run out of free slots to store a pointer.
> >
> >migrate_disable/enable() are now independent of RT; use them in order
> >to prevent any migration during a page request for the specific CPU it
> >is requested for.
> >
> Hello Rezki
>
> The migrate_disable/enable() critical area is not allowed to block, under RT and non-RT.
> There is such a description in preempt.h:
>
> * Notes on the implementation.
> *
> * The implementation is particularly tricky since existing code patterns
> * dictate neither migrate_disable() nor migrate_enable() is allowed to block.
> * This means that it cannot use cpus_read_lock() to serialize against hotplug,
> * nor can it easily migrate itself into a pending affinity mask change on
> * migrate_enable().
>
>How I interpret it is that migrate_enable()/migrate_disable() are not allowed to
>use any blocking primitives, such as rwsems/mutexes/etc., in order to mark the
>current context as non-migratable.
>
>void migrate_disable(void)
>{
>	struct task_struct *p = current;
>
>	if (p->migration_disabled) {
>		p->migration_disabled++;
>		return;
>	}
>	preempt_disable();
>	this_rq()->nr_pinned++;
>	p->migration_disabled = 1;
>	preempt_enable();
>}
>
>It does nothing that prevents you from doing schedule() or even waiting for any
>event (mutex slow-path behaviour), once the process is removed from the
>run-queue, i.e. after migrate_disable() is invoked. Or do I miss something?
Hello Rezki
Sorry, there was something wrong with the previous description. There is the following scenario:
migrate_disable() will increase this_rq()->nr_pinned; after that, if __get_free_page() blocks, and at that time the CPU goes offline, sched_cpu_wait_empty() will be called in the per-cpu "cpuhp/%d" task and will block.
blocked:
sched_cpu_wait_empty()
{
	struct rq *rq = this_rq();
	rcuwait_wait_event(&rq->hotplug_wait,
			   rq->nr_running == 1 && !rq_has_pinned_tasks(rq),
			   TASK_UNINTERRUPTIBLE);
}
wakeup:
balance_push()
{
	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
		if (!rq->nr_running && !rq_has_pinned_tasks(rq) &&
		    rcuwait_active(&rq->hotplug_wait)) {
			raw_spin_unlock(&rq->lock);
			rcuwait_wake_up(&rq->hotplug_wait);
			raw_spin_lock(&rq->lock);
		}
		return;
	}
}
One of the conditions for this wakeup is "rq->nr_pinned == 0"; that is to say, if blocking happens between migrate_disable/enable(), it will delay the CPU going offline with a longer blocking time.
I'm not sure that's a problem, and I didn't find possible sleep calls between migrate_disable/enable() in the kernel code.
> > How about the following changes: > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index e7a226abff0d..2aa19537ac7c 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3488,12 +3488,10 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > (*krcp)->bkvhead[idx]->nr_records == > KVFREE_BULK_MAX_ENTR) { > bnode = get_cached_bnode(*krcp); > if (!bnode && can_alloc) { > - migrate_disable(); > krc_this_cpu_unlock(*krcp, *flags); > bnode = (struct kvfree_rcu_bulk_data *) > __get_free_page(GFP_KERNEL | > __GFP_RETRY_MAYFAIL |
Re: Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline
From: Uladzislau Rezki Sent: January 22, 2021 22:31 To: Zhang, Qiang Cc: Uladzislau Rezki; Paul E. McKenney; r...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline
On Fri, Jan 22, 2021 at 01:44:36AM +, Zhang, Qiang wrote:
>
> From: Uladzislau Rezki
> Sent: January 22, 2021 4:26
> To: Zhang, Qiang
> Cc: Paul E. McKenney; r...@vger.kernel.org; linux-kernel@vger.kernel.org; ure...@gmail.com
> Subject: Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline
> >Hello, Qiang,
> >
> > On Thu, Jan 21, 2021 at 02:49:49PM +0800, qiang.zh...@windriver.com wrote:
> > > From: Zqiang
> > >
> > > If CPUs go offline, the corresponding krcp's page cache cannot
> > > be used until the CPU comes back online, or maybe the CPU
> > > will never go online again; this commit therefore frees krcp's
> > > page cache when CPUs go offline.
> > >
> > > Signed-off-by: Zqiang
> >
> >Do you consider it as an issue? We have 5 pages per CPU, that is 20480 bytes.
>
> Hello Rezki
>
> In a multi-CPU system, more than one CPU may be offline, so there can be more than 5 pages, and these offline CPUs may never go online again, or errors in the CPU-online process may cause the online to fail; these scenarios will cause the per-cpu krc page cache to never be released.
>
>Thanks for your answer. I was thinking more about whether you knew of some platforms which suffer from such extra page usage when a CPU goes offline, or any issues your platforms or devices run into because of that.
>
>So I understand that if a CPU goes offline, the 5 pages associated with it are unused until it goes online again.
I agree with you, but I still want to say what I think.
My understanding is that when the CPU is offline, the pages are not accessible, because we don't know when this CPU will go online again, so it is best to return these pages to the buddy system; when the CPU goes online again, we can allocate pages from the buddy system to refill krcp's page cache. Maybe you think that this memory is small and this isn't needed.
Thanks
Qiang
>
>--
>Vlad Rezki
Re: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
>
>From: Uladzislau Rezki (Sony)
>Sent: January 21, 2021 0:21
>To: LKML; RCU; Paul E . McKenney; Michael Ellerman
>Cc: Andrew Morton; Daniel Axtens; Frederic Weisbecker; Neeraj Upadhyay; Joel Fernandes; Peter Zijlstra; Michal Hocko; Thomas Gleixner; Theodore Y . Ts'o; Sebastian Andrzej Siewior; Uladzislau Rezki; Oleksiy Avramchenko
>Subject: [PATCH 3/3] kvfree_rcu: use migrate_disable/enable()
>
>Since the page is obtained in a fully preemptible context, dropping
>the lock can lead to migration onto another CPU. As a result the prev.
>bnode of that CPU may be underutilised, because a decision has been
>made for a CPU that had run out of free slots to store a pointer.
>
>migrate_disable/enable() are now independent of RT; use them in order
>to prevent any migration during a page request for the specific CPU it
>is requested for.
Hello Rezki
The migrate_disable/enable() critical area is not allowed to block, under RT and non-RT.
There is such a description in preempt.h:

* Notes on the implementation.
*
* The implementation is particularly tricky since existing code patterns
* dictate neither migrate_disable() nor migrate_enable() is allowed to block.
* This means that it cannot use cpus_read_lock() to serialize against hotplug,
* nor can it easily migrate itself into a pending affinity mask change on
* migrate_enable().

How about the following changes:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e7a226abff0d..2aa19537ac7c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3488,12 +3488,10 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
		(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
		bnode = get_cached_bnode(*krcp);
		if (!bnode && can_alloc) {
-			migrate_disable();
			krc_this_cpu_unlock(*krcp, *flags);
			bnode = (struct kvfree_rcu_bulk_data *)
				__get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC | __GFP_NOWARN);
-			*krcp = krc_this_cpu_lock(flags);
-			migrate_enable();
+			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
		}

		if (!bnode)

Thanks
Qiang
>
>Signed-off-by: Uladzislau Rezki (Sony)
>---
> kernel/rcu/tree.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>index 454809514c91..cad36074366d 100644
>--- a/kernel/rcu/tree.c
>+++ b/kernel/rcu/tree.c
>@@ -3489,10 +3489,12 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
>		(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
>		bnode = get_cached_bnode(*krcp);
>		if (!bnode && can_alloc) {
>+			migrate_disable();
>			krc_this_cpu_unlock(*krcp, *flags);
>			bnode = (struct kvfree_rcu_bulk_data *)
>				__get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC | __GFP_NOWARN);
>			*krcp = krc_this_cpu_lock(flags);
>+			migrate_enable();
>		}
>
>		if (!bnode)
>--
>2.20.1
Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline
From: Uladzislau Rezki Sent: January 22, 2021 4:26 To: Zhang, Qiang Cc: Paul E. McKenney; r...@vger.kernel.org; linux-kernel@vger.kernel.org; ure...@gmail.com Subject: Re: [PATCH] rcu: Release per-cpu krcp page cache when CPU going offline
>Hello, Qiang,
>
> On Thu, Jan 21, 2021 at 02:49:49PM +0800, qiang.zh...@windriver.com wrote:
> > From: Zqiang
> >
> > If CPUs go offline, the corresponding krcp's page cache cannot
> > be used until the CPU comes back online, or maybe the CPU
> > will never go online again; this commit therefore frees krcp's
> > page cache when CPUs go offline.
> >
> > Signed-off-by: Zqiang
>
>Do you consider it as an issue? We have 5 pages per CPU, that is 20480 bytes.
>
Hello Rezki
In a multi-CPU system, more than one CPU may be offline, so there can be more than 5 pages, and these offline CPUs may never go online again, or errors in the CPU-online process may cause the online to fail; these scenarios will cause the per-cpu krc page cache to never be released.
Thanks
Qiang
>--
>Vlad Rezki
Re: Question on workqueue: Manually break affinity on hotplug
Hello Peter, Lai
Sorry to disturb you again; I'm still confused. When a CPU goes offline, we actively call set_cpus_allowed_ptr() to reset a per-cpu kthread's cpumask; in sched_cpu_dying(), migrate_tasks() will reset the cpumask of per-cpu kthreads on the runqueue, and even for those not on the runqueue, another online CPU will be selected to run on when they wake up. What I want to ask is: why do we set it proactively?
Thanks
Qiang
From: Peter Zijlstra Sent: January 14, 2021 17:11 To: Zhang, Qiang Cc: linux-kernel@vger.kernel.org Subject: Re: Question on workqueue: Manually break affinity on hotplug [Please note this e-mail is from an EXTERNAL e-mail address]
On Thu, Jan 14, 2021 at 08:03:23AM +, Zhang, Qiang wrote:
> Hello Peter
>
> Excuse me, I have some questions for you about a description of this change:
>
> ''Don't rely on the scheduler to force break affinity for us -- it will
> stop doing that for per-cpu-kthreads."
>
> Does this mean that on CPU hotplug the scheduler does not change the affinity
> of a per-cpu kthread's task if we do not actively set the affinity?
> But if the per-cpu kthread's task is not in the running state, its affinity
> will be reset when it wakes up; this is done automatically.
>
> Or was this place modified to fit the new hotplug mechanism
> ("sched/hotplug: Consolidate task migration on CPU unplug")?
https://lkml.kernel.org/r/20201214155457.3430-1-jiangshan...@gmail.com
https://lkml.kernel.org/r/20201218170919.2950-1-jiangshan...@gmail.com
https://lkml.kernel.org/r/20201226025117.2770-1-jiangshan...@gmail.com
https://lkml.kernel.org/r/2021052638.2417-1-jiangshan...@gmail.com
https://lkml.kernel.org/r/20210112144344.850850...@infradead.org
Question on workqueue: Manually break affinity on hotplug
Hello Peter
Excuse me, I have some questions for you about a description of this change:
''Don't rely on the scheduler to force break affinity for us -- it will stop doing that for per-cpu-kthreads."
Does this mean that on CPU hotplug the scheduler does not change the affinity of a per-cpu kthread's task if we do not actively set the affinity? But if the per-cpu kthread's task is not in the running state, its affinity will be reset when it wakes up; this is done automatically.
Or was this place modified to fit the new hotplug mechanism ("sched/hotplug: Consolidate task migration on CPU unplug")?
Thanks
Qiang
Re: KASAN: use-after-free Read in usb_anchor_resume_wakeups (2)
From: Zhang, Qiang Sent: January 12, 2021 11:28 To: syzbot; a.darw...@linutronix.de; allen.l...@gmail.com; andreyk...@google.com; dvyu...@google.com; el...@google.com; gre...@linuxfoundation.org; gustavo...@kernel.org; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de Subject: Re: KASAN: use-after-free Read in usb_anchor_resume_wakeups (2)
From: syzbot Sent: January 12, 2021 0:11 To: a.darw...@linutronix.de; allen.l...@gmail.com; andreyk...@google.com; dvyu...@google.com; el...@google.com; gre...@linuxfoundation.org; gustavo...@kernel.org; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de Subject: KASAN: use-after-free Read in usb_anchor_resume_wakeups (2)
Hello,
syzbot found the following issue on:
HEAD commit: 841081d8 usb: usbip: Use DEFINE_SPINLOCK() for spinlock
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
console output: https://syzkaller.appspot.com/x/log.txt?x=12f42a3f50
kernel config: https://syzkaller.appspot.com/x/.config?x=6f9911c273a88e5
dashboard link: https://syzkaller.appspot.com/bug?extid=39c636a0650bcbb172ec
compiler: gcc (GCC) 10.1.0-syz 20200507
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+39c636a0650bcbb17...@syzkaller.appspotmail.com xpad 6-1:0.65: xpad_irq_in - usb_submit_urb failed with result -19 xpad 6-1:0.65: xpad_irq_out - usb_submit_urb failed with result -19 == BUG: KASAN: use-after-free in register_lock_class+0xecc/0x1100 kernel/locking/lockdep.c:1291 Read of size 2 at addr 888137488092 by task systemd-udevd/7474 CPU: 1 PID: 7474 Comm: systemd-udevd Not tainted 5.11.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230 __kasan_report mm/kasan/report.c:396 [inline] kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413 register_lock_class+0xecc/0x1100 kernel/locking/lockdep.c:1291 __lock_acquire+0x101/0x54f0 kernel/locking/lockdep.c:4711 lock_acquire kernel/locking/lockdep.c:5437 [inline] lock_acquire+0x288/0x700 kernel/locking/lockdep.c:5402 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x36/0x50 kernel/locking/spinlock.c:159 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137 usb_anchor_resume_wakeups drivers/usb/core/urb.c:937 [inline] usb_anchor_resume_wakeups+0xbe/0xe0 drivers/usb/core/urb.c:930 __usb_hcd_giveback_urb+0x2df/0x5c0 drivers/usb/core/hcd.c:1661 usb_hcd_giveback_urb+0x367/0x410 drivers/usb/core/hcd.c:1728 dummy_timer+0x11f4/0x32a0 drivers/usb/gadget/udc/dummy_hcd.c:1971 call_timer_fn+0x1a5/0x630 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x67c/0xa10 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0x80/0x120 kernel/time/timer.c:1744 __do_softirq+0x1b7/0x977 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 
[inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x80/0xa0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu kernel/softirq.c:420 [inline] irq_exit_rcu+0x110/0x1a0 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x43/0xa0 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:__sanitizer_cov_trace_pc+0x37/0x60 kernel/kcov.c:197 Code: 81 e1 00 01 00 00 65 48 8b 14 25 40 ef 01 00 a9 00 01 ff 00 74 0e 85 c9 74 35 8b 82 dc 13 00 00 85 c0 74 2b 8b 82 b8 13 00 00 <83> f8 02 75 20 48 8b 8a c0 13 00 00 8b 92 bc 13 00 00 48 8b 01 48 RSP: 0018:c90005f875b0 EFLAGS: 0246 RAX: RBX: 0003 RCX: RDX: 888116d85040 RSI: 81dabe81 RDI: 0003 RBP: 888102c2bf00 R08: R09: 0003 R10: 81dabeba R11: 0010 R12: 0002 R13: 01cc R14: dc00 R15: tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093 tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089 tomoyo_audit_path_log security/tomoyo/file.c:168 [inline] tomoyo_path_permission security/tomoyo/file.c:587 [inline] tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573 tomoyo_check_open_permission+0x33e/0x380 security/tomoyo/file.c:777 tomoyo_file_open security/tomoyo/tomoyo.c:313 [
Re: KASAN: use-after-free Read in usb_anchor_resume_wakeups (2)
From: syzbot Sent: January 12, 2021 0:11 To: a.darw...@linutronix.de; allen.l...@gmail.com; andreyk...@google.com; dvyu...@google.com; el...@google.com; gre...@linuxfoundation.org; gustavo...@kernel.org; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de Subject: KASAN: use-after-free Read in usb_anchor_resume_wakeups (2) Hello, syzbot found the following issue on: HEAD commit: 841081d8 usb: usbip: Use DEFINE_SPINLOCK() for spinlock git tree: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing console output: https://syzkaller.appspot.com/x/log.txt?x=12f42a3f50 kernel config: https://syzkaller.appspot.com/x/.config?x=6f9911c273a88e5 dashboard link: https://syzkaller.appspot.com/bug?extid=39c636a0650bcbb172ec compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+39c636a0650bcbb17...@syzkaller.appspotmail.com xpad 6-1:0.65: xpad_irq_in - usb_submit_urb failed with result -19 xpad 6-1:0.65: xpad_irq_out - usb_submit_urb failed with result -19 == BUG: KASAN: use-after-free in register_lock_class+0xecc/0x1100 kernel/locking/lockdep.c:1291 Read of size 2 at addr 888137488092 by task systemd-udevd/7474 CPU: 1 PID: 7474 Comm: systemd-udevd Not tainted 5.11.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230 __kasan_report mm/kasan/report.c:396 [inline] kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413 register_lock_class+0xecc/0x1100 kernel/locking/lockdep.c:1291 __lock_acquire+0x101/0x54f0 kernel/locking/lockdep.c:4711 lock_acquire kernel/locking/lockdep.c:5437 [inline] lock_acquire+0x288/0x700 kernel/locking/lockdep.c:5402
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x36/0x50 kernel/locking/spinlock.c:159 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137 usb_anchor_resume_wakeups drivers/usb/core/urb.c:937 [inline] usb_anchor_resume_wakeups+0xbe/0xe0 drivers/usb/core/urb.c:930 __usb_hcd_giveback_urb+0x2df/0x5c0 drivers/usb/core/hcd.c:1661 usb_hcd_giveback_urb+0x367/0x410 drivers/usb/core/hcd.c:1728 dummy_timer+0x11f4/0x32a0 drivers/usb/gadget/udc/dummy_hcd.c:1971 call_timer_fn+0x1a5/0x630 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x67c/0xa10 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0x80/0x120 kernel/time/timer.c:1744 __do_softirq+0x1b7/0x977 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x80/0xa0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu kernel/softirq.c:420 [inline] irq_exit_rcu+0x110/0x1a0 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x43/0xa0 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:__sanitizer_cov_trace_pc+0x37/0x60 kernel/kcov.c:197 Code: 81 e1 00 01 00 00 65 48 8b 14 25 40 ef 01 00 a9 00 01 ff 00 74 0e 85 c9 74 35 8b 82 dc 13 00 00 85 c0 74 2b 8b 82 b8 13 00 00 <83> f8 02 75 20 48 8b 8a c0 13 00 00 8b 92 bc 13 00 00 48 8b 01 48 RSP: 0018:c90005f875b0 EFLAGS: 0246 RAX: RBX: 0003 RCX: RDX: 888116d85040 RSI: 81dabe81 RDI: 0003 RBP: 888102c2bf00 R08: R09: 0003 R10: 81dabeba R11: 0010 R12: 0002 R13: 01cc R14: dc00 R15: tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093 tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089 tomoyo_audit_path_log security/tomoyo/file.c:168 [inline] tomoyo_path_permission security/tomoyo/file.c:587 
[inline] tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573 tomoyo_check_open_permission+0x33e/0x380 security/tomoyo/file.c:777 tomoyo_file_open security/tomoyo/tomoyo.c:313 [inline] tomoyo_file_open+0xa3/0xd0 security/tomoyo/tomoyo.c:308 security_file_open+0x52/0x4f0 security/security.c:1576 do_dentry_open+0x353/0x1090 fs/open.c:804 do_open fs/namei.c:3254 [inline] path_openat+0x1b9a/0x2730 fs/namei.c:3371 do_filp_open+0x17e/0x3c0 fs/namei.c:3398 do_sys_openat2+0x16d/0x420 fs/open.c:1172 do_sys_open fs/open.c:1188 [inline] __do_sys_open fs/open.c:1196 [inline] __se_sys_open fs/open.c:1192 [inline]
Re: KASAN: use-after-free Read in service_outstanding_interrupt
From: Oliver Neukum Sent: January 5, 2021 0:28 To: syzbot; andreyk...@google.com; gre...@linuxfoundation.org; gustavo...@kernel.org; ingras...@epigenesys.com; lee.jo...@linaro.org; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; penguin-ker...@i-love.sakura.ne.jp; syzkaller-b...@googlegroups.com Subject: Re: KASAN: use-after-free Read in service_outstanding_interrupt On Thursday, 17.12.2020, at 19:21 -0800, syzbot wrote: > syzbot has found a reproducer for the following issue on: > > HEAD commit: 5e60366d Merge tag 'fallthrough-fixes-clang-5.11-rc1' of g.. > git tree: > https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing > console output: https://syzkaller.appspot.com/x/log.txt?x=12c5b62350 > kernel config: https://syzkaller.appspot.com/x/.config?x=5cea7506b7139727 > dashboard link: https://syzkaller.appspot.com/bug?extid=9e04e2df4a32fb661daf > compiler: gcc (GCC) 10.1.0-syz 20200507 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=175adf0750 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1672680f50 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: >syzbot+9e04e2df4a32fb661...@syzkaller.appspotmail.com > >#syz test: https://github.com/google/kasan.git 5e60366d > Hello Oliver, this use-after-free still exists. The calltrace shows that the usb_device object has already been released at disconnect time; adding a reference count on the usb_device object would avoid this problem: diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index 508b1c3f8b73..001cb93da6bf 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -106,6 +106,7 @@ struct wdm_device { struct list_head device_list; int (*manage_power)(struct usb_interface *, int); + struct usb_device *usb_dev; }; static struct usb_driver wdm_driver; @@ -338,6 +339,7 @@ static void free_urbs(struct wdm_device *desc) static void cleanup(struct wdm_device *desc) { + usb_put_dev(desc->usb_dev);
kfree(desc->sbuf); kfree(desc->inbuf); kfree(desc->orq); @@ -855,6 +857,7 @@ static int wdm_create(struct usb_interface *intf, struct usb_endpoint_descriptor desc->intf = intf; INIT_WORK(&desc->rxwork, wdm_rxwork); INIT_WORK(&desc->service_outs_intr, service_interrupt_work); + desc->usb_dev = usb_get_dev(interface_to_usbdev(intf)); rv = -EINVAL; if (!usb_endpoint_is_int_in(ep)) >From f51e3c5a202f3abc805edd64b21a68d29dd9d60e Mon Sep 17 >00:00:00 2001 >From: Oliver Neukum >Date: Mon, 4 Jan 2021 17:26:33 +0100 >Subject: [PATCH] cdc-wdm: poison URBs upon disconnect > >We have a chicken and egg issue between interrupt and work. >This should break the cycle. > >Signed-off-by: Oliver Neukum >--- >drivers/usb/class/cdc-wdm.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > >diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c >index 02d0cfd23bb2..14eddda35280 100644 >--- a/drivers/usb/class/cdc-wdm.c >+++ b/drivers/usb/class/cdc-wdm.c >@@ -324,9 +324,9 @@ static void wdm_int_callback(struct urb *urb) >static void kill_urbs(struct wdm_device *desc) > { > /* the order here is essential */ >- usb_kill_urb(desc->command); >- usb_kill_urb(desc->validity); >- usb_kill_urb(desc->response); >+ usb_poison_urb(desc->command); >+ usb_poison_urb(desc->validity); >+ usb_poison_urb(desc->response); > } > > static void free_urbs(struct wdm_device *desc) >-- >2.26.2
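The fix proposed above follows the usual get/put reference-counting pattern: wdm_create() takes an extra reference on the usb_device and cleanup() drops it, so the device object cannot be freed while the wdm code still uses it. A minimal userspace sketch of that pattern (the struct and function names here are illustrative, not the real USB core API):

```c
#include <assert.h>

/* Toy stand-in for a refcounted device object; "freed" replaces the
 * real kfree() so the lifetime can be observed. */
struct fake_dev {
	int refcount;
	int freed;
};

static struct fake_dev *fake_get_dev(struct fake_dev *d)
{
	d->refcount++;
	return d;
}

static void fake_put_dev(struct fake_dev *d)
{
	if (--d->refcount == 0)
		d->freed = 1;	/* stands in for the final kfree() */
}

/* Core holds one reference; the wdm code takes a second at create
 * time, so the core's put at disconnect does not free the object. */
static int refcount_demo(void)
{
	struct fake_dev dev = { .refcount = 1, .freed = 0 };

	fake_get_dev(&dev);	/* wdm_create(): usb_get_dev(...) analogue */
	fake_put_dev(&dev);	/* core drops its reference at disconnect */
	if (dev.freed)		/* must still be alive here */
		return -1;
	fake_put_dev(&dev);	/* cleanup(): usb_put_dev() analogue */
	return dev.freed;	/* now really gone */
}
```

The point is that the last user, not the first, releases the object, which is exactly what moves the free past the use in the reported use-after-free.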
Re: [PATCH] ipc/sem.c: Convert kfree_rcu() to call_rcu() in freeary function
From: Paul E. McKenney Sent: December 31, 2020 0:19 To: Zhang, Qiang Cc: a...@linux-foundation.org; manf...@colorfullife.com; gustavo...@kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ipc/sem.c: Convert kfree_rcu() to call_rcu() in freeary function On Wed, Dec 30, 2020 at 08:00:38PM +0800, qiang.zh...@windriver.com wrote: > From: Zqiang > > Because freeary() is called with a spinlock held, and kfree_rcu() > may call synchronize_rcu(), a schedule could happen inside the > spinlock critical region; replace kfree_rcu() with call_rcu(). > >Except that the call to kfree_rcu() below has two arguments, and >thus >provides a link for queuing the callback. It will never directly invoke >synchronize_rcu(). It is only the single-argument variant of >kfree_rcu() >that might invoke synchronize_rcu(). Sorry. It was my mistake, please ignore this patch. Thanks Qiang >Or are you seeing lockdep or might-sleep failures with the current >code? >If so, please post the relevant portions of the console output. > > Thanx, Paul > > Fixes: 693a8b6eecce ("ipc,rcu: Convert call_rcu(free_un) to kfree_rcu()") > Signed-off-by: Zqiang > --- > ipc/sem.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/ipc/sem.c b/ipc/sem.c > index f6c30a85dadf..12c3184347d9 100644 > --- a/ipc/sem.c > +++ b/ipc/sem.c > @@ -1132,6 +1132,13 @@ static int count_semcnt(struct sem_array *sma, ushort > semnum, > return semcnt; > } > > +static void free_un(struct rcu_head *head) > +{ > + struct sem_undo *un = container_of(head, struct sem_undo, rcu); > + > + kfree(un); > +} > + > /* Free a semaphore set. freeary() is called with sem_ids.rwsem locked > * as a writer and the spinlock for this semaphore set hold. sem_ids.rwsem > * remains locked on exit.
> @@ -1152,7 +1159,7 @@ static void freeary(struct ipc_namespace *ns, struct > kern_ipc_perm *ipcp) > un->semid = -1; > list_del_rcu(&un->list_proc); > spin_unlock(&un->ulp->lock); > - kfree_rcu(un, rcu); > + call_rcu(&un->rcu, free_un); > } > > /* Wake up all pending processes and let them fail with EIDRM. */ > -- > 2.17.1 >
Re: INFO: task hung in ath6kl_usb_destroy (3)
From: syzbot Sent: November 30, 2020 23:31 To: andreyk...@google.com; da...@davemloft.net; k...@kernel.org; kv...@codeaurora.org; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; linux-wirel...@vger.kernel.org; net...@vger.kernel.org; syzkaller-b...@googlegroups.com Subject: INFO: task hung in ath6kl_usb_destroy (3) [Please note this e-mail is from an EXTERNAL e-mail address] Hello, syzbot found the following issue on: HEAD commit: ebad4326 Merge 5.10-rc6 into usb-next git tree: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing console output: https://syzkaller.appspot.com/x/log.txt?x=1566291d50 kernel config: https://syzkaller.appspot.com/x/.config?x=fe8988e4dc252d01 dashboard link: https://syzkaller.appspot.com/bug?extid=bccb3d118a39c43b6c9d compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+bccb3d118a39c43b6...@syzkaller.appspotmail.com INFO: task kworker/1:4:7246 blocked for more than 143 seconds. Not tainted 5.10.0-rc6-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/1:4 state:D stack:22864 pid: 7246 ppid: 2 flags:0x4000 Workqueue: usb_hub_wq hub_event Call Trace: context_switch kernel/sched/core.c:3779 [inline] __schedule+0x8a2/0x1f30 kernel/sched/core.c:4528 schedule+0xcb/0x270 kernel/sched/core.c:4606 schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 flush_workqueue+0x3ff/0x13e0 kernel/workqueue.c:2835 flush_scheduled_work include/linux/workqueue.h:597 [inline] ath6kl_usb_flush_all drivers/net/wireless/ath/ath6kl/usb.c:476 [inline] [note: ath6kl_usb_flush_all() calls flush_scheduled_work(), which flushes all work in system_wq — should flush_work() be called on the driver's own work items instead?] ath6kl_usb_destroy+0xc6/0x290 drivers/net/wireless/ath/ath6kl/usb.c:609 ath6kl_usb_probe+0xc7b/0x11f0 drivers/net/wireless/ath/ath6kl/usb.c:1166 usb_probe_interface+0x315/0x7f0 drivers/usb/core/driver.c:396 really_probe+0x291/0xde0 drivers/base/dd.c:554 driver_probe_device+0x26b/0x3d0 drivers/base/dd.c:738 __device_attach_driver+0x1d1/0x290 drivers/base/dd.c:844 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:431 __device_attach+0x228/0x4a0 drivers/base/dd.c:912 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:491 device_add+0xbb2/0x1ce0 drivers/base/core.c:2936 usb_set_configuration+0x113c/0x1910 drivers/usb/core/message.c:2168 usb_generic_driver_probe+0xba/0x100 drivers/usb/core/generic.c:238 usb_probe_device+0xd9/0x2c0 drivers/usb/core/driver.c:293 really_probe+0x291/0xde0 drivers/base/dd.c:554 driver_probe_device+0x26b/0x3d0 drivers/base/dd.c:738 __device_attach_driver+0x1d1/0x290 drivers/base/dd.c:844 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:431 __device_attach+0x228/0x4a0 drivers/base/dd.c:912 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:491 device_add+0xbb2/0x1ce0
drivers/base/core.c:2936 usb_new_device.cold+0x71d/0xfe9 drivers/usb/core/hub.c:2555 hub_port_connect drivers/usb/core/hub.c:5223 [inline] hub_port_connect_change drivers/usb/core/hub.c:5363 [inline] port_event drivers/usb/core/hub.c:5509 [inline] hub_event+0x2348/0x42d0 drivers/usb/core/hub.c:5591 process_one_work+0x933/0x1520 kernel/workqueue.c:2272 process_scheduled_works kernel/workqueue.c:2334 [inline] worker_thread+0x82b/0x1120 kernel/workqueue.c:2420 kthread+0x38c/0x460 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Showing all locks held in the system: 5 locks held by kworker/0:0/5: #0: 888103c7ed38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: 888103c7ed38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline] #0: 888103c7ed38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline] #0: 888103c7ed38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline] #0: 888103c7ed38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline] #0: 888103c7ed38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x821/0x1520 kernel/workqueue.c:2243 #1: c905fda8 ((work_completion)(>events)){+.+.}-{0:0}, at: process_one_work+0x854/0x1520 kernel/workqueue.c:2247 #2: 888108dd6218 (>mutex){}-{3:3}, at: device_lock include/linux/device.h:731 [inline] #2: 888108dd6218 (>mutex){}-{3:3}, at:
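The question embedded in the trace above — flush_scheduled_work() vs flush_work() — can be illustrated with a toy single-threaded model (illustrative names, not the kernel workqueue API): flushing the shared queue has to wait for every pending item, whether the driver owns it or not, while flushing a single work item waits only for that item. If any unrelated item blocks, flushing the whole queue hangs, which is the shape of this report:

```c
#include <assert.h>

#define MAX_WORK 8

struct work {
	void (*fn)(void);
	int pending;
};

static struct work queue[MAX_WORK];	/* stands in for system_wq */
static int nwork;

static void queue_work_model(void (*fn)(void))
{
	queue[nwork].fn = fn;
	queue[nwork].pending = 1;
	nwork++;
}

/* flush_work() analogue: run (wait for) exactly one item. */
static int flush_one(struct work *w)
{
	if (w->pending) {
		w->fn();
		w->pending = 0;
	}
	return 1;
}

/* flush_scheduled_work() analogue: must wait for *everything* queued,
 * including work belonging to other users of the shared queue. */
static int flush_all(void)
{
	int waited = 0;

	for (int i = 0; i < nwork; i++)
		waited += flush_one(&queue[i]);
	return waited;
}

static int ours_ran, others_ran;
static void our_work(void)   { ours_ran++; }
static void other_work(void) { others_ran++; }
```

With one unrelated item and one of our own queued, flush_one() on our item touches only our work, while flush_all() also has to wait out the unrelated item — which is why a driver should flush the work items it owns rather than the whole shared queue.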
Re: [PATCH] srcu: Remove srcu_cblist_invoking member from sdp
From: Paul E. McKenney Sent: November 20, 2020 2:12 To: Zhang, Qiang Cc: jiangshan...@gmail.com; rost...@goodmis.org; j...@joshtriplett.org; r...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] srcu: Remove srcu_cblist_invoking member from sdp [Please note this e-mail is from an EXTERNAL e-mail address] On Thu, Nov 19, 2020 at 01:34:11PM +0800, qiang.zh...@windriver.com wrote: > From: Zqiang > > The workqueue ensures that multiple instances of the same sdp->work > execute sequentially in rcu_gp_wq, so srcu_cblist_invoking is not > needed to prevent concurrent execution; remove it. > > Signed-off-by: Zqiang >Good job analyzing the code, which is very good to see!!! > >But these do have a potential purpose. Right now, it is OK to invoke >synchronize_srcu() during early boot, that is, before the scheduler >has started. But there is a gap from the time that the scheduler has >initialized (so that preemption and blocking are possible) and the time >that workqueues are initialized and fully functional. Only after that >is it once again OK to use synchronize_srcu(). > >If synchronize_srcu() is ever required to work correctly during that >time period, it will need to directly invoke the functions that are >currently run in workqueue context. Which means that there will then be >the possibility of two instances of these functions running just after >workqueues are available. > > Thanx, Paul Thanks Paul. > --- > include/linux/srcutree.h | 1 - > kernel/rcu/srcutree.c | 8 ++------ > 2 files changed, 2 insertions(+), 7 deletions(-) > > diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h > index 9cfcc8a756ae..62d8312b5451 100644 > --- a/include/linux/srcutree.h > +++ b/include/linux/srcutree.h > @@ -31,7 +31,6 @@ struct srcu_data { > struct rcu_segcblist srcu_cblist; /* List of callbacks.*/ > unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */ > unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ > - bool srcu_cblist_invoking; /* Invoking these CBs?
*/ > struct timer_list delay_work; /* Delay for CB invoking */ > struct work_struct work; /* Context for CB invoking. */ > struct rcu_head srcu_barrier_head; /* For srcu_barrier() use. */ > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > index 3c5e2806e0b9..c4d5cd2567a6 100644 > --- a/kernel/rcu/srcutree.c > +++ b/kernel/rcu/srcutree.c > @@ -134,7 +134,6 @@ static void init_srcu_struct_nodes(struct srcu_struct > *ssp, bool is_static) > sdp = per_cpu_ptr(ssp->sda, cpu); > spin_lock_init(&ACCESS_PRIVATE(sdp, lock)); > rcu_segcblist_init(&sdp->srcu_cblist); > - sdp->srcu_cblist_invoking = false; > sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq; > sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq; > sdp->mynode = &snp_first[cpu / levelspread[level]]; > @@ -1254,14 +1253,11 @@ static void srcu_invoke_callbacks(struct work_struct > *work) > spin_lock_irq_rcu_node(sdp); > rcu_segcblist_advance(&sdp->srcu_cblist, > rcu_seq_current(&ssp->srcu_gp_seq)); > - if (sdp->srcu_cblist_invoking || > - !rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) { > + if (!rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) { > spin_unlock_irq_rcu_node(sdp); > return; /* Someone else on the job or nothing to do. */ > } > > - /* We are on the job! Extract and invoke ready callbacks. */ > - sdp->srcu_cblist_invoking = true; > rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs); > len = ready_cbs.len; > spin_unlock_irq_rcu_node(sdp); > @@ -1282,7 +1278,7 @@ static void srcu_invoke_callbacks(struct work_struct > *work) > rcu_segcblist_add_len(&sdp->srcu_cblist, -len); > (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, > rcu_seq_snap(&ssp->srcu_gp_seq)); > - sdp->srcu_cblist_invoking = false; > + > more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist); > spin_unlock_irq_rcu_node(sdp); > if (more) > -- > 2.17.1 >
Re: [PATCH] kthread_worker: Add flush delayed work func
From: Andrew Morton Sent: November 13, 2020 8:01 To: Zhang, Qiang Cc: pmla...@suse.com; t...@kernel.org; linux...@kvack.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] kthread_worker: Add flush delayed work func [Please note this e-mail is from an EXTERNAL e-mail address] On Wed, 11 Nov 2020 17:13:55 +0800 qiang.zh...@windriver.com wrote: > Add 'kthread_flush_delayed_work' func, the principle of > this func is wait for a dwork to finish executing the > last queueing. > >We'd like to see some code which actually uses this new function >please. Either in this patch or as one or more followup patches. > >btw, we call it "function", not "func". But neither is really needed - >just use () to identify a function. ie: >: Add kthread_flush_delayed_work(). The principle of this is to wait for >: a dwork to finish executing the last queueing. I don't see it being used in the kernel code so far, and I'm not sure whether it will be used in later scenarios (it is like flush_delayed_work() in the workqueue code), or whether some code that currently uses kthread_work needs it. Thanks Qiang
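The semantics being proposed mirror workqueue's flush_delayed_work(): if the delay timer is still pending, cancel it and queue the work immediately, then flush the work so that the last queueing has finished by the time the call returns. A userspace sketch of that logic (illustrative names, not a real kernel implementation):

```c
#include <assert.h>

struct dwork_model {
	int timer_pending;	/* delay timer armed, work not yet queued */
	int work_queued;	/* work sitting on the worker's list */
	int runs;		/* how many times the work has executed */
};

/* Worker-loop analogue: execute the work if it is queued. */
static void run_queued(struct dwork_model *dw)
{
	if (dw->work_queued) {
		dw->work_queued = 0;
		dw->runs++;
	}
}

/* flush_delayed_work() analogue: promote a pending timer to an
 * immediate queueing, then wait until the work has run. */
static int flush_delayed_work_model(struct dwork_model *dw)
{
	if (dw->timer_pending) {	/* del_timer_sync() succeeded */
		dw->timer_pending = 0;
		dw->work_queued = 1;	/* queue it right now */
	}
	run_queued(dw);			/* flush_work(): wait for it */
	return dw->runs;
}
```

After the flush returns, neither the timer nor the work is pending, which is the "wait for a dwork to finish executing the last queueing" guarantee the patch describes.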
Re: memory leak in __usbhid_submit_report
From: syzbot Sent: November 11, 2020 21:55 To: benjamin.tissoi...@redhat.com; ji...@kernel.org; linux-in...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; syzkaller-b...@googlegroups.com Subject: memory leak in __usbhid_submit_report [Please note this e-mail is from an EXTERNAL e-mail address] Hello, syzbot found the following issue on: HEAD commit: f8394f23 Linux 5.10-rc3 git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=12ebbdc650 kernel config: https://syzkaller.appspot.com/x/.config?x=a3f13716fa0212fd dashboard link: https://syzkaller.appspot.com/bug?extid=47b26cd837ececfc666d compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14497b8250 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1586ff1450 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+47b26cd837ececfc6...@syzkaller.appspotmail.com BUG: memory leak unreferenced object 0x8881097e5ec0 (size 32): comm "kworker/0:1", pid 7, jiffies 4294949214 (age 33.520s) hex dump (first 32 bytes): 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [<8296eaa1>] __usbhid_submit_report+0x116/0x490 drivers/hid/usbhid/hid-core.c:588 [] usbhid_submit_report drivers/hid/usbhid/hid-core.c:638 [inline] [ ] usbhid_request+0x59/0xa0 drivers/hid/usbhid/hid-core.c:1272 [<428a854b>] hidinput_led_worker+0x59/0x160 drivers/hid/hid-input.c:1507 [<1bb8d86d>] process_one_work+0x27d/0x590 kernel/workqueue.c:2272 [<5d9a2f9c>] worker_thread+0x59/0x5d0 kernel/workqueue.c:2418 [ ] kthread+0x178/0x1b0 kernel/kthread.c:292 [<99d5a9ee>] ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 When the USB device disconnects, the "raw_report" buffers should be freed in usbhid_stop().
Can we release them in this function, as shown below? (kfree(NULL) is a no-op, so no NULL check is needed.) usbhid_stop(struct hid_device *hid) { for (index = 0; index < HID_CONTROL_FIFO_SIZE; index++) { kfree(usbhid->ctrl[index].raw_report); kfree(usbhid->out[index].raw_report); } ... } BUG: memory leak unreferenced object 0x8881120200c0 (size 32): comm "kworker/0:1", pid 7, jiffies 4294949214 (age 33.520s) hex dump (first 32 bytes): 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [<8296eaa1>] __usbhid_submit_report+0x116/0x490 drivers/hid/usbhid/hid-core.c:588 [ ] usbhid_submit_report drivers/hid/usbhid/hid-core.c:638 [inline] [ ] usbhid_request+0x59/0xa0 drivers/hid/usbhid/hid-core.c:1272 [<428a854b>] hidinput_led_worker+0x59/0x160 drivers/hid/hid-input.c:1507 [<1bb8d86d>] process_one_work+0x27d/0x590 kernel/workqueue.c:2272 [<5d9a2f9c>] worker_thread+0x59/0x5d0 kernel/workqueue.c:2418 [ ] kthread+0x178/0x1b0 kernel/kthread.c:292 [<99d5a9ee>] ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 BUG: memory leak unreferenced object 0x888107fa9420 (size 32): comm "kworker/0:1", pid 7, jiffies 4294949214 (age 33.520s) hex dump (first 32 bytes): 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [<8296eaa1>] __usbhid_submit_report+0x116/0x490 drivers/hid/usbhid/hid-core.c:588 [ ] usbhid_submit_report drivers/hid/usbhid/hid-core.c:638 [inline] [ ] usbhid_request+0x59/0xa0 drivers/hid/usbhid/hid-core.c:1272 [<428a854b>] hidinput_led_worker+0x59/0x160 drivers/hid/hid-input.c:1507 [<1bb8d86d>] process_one_work+0x27d/0x590 kernel/workqueue.c:2272 [<5d9a2f9c>] worker_thread+0x59/0x5d0 kernel/workqueue.c:2418 [ ] kthread+0x178/0x1b0 kernel/kthread.c:292 [<99d5a9ee>] ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 BUG: memory leak unreferenced object 0x888112020b60 (size 32): comm "kworker/1:4", pid 8569, jiffies 4294949237 (age 33.290s)
hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [<8296eaa1>] __usbhid_submit_report+0x116/0x490 drivers/hid/usbhid/hid-core.c:588 [ ] usbhid_submit_report drivers/hid/usbhid/hid-core.c:638
Re: Re: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online
From: Thomas Gleixner Sent: October 28, 2020 17:23 To: Zhang, Qiang; pmla...@suse.com; t...@kernel.org Cc: a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org Subject: Re: Re: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online > [Please note this e-mail is from an EXTERNAL e-mail address] > > On Wed, Oct 28 2020 at 15:30, qiang zhang wrote: > >>How is that addressing any of the comments I made on V1 of this? > > Do you mean the following problem: > > "The dynamic hotplug states run late. What's preventing work to be queued > on such a worker before it is bound to the CPU again?" > >This is one problem, but there are more and I explained them in great >length. If there is anything unclear, then please ask. Right, this patch does not consider that work may be queued to the worker after its bound CPU has gone offline; in addition, when the bound CPU comes online again, work may be queued before the worker's CPU affinity is restored. In the powerclamp driver this is not a problem, because it is solved by destroying and re-creating the tasks on CPU hotplug; but when a CPU goes down, that approach needs to call cancel_work_sync() in the offline callback, which may block for a long time, and these operations are expensive. This patch only restores the affinity of worker tasks created by kthread_create_worker_on_cpu() when their CPU comes online again, like the per-CPU worker handling on CPU hotplug in workqueue and io-wq. Thanks Qiang > >Thanks, > >tglx
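The state machine under discussion can be sketched in a toy userspace model (illustrative names, not kernel code): a worker bound to a CPU loses its binding when that CPU goes offline, and a CPU-hotplug online callback later re-binds it. Work queued inside that window may still run on the wrong CPU, which is exactly the race Thomas is pointing at:

```c
#include <assert.h>

#define ANY_CPU (-1)	/* affinity widened: may run anywhere */

struct worker_model {
	int bind_cpu;	/* CPU the worker was created for */
	int affinity;	/* where it is allowed to run now */
};

/* When the bound CPU goes offline, the kernel lets the task run
 * anywhere rather than leaving it unrunnable. */
static void cpu_offline(struct worker_model *w, int cpu)
{
	if (w->affinity == cpu)
		w->affinity = ANY_CPU;
}

/* cpuhp online-callback analogue: restore the original binding. */
static void cpu_online_cb(struct worker_model *w, int cpu)
{
	if (w->bind_cpu == cpu)
		w->affinity = cpu;
}

/* Returns the CPU a work item queued right now would execute on. */
static int run_work(struct worker_model *w)
{
	return w->affinity;
}
```

Between cpu_offline() and cpu_online_cb(), run_work() reports ANY_CPU — the patch closes the steady state but not that window, which is why queuing has to be fenced (or the worker torn down) rather than merely re-binding it after the fact.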
Re: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online
From: Thomas Gleixner Sent: October 28, 2020 16:30 To: Zhang, Qiang; pmla...@suse.com; t...@kernel.org Cc: a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online [Please note this e-mail is from an EXTERNAL e-mail address] On Wed, Oct 28 2020 at 15:30, qiang zhang wrote: > From: Zqiang > > When a CPU is offlined, the 'kthread_worker' bound to that CPU > will run anywhere; when the CPU comes online again, recover the > 'kthread_worker' affinity via cpuhp notifiers. > > Signed-off-by: Zqiang > --- > v1->v2: > rename variable kworker_online to kthread_worker_online. > add 'cpuhp_node' and 'bind_cpu' init in KTHREAD_WORKER_INIT. > add a comment explaining for WARN_ON_ONCE. >How is that addressing any of the comments I made on V1 of this? Do you mean the following problem: "The dynamic hotplug states run late. What's preventing work to be queued on such a worker before it is bound to the CPU again?" Thanks Qiang > >Thanks, > > tglx
Re: [PATCH] io-wq: set task TASK_INTERRUPTIBLE state before schedule_timeout
From: Jens Axboe Sent: October 27, 2020 21:35 To: Zhang, Qiang Cc: io-ur...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] io-wq: set task TASK_INTERRUPTIBLE state before schedule_timeout On 10/26/20 9:09 PM, qiang.zh...@windriver.com wrote: > From: Zqiang > > In the 'io_wqe_worker' thread, once the work on 'wqe->work_list' is > finished and 'wqe->work_list' is empty, '__io_worker_idle' returns > false with the task state still TASK_RUNNING; the state needs to be > set to TASK_INTERRUPTIBLE before calling schedule_timeout(). > >I don't think that's safe - what if someone added work right before you >call schedule_timeout_interruptible? Something ala: > > >io_wq_enqueue() >set_current_state(TASK_INTERRUPTIBLE); >schedule_timeout(WORKER_IDLE_TIMEOUT); > >then we'll have work added and the task state set to running, but the >worker itself just sets us to non-running and will hence wait >WORKER_IDLE_TIMEOUT before the work is processed. > >The current situation will do one extra loop for this case, as the >schedule_timeout() just ends up being a nop and we go around again Although the worker task state is running, the call to schedule_timeout() means the current worker can still be switched out. If the current worker task is set to non-running and then switched out, schedule() will call io_wq_worker_sleeping() to wake up a free worker task if wqe->free_list is not empty. >checking for work. Since we already unused the mm, the next iteration >will go to sleep properly unless new work came in. > >-- >Jens Axboe
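The race Jens describes is the classic lost-wakeup pattern, and it can be sketched as a single-threaded toy model (illustrative names, not kernel or io-wq code): wake_up() from the enqueue side puts the task back to TASK_RUNNING, and schedule() only actually sleeps if the state is still TASK_INTERRUPTIBLE, so the sleeper must re-check its condition after setting the state:

```c
#include <assert.h>

enum { TASK_RUNNING, TASK_INTERRUPTIBLE };

static int task_state;
static int have_work;

/* Producer side: add work, then wake_up() the worker. */
static void io_wq_enqueue_model(void)
{
	have_work = 1;
	task_state = TASK_RUNNING;	/* wake_up(): cancels a pending sleep */
}

/* schedule() sleeps only if the state is still INTERRUPTIBLE. */
static int schedule_would_sleep(void)
{
	return task_state == TASK_INTERRUPTIBLE;
}

/* Lost wakeup: the enqueue lands first, then we blindly mark
 * ourselves sleeping without re-checking the work list. */
static int lost_wakeup_step(void)
{
	io_wq_enqueue_model();		/* work + wakeup arrive here */
	task_state = TASK_INTERRUPTIBLE;
	return schedule_would_sleep();	/* sleeps despite pending work */
}

/* Canonical pattern: set the state, then re-check the condition and
 * bail out of the sleep if work is already there. */
static int safe_step(void)
{
	io_wq_enqueue_model();		/* same racy arrival */
	task_state = TASK_INTERRUPTIBLE;
	if (have_work) {		/* re-check after setting state */
		task_state = TASK_RUNNING;
		return 0;		/* process the work, don't sleep */
	}
	return schedule_would_sleep();
}
```

The buggy ordering sleeps for the full timeout with work pending; the safe ordering never does, which is why a bare set_current_state() before schedule_timeout() without the condition re-check is not enough.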
Re: Question on io-wq
From: Zhang, Qiang Sent: October 23, 2020 11:55 To: Jens Axboe Cc: v...@zeniv.linux.org.uk; io-ur...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org Subject: Re: Question on io-wq From: Jens Axboe Sent: October 22, 2020 22:08 To: Zhang, Qiang Cc: v...@zeniv.linux.org.uk; io-ur...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org Subject: Re: Question on io-wq On 10/22/20 3:02 AM, Zhang,Qiang wrote: > > Hi Jens Axboe > > There is a problem in the 'io_wqe_worker' thread: the 'io_wqe_worker' > is created with its affinity set to the CPUs of a NUMA node, but due > to CPU hotplug, when the last CPU goes down the 'io_wqe_worker' thread > will run anywhere. When a CPU in the node goes online again, should we > restore their cpu bindings? >Something like the below should help in ensuring affinities are >always correct - trigger an affinity set for an online CPU event. We >should not need to do it for offlining. Can you test it? >diff --git a/fs/io-wq.c b/fs/io-wq.c >index 4012ff541b7b..3bf029d1170e 100644 >--- a/fs/io-wq.c >+++ b/fs/io-wq.c >@@ -19,6 +19,7 @@ >#include >#include >#include >+#include >#include "io-wq.h" > >@@ -123,9 +124,13 @@ struct io_wq { > refcount_t refs; > struct completion done; > >+ struct hlist_node cpuhp_node; >+ > refcount_t use_refs; >}; > >+static enum cpuhp_state io_wq_online; >+ >static bool io_worker_get(struct io_worker *worker) >{ > return refcount_inc_not_zero(&worker->ref); >@@ -1096,6 +1101,13 @@ struct io_wq *io_wq_create(unsigned bounded, >struct >io_wq_data *data) > return ERR_PTR(-ENOMEM); > } > >+ ret = cpuhp_state_add_instance_nocalls(io_wq_online, >&wq->cpuhp_node); >+ if (ret) { >+ kfree(wq->wqes); >+ kfree(wq); >+ return ERR_PTR(ret); >+ } >+ >wq->free_work = data->free_work; >wq->do_work = data->do_work; > >@@ -1145,6 +1157,7 @@ struct io_wq *io_wq_create(unsigned bounded, >struct >io_wq_data *data) > ret = PTR_ERR(wq->manager); > complete(&wq->done); >err: >+
cpuhp_state_remove_instance_nocalls(io_wq_online, >&wq->cpuhp_node); > for_each_node(node) > kfree(wq->wqes[node]); > kfree(wq->wqes); >@@ -1164,6 +1177,8 @@ static void __io_wq_destroy(struct io_wq *wq) >{ > int node; > >+ cpuhp_state_remove_instance_nocalls(io_wq_online, >&wq->cpuhp_node); >+ > set_bit(IO_WQ_BIT_EXIT, &wq->state); > if (wq->manager) > kthread_stop(wq->manager); >@@ -1191,3 +1206,40 @@ struct task_struct *io_wq_get_task(struct io_wq >*wq) >{ > return wq->manager; >} >+ >+static bool io_wq_worker_affinity(struct io_worker *worker, void *data) >+{ >+ struct task_struct *task = worker->task; >+ unsigned long flags; >+ struct rq_flags rf; struct rq *rq; rq = task_rq_lock(task, &rf); --- raw_spin_lock_irqsave(&task->pi_lock, flags); >+ do_set_cpus_allowed(task, cpumask_of_node(worker->wqe->node)); >+ task->flags |= PF_NO_SETAFFINITY; --- raw_spin_unlock_irqrestore(&task->pi_lock, flags); task_rq_unlock(rq, task, &rf); >+ return false; >+} >+ >+static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node) >+{ >+ struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node); >+ int i; >+ >+ rcu_read_lock(); >+ for_each_node(i) >+ io_wq_for_each_worker(wq->wqes[i], io_wq_worker_affinity, >NULL); >+ rcu_read_unlock(); >+ return 0; >+} >+ >+static __init int io_wq_init(void) >+{ >+ int ret; >+ >+ ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, >"io-wq/online", >+ io_wq_cpu_online, NULL); >+ if (ret < 0) >+ return ret; >+ io_wq_online = ret; >+ return 0; >+} >+subsys_initcall(io_wq_init); > >-- >Jens Axboe
Re: Question on io-wq
From: Jens Axboe Sent: October 22, 2020 22:08 To: Zhang, Qiang Cc: v...@zeniv.linux.org.uk; io-ur...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org Subject: Re: Question on io-wq On 10/22/20 3:02 AM, Zhang,Qiang wrote: > > Hi Jens Axboe > > There is a problem in the 'io_wqe_worker' thread: the 'io_wqe_worker' > is created with its affinity set to the CPUs of a NUMA node, but due > to CPU hotplug, when the last CPU goes down the 'io_wqe_worker' thread > will run anywhere. When a CPU in the node goes online again, should we > restore their cpu bindings? >Something like the below should help in ensuring affinities are >always correct - trigger an affinity set for an online CPU event. We >should not need to do it for offlining. Can you test it? >diff --git a/fs/io-wq.c b/fs/io-wq.c >index 4012ff541b7b..3bf029d1170e 100644 >--- a/fs/io-wq.c >+++ b/fs/io-wq.c >@@ -19,6 +19,7 @@ >#include >#include >#include >+#include >#include "io-wq.h" > >@@ -123,9 +124,13 @@ struct io_wq { > refcount_t refs; > struct completion done; > >+ struct hlist_node cpuhp_node; >+ > refcount_t use_refs; >}; > >+static enum cpuhp_state io_wq_online; >+ >static bool io_worker_get(struct io_worker *worker) >{ > return refcount_inc_not_zero(&worker->ref); >@@ -1096,6 +1101,13 @@ struct io_wq *io_wq_create(unsigned bounded, >struct >io_wq_data *data) > return ERR_PTR(-ENOMEM); > } > >+ ret = cpuhp_state_add_instance_nocalls(io_wq_online, >&wq->cpuhp_node); >+ if (ret) { >+ kfree(wq->wqes); >+ kfree(wq); >+ return ERR_PTR(ret); >+ } >+ >wq->free_work = data->free_work; >wq->do_work = data->do_work; > >@@ -1145,6 +1157,7 @@ struct io_wq *io_wq_create(unsigned bounded, >struct >io_wq_data *data) > ret = PTR_ERR(wq->manager); > complete(&wq->done); >err: >+ cpuhp_state_remove_instance_nocalls(io_wq_online, >&wq->cpuhp_node); > for_each_node(node) > kfree(wq->wqes[node]); > kfree(wq->wqes); >@@ -1164,6 +1177,8 @@ static void __io_wq_destroy(struct io_wq *wq) >{ > int node; > >+
cpuhp_state_remove_instance_nocalls(io_wq_online, >&wq->cpuhp_node); >+ > set_bit(IO_WQ_BIT_EXIT, &wq->state); > if (wq->manager) > kthread_stop(wq->manager); >@@ -1191,3 +1206,40 @@ struct task_struct *io_wq_get_task(struct io_wq >*wq) >{ > return wq->manager; >} >+ >+static bool io_wq_worker_affinity(struct io_worker *worker, void *data) >+{ >+ struct task_struct *task = worker->task; >+ unsigned long flags; >+ struct rq_flags rf; >+ raw_spin_lock_irqsave(&task->pi_lock, flags); >+ do_set_cpus_allowed(task, cpumask_of_node(worker->wqe->node)); >+ task->flags |= PF_NO_SETAFFINITY; >+ raw_spin_unlock_irqrestore(&task->pi_lock, flags); >+ return false; >+} >+ >+static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node) >+{ >+ struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node); >+ int i; >+ >+ rcu_read_lock(); >+ for_each_node(i) >+ io_wq_for_each_worker(wq->wqes[i], io_wq_worker_affinity, >NULL); >+ rcu_read_unlock(); >+ return 0; >+} >+ >+static __init int io_wq_init(void) >+{ >+ int ret; >+ >+ ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, >"io-wq/online", >+ io_wq_cpu_online, NULL); >+ if (ret < 0) >+ return ret; >+ io_wq_online = ret; >+ return 0; >+} >+subsys_initcall(io_wq_init); > >-- >Jens Axboe
Question on io-wq
Hi Jens Axboe There is a problem in the 'io_wqe_worker' thread: the 'io_wqe_worker' is created with its affinity set to the CPUs of a NUMA node, but due to CPU hotplug, when the last CPU goes down the 'io_wqe_worker' thread will run anywhere. When a CPU in the node goes online again, should we restore their cpu bindings? Thanks Qiang
Re: Re: [PATCH] btrfs: Fix missing close devices
From: Johannes Thumshirn Sent: September 21, 2020 17:17 To: Zhang, Qiang; c...@fb.com; jo...@toxicpanda.com; dste...@suse.com; syzbot+582e66e5edf36a22c...@syzkaller.appspotmail.com Cc: linux-bt...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: Re: [PATCH] btrfs: Fix missing close devices On 21/09/2020 11:14, Zhang, Qiang wrote: > > > > From: Johannes Thumshirn > Sent: September 21, 2020 16:52 > To: Zhang, Qiang; c...@fb.com; jo...@toxicpanda.com; dste...@suse.com > Cc: linux-bt...@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] btrfs: Fix missing close devices > > On 21/09/2020 10:27, qiang.zh...@windriver.com wrote: >> From: Zqiang >> >> When btrfs fill super errors out, we should first close the devices >> and then call deactivate_locked_super() to free fs_info. >> >> Signed-off-by: Zqiang >> --- >> fs/btrfs/super.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >> index 8840a4fa81eb..3bfd54e8f388 100644 >> --- a/fs/btrfs/super.c >> +++ b/fs/btrfs/super.c >> @@ -1675,6 +1675,7 @@ static struct dentry *btrfs_mount_root(struct >> file_system_type *fs_type, >> error = security_sb_set_mnt_opts(s, new_sec_opts, 0, NULL); >> security_free_mnt_opts(&new_sec_opts); >> if (error) { >> + btrfs_close_devices(fs_devices); >> deactivate_locked_super(s); >> return ERR_PTR(error); >> } >> > >> I think this is the fix for the syzkaller issue: >> Reported-by: syzbot+582e66e5edf36a22c...@syzkaller.appspotmail.com > > Please try this patch.
> >Nope, with this patch I get the following Null-ptr-deref: >[ 39.065209] >>== >[ 39.066318] BUG: KASAN: null-ptr-deref in bdev_name.constprop.0+0xd4/0x240 >[ 39.067307] Read of size 4 at addr 03ac by task syz-repro/273 >[ 39.068289] >[ 39.069602] >>== >[ 39.070837] BUG: kernel NULL pointer dereference, address: 03ac >[ 39.071837] #PF: supervisor read access in kernel mode >[ 39.072580] #PF: error_code(0x) - not-present page >[ 39.073318] PGD 8001cd3b1067 P4D 8001cd3b1067 PUD 1c6de7067 PMD >0 >[ 39.074306] Oops: [#1] SMP KASAN PTI >[ 39.074887] CPU: 0 PID: 273 Comm: syz-repro Tainted: GB >5.9.0-rc5+ >#772 >[ 39.076031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >>rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 >[ 39.077638] RIP: 0010:bdev_name.constprop.0+0xd4/0x240 >[ 39.078387] Code: ca 4c 89 4c 24 08 e8 0b e9 ff ff 48 89 df 49 89 c6 e8 40 >42 c6 ff 49 >8b ac 24 e0 00 00 00 48 8d bd ac 03 00 00 e8 2c 41 c6 ff <8b> 85 >ac 03 00 00 4c 8b 4c >24 08 85 c0 0f 84 fe 00 00 00 4c 89 cf >[ 39.080991] RSP: 0018:8881f1a97878 EFLAGS: 00010286 >[ 39.081728] RAX: 0001 RBX: 8881c9fb80e0 RCX: >>dc00 >[ 39.082725] RDX: 0007 RSI: 0004 RDI: >81acd784 >[ 39.083717] RBP: R08: R09: >>> >[ 39.084722] R10: fbfff0539591 R11: 0001 R12: >8881c9fb8000 >[ 39.085711] R13: 8881ef6e2698 R14: 8881ef6e2680 R15: > >[ 39.086704] FS: 7f5d36eb9540() GS:8881f760() >>knlGS: >[ 39.087827] CS: 0010 DS: ES: CR0: 80050033 >[ 39.088623] CR2: 03ac CR3: 0001ef552000 CR4: >06b0 >[ 39.089607] DR0: DR1: DR2: > >[ 39.090603] DR3: DR6: fffe0ff0 DR7: >0400 >[ 39.091583] Call Trace: >[ 39.091943] ? mac_address_string+0x380/0x380 >[ 39.092559] ? mark_held_locks+0x65/0x90 >[ 39.093116] pointer+0x21c/0x650 >[ 39.093578] ? format_decode+0x1cf/0x4e0 >[ 39.094139] ? resource_string.isra.0+0xc10/0xc10 >[ 39.094809] vsnprintf+0x2e0/0x820 >[ 39.095292] ? pointer+0x650/0x650 >[ 39.095785] snprintf+0x88/0xa0 >[ 39.096234] ? vsprintf+0x10/0x10 >[ 39.096708] ? 
rcu_read_lock_sched_held+0x3a/0x70 >[ 39.097378] ? sget+0x200/0x240 >[ 39.097908] ? btrfs_kill_super+0x30/0x30 [btrfs] >[ 39.098644] btrfs_mount_root+0x442/0x5d0 [btrfs] >[ 39.099377] ? parse_rescue_options+0x150/0x150 [btrfs] >[ 39.100103] ? rcu_read_lock_sched_held+0x3a/0x70 >[ 39.100759] ? vfs_parse_fs_string+0xbc/0xf0 >[ 39.10
Re: [PATCH] btrfs: Fix missing close devices
From: Johannes Thumshirn Sent: 2020-09-21 16:52 To: Zhang, Qiang; c...@fb.com; jo...@toxicpanda.com; dste...@suse.com Cc: linux-bt...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] btrfs: Fix missing close devices On 21/09/2020 10:27, qiang.zh...@windriver.com wrote: > From: Zqiang > > When btrfs_fill_super() fails, we should first close the devices and > then call deactivate_locked_super() to free fs_info. > > Signed-off-by: Zqiang > --- > fs/btrfs/super.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c > index 8840a4fa81eb..3bfd54e8f388 100644 > --- a/fs/btrfs/super.c > +++ b/fs/btrfs/super.c > @@ -1675,6 +1675,7 @@ static struct dentry *btrfs_mount_root(struct > file_system_type *fs_type, > error = security_sb_set_mnt_opts(s, new_sec_opts, 0, NULL); > security_free_mnt_opts(&new_sec_opts); > if (error) { > + btrfs_close_devices(fs_devices); > deactivate_locked_super(s); > return ERR_PTR(error); > } > >I think this is the fix for the syzkaller issue: >Reported-by: syzbot+582e66e5edf36a22c...@syzkaller.appspotmail.com Please try this patch.
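The ordering bug the patch fixes can be modeled outside the kernel (all names below are illustrative stand-ins, not btrfs APIs): once device state has been set up, every error path taken afterwards must tear it down before dropping the superblock, otherwise the devices stay open and later mounts trip over them.

```c
#include <stdbool.h>
#include <assert.h>

/* Toy model of the mount error path: track what is open/active. */
struct mount_state {
	bool devices_open;
	bool sb_active;
};

static int mount_root(struct mount_state *st, bool security_fails)
{
	st->devices_open = true;	/* device-open step */
	st->sb_active = true;		/* superblock setup step */
	if (security_fails) {
		st->devices_open = false; /* the added btrfs_close_devices() */
		st->sb_active = false;	  /* deactivate_locked_super() */
		return -1;
	}
	return 0;
}
```

Without the `devices_open = false` line the failed mount would leak the open device state, which is the miniature version of the missing btrfs_close_devices() call.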
Re: Re: RCU: Question on force_qs_rnp
On 9/16/20 2:06 AM, Paul E. McKenney wrote: On Tue, Sep 15, 2020 at 01:16:39PM +0800, Zhang, Qiang wrote: On 9/15/20 11:41 AM, Paul E. McKenney wrote: On Tue, Sep 15, 2020 at 03:18:23AM +0000, Zhang, Qiang wrote: From: Paul E. McKenney Sent: 2020-09-15 4:56 To: Joel Fernandes Cc: Zhang, Qiang; Uladzislau Rezki; j...@joshtriplett.org; rost...@goodmis.org; mathieu.desnoy...@efficios.com; Lai Jiangshan; r...@vger.kernel.org; LKML Subject: Re: RCU: Question on force_qs_rnp On Mon, Sep 14, 2020 at 03:42:08PM -0400, Joel Fernandes wrote: On Mon, Sep 14, 2020 at 07:55:18AM +0000, Zhang, Qiang wrote: Hello Paul I have a question for you. In the force_qs_rnp() func, if the "f(rdp)" func returns true we will call rcu_report_qs_rnp() to report a quiescent state for this rnp node, and clear the grpmask from rnp->qsmask. After that, can we make a check for this rnp->qsmask: if rnp->qsmask == 0, we check for blocked readers in this rnp node, instead of jumping directly to the next node? Could you clarify what good is this going to do? What problem are you trying to address? You could have a task that is blocked in an RCU leaf node, but the force_qs_rnp() decided to call rcu_report_qs_rnp(). This is perfectly Ok. The CPU could be dyntick-idle and a quiescent state is reported. However, the GP must not end and the rcu leaf node should still be present in its parent intermediate nodes ->qsmask. In this case, the ->qsmask == 0 does not have any relevance. Or am I missing the point of the question? Hello, Qiang, Another way of making Joel's point is to say that the additional check you are asking for is already being done, but by rcu_report_qs_rnp(). Thanx, Paul Hello Paul, Joel What I want to express is as follows: diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 7623128d0020..beb554539f01 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2622,6 +2622,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp)) if (mask != 0) { /* Idle/offline CPUs, report (releases rnp->lock). */ rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags); + raw_spin_lock_irqsave_rcu_node(rnp, flags); + if (rnp->qsmask == 0 && rcu_preempt_blocked_readers_cgp(rnp)) + rcu_initiate_boost(rnp, flags); + else + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); } else { /* Nothing to do here, so just drop the lock. */ raw_spin_unlock_irqrestore_rcu_node(rnp, flags); But in that case, why duplicate the code from rcu_initiate_boost()? Thanx, Paul Hello Paul When we force a qs for an rnp, we first check the leaf node's "rnp->qsmask"; if it has reached zero, we check whether there are blocked readers in this leaf rnp node, and if so we need to priority-boost the blocked readers. If not, we check for cpu dyntick-idle and report the leaf node qs. After this leaf rnp node reports a qs, there may still be some blocked readers in this node; should we also priority-boost those blocked readers? Yes, but we will do that on the next time around, a few milliseconds later. And by that time, it is quite possible that the reader will have completed, which will save us from having to priority-boost it. Thanx, Paul Thanks Paul, I see.
Re: Re: [PATCH v3] debugobjects: install CPU hotplug callback
From: Waiman Long Sent: 2020-09-10 10:50 To: Zhang, Qiang; t...@linutronix.de; mi...@kernel.org; el...@google.com Cc: linux-kernel@vger.kernel.org Subject: Re: Re: [PATCH v3] debugobjects: install CPU hotplug callback On 9/9/20 9:48 PM, Zhang, Qiang wrote: > > > From: Waiman Long > Sent: 2020-09-09 2:23 > To: Zhang, Qiang; t...@linutronix.de; mi...@kernel.org; el...@google.com > Cc: linux-kernel@vger.kernel.org > Subject: Re: [PATCH v3] debugobjects: install CPU hotplug callback > > On 9/8/20 2:27 AM, qiang.zh...@windriver.com wrote: >> From: Zqiang >> >> Due to CPU hotplug, a CPU may never come online again after it goes >> offline, so some objects in its percpu pool are never freed. In order >> to avoid this, install a CPU hotplug callback, and call this callback >> func to free the objects in the percpu pool when the CPU goes offline. >> >> Signed-off-by: Zqiang >> --- >>v1->v2: >>Modify submission information. >> >>v2->v3: >>In the CPU hotplug callback func, add a clear of the percpu pool "obj_free" counter. >>Capitalize 'CPU', and use a shorter preprocessor sequence. 
>> >>include/linux/cpuhotplug.h | 1 + >>lib/debugobjects.c | 24 >>2 files changed, 25 insertions(+) >> >> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h >> index 3215023d4852..0c39d57e5342 100644 >> --- a/include/linux/cpuhotplug.h >> +++ b/include/linux/cpuhotplug.h >> @@ -36,6 +36,7 @@ enum cpuhp_state { >>CPUHP_X86_MCE_DEAD, >>CPUHP_VIRT_NET_DEAD, >>CPUHP_SLUB_DEAD, >> + CPUHP_DEBUG_OBJ_DEAD, >>CPUHP_MM_WRITEBACK_DEAD, >>CPUHP_MM_VMSTAT_DEAD, >>CPUHP_SOFTIRQ_DEAD, >> diff --git a/lib/debugobjects.c b/lib/debugobjects.c >> index fe4557955d97..bb69a02c3e7b 100644 >> --- a/lib/debugobjects.c >> +++ b/lib/debugobjects.c >> @@ -19,6 +19,7 @@ >>#include >>#include >>#include >> +#include >> >>#define ODEBUG_HASH_BITS 14 >>#define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS) >> @@ -433,6 +434,24 @@ static void free_object(struct debug_obj *obj) >>} >>} >> >> +#ifdef CONFIG_HOTPLUG_CPU >> +static int object_cpu_offline(unsigned int cpu) >> +{ >> + struct debug_percpu_free *percpu_pool; >> + struct hlist_node *tmp; >> + struct debug_obj *obj; >> + >> + percpu_pool = per_cpu_ptr(&percpu_obj_pool, cpu); >> + hlist_for_each_entry_safe(obj, tmp, &percpu_pool->free_objs, node) { >> + hlist_del(&obj->node); >> + kmem_cache_free(obj_cache, obj); >> + } >> + percpu_pool->obj_free = 0; >>> For pointer, it is better to use NULL for clarity. >>> Cheers, >>> Longman > Do you mean the "->obj_free" variable? This represents the number of free > objects in the percpu_pool. > >>You are right. I got confused. Sorry for the noise. >>Cheers, >>Longman Hello tglx, mingo Is this patch acceptable? Thanks Qiang
Re: Re: RCU: Question on force_qs_rnp
On 9/15/20 11:41 AM, Paul E. McKenney wrote: On Tue, Sep 15, 2020 at 03:18:23AM +0000, Zhang, Qiang wrote: From: Paul E. McKenney Sent: 2020-09-15 4:56 To: Joel Fernandes Cc: Zhang, Qiang; Uladzislau Rezki; j...@joshtriplett.org; rost...@goodmis.org; mathieu.desnoy...@efficios.com; Lai Jiangshan; r...@vger.kernel.org; LKML Subject: Re: RCU: Question on force_qs_rnp On Mon, Sep 14, 2020 at 03:42:08PM -0400, Joel Fernandes wrote: On Mon, Sep 14, 2020 at 07:55:18AM +0000, Zhang, Qiang wrote: Hello Paul I have a question for you. In the force_qs_rnp() func, if the "f(rdp)" func returns true we will call rcu_report_qs_rnp() to report a quiescent state for this rnp node, and clear the grpmask from rnp->qsmask. After that, can we make a check for this rnp->qsmask: if rnp->qsmask == 0, we check for blocked readers in this rnp node, instead of jumping directly to the next node? Could you clarify what good is this going to do? What problem are you trying to address? You could have a task that is blocked in an RCU leaf node, but the force_qs_rnp() decided to call rcu_report_qs_rnp(). This is perfectly Ok. The CPU could be dyntick-idle and a quiescent state is reported. However, the GP must not end and the rcu leaf node should still be present in its parent intermediate nodes ->qsmask. In this case, the ->qsmask == 0 does not have any relevance. Or am I missing the point of the question? Hello, Qiang, Another way of making Joel's point is to say that the additional check you are asking for is already being done, but by rcu_report_qs_rnp(). Thanx, Paul Hello Paul, Joel What I want to express is as follows: diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 7623128d0020..beb554539f01 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2622,6 +2622,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp)) if (mask != 0) { /* Idle/offline CPUs, report (releases rnp->lock). */ rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags); + raw_spin_lock_irqsave_rcu_node(rnp, flags); + if (rnp->qsmask == 0 && rcu_preempt_blocked_readers_cgp(rnp)) + rcu_initiate_boost(rnp, flags); + else + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); } else { /* Nothing to do here, so just drop the lock. */ raw_spin_unlock_irqrestore_rcu_node(rnp, flags); But in that case, why duplicate the code from rcu_initiate_boost()? Thanx, Paul Hello Paul When we force a qs for an rnp, we first check the leaf node's "rnp->qsmask"; if it has reached zero, we check whether there are blocked readers in this leaf rnp node, and if so we need to priority-boost the blocked readers. If not, we check for cpu dyntick-idle and report the leaf node qs. After this leaf rnp node reports a qs, there may still be some blocked readers in this node; should we also priority-boost those blocked readers? Thanks Qiang
Re: RCU: Question on force_qs_rnp
From: Paul E. McKenney Sent: 2020-09-15 4:56 To: Joel Fernandes Cc: Zhang, Qiang; Uladzislau Rezki; j...@joshtriplett.org; rost...@goodmis.org; mathieu.desnoy...@efficios.com; Lai Jiangshan; r...@vger.kernel.org; LKML Subject: Re: RCU: Question on force_qs_rnp On Mon, Sep 14, 2020 at 03:42:08PM -0400, Joel Fernandes wrote: > On Mon, Sep 14, 2020 at 07:55:18AM +0000, Zhang, Qiang wrote: > > Hello Paul > > > > I have a question for you. > > In the force_qs_rnp() func, if the "f(rdp)" func returns true we will call > > rcu_report_qs_rnp() to > > report a quiescent state for this rnp node, and clear the grpmask from > > rnp->qsmask. > > After that, can we make a check for this rnp->qsmask: if rnp->qsmask == > > 0, > > we check for blocked readers in this rnp node, instead of jumping > > directly to the next node? > > Could you clarify what good is this going to do? What problem are you trying > to > address? > > You could have a task that is blocked in an RCU leaf node, but the > force_qs_rnp() decided to call rcu_report_qs_rnp(). This is perfectly Ok. The > CPU could be dyntick-idle and a quiescent state is reported. However, the GP > must not end and the rcu leaf node should still be present in its parent > intermediate nodes ->qsmask. In this case, the ->qsmask == 0 does not have > any relevance. > > Or am I missing the point of the question? >Hello, Qiang, >Another way of making Joel's point is to say that the additional check >you are asking for is already being done, but by rcu_report_qs_rnp(). >Thanx, Paul Hello Paul, Joel What I want to express is as follows: diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 7623128d0020..beb554539f01 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2622,6 +2622,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp)) if (mask != 0) { /* Idle/offline CPUs, report (releases rnp->lock). 
*/ rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags); + raw_spin_lock_irqsave_rcu_node(rnp, flags); + if (rnp->qsmask == 0 && rcu_preempt_blocked_readers_cgp(rnp)) + rcu_initiate_boost(rnp, flags); + else + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); } else { /* Nothing to do here, so just drop the lock. */ raw_spin_unlock_irqrestore_rcu_node(rnp, flags); Thanks Qiang
RCU: Question on force_qs_rnp
Hello Paul, I have a question for you. In the force_qs_rnp() func, if the "f(rdp)" func returns true, we will call rcu_report_qs_rnp() to report a quiescent state for this rnp node, and clear the grpmask from rnp->qsmask. After that, can we make a check for this rnp->qsmask: if rnp->qsmask == 0, we check for blocked readers in this rnp node, instead of jumping directly to the next node? Thanks Qiang
Re: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func
From: Paul E. McKenney Sent: 2020-09-09 19:22 To: Zhang, Qiang Cc: Joel Fernandes; Uladzislau Rezki; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML Subject: Re: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func On Wed, Sep 09, 2020 at 07:03:39AM +0000, Zhang, Qiang wrote: > > When preempt RCU is configured and there are multiple levels of nodes, a task > preempted in an rcu read-side critical section is added to the > "rnp->blkd_tasks" link list, and "rnp->gp_tasks" may be assigned a value; these rnp are leaf > nodes in the RCU tree. > > But in the "rcu_gp_fqs_loop" func, we check for blocked readers on the root node. > > static void rcu_gp_fqs_loop(void) > { > . > struct rcu_node *rnp = rcu_get_root(); > . > if (!READ_ONCE(rnp->qsmask) && >!rcu_preempt_blocked_readers_cgp(rnp)) > --> rnp is root node > break; > > } > > The root node's blkd_tasks list never has tasks added to it, and "rnp->gp_tasks" is never > assigned a value, so this check is invalid. > Should we check the leaf nodes like this >There are two cases: >1. There is only a single rcu_node structure, which is both root > and leaf. In this case, the current check is required: Both > ->qsmask and the ->blkd_tasks list must be checked. Your >rcu_preempt_blocked_readers() would work in this case, but >the current code is a bit faster because it does not need >to acquire the ->lock nor does it need the loop overhead. >2. There are multiple levels. In this case, as you say, the root >rcu_node structure's ->blkd_tasks list will always be empty. >But also in this case, the root rcu_node structure's ->qsmask >cannot be zero until all the leaf rcu_node structures' ->qsmask >fields are zero and their ->blkd_tasks lists no longer have >tasks blocking the current grace period. This means that your > rcu_preempt_blocked_readers() function would never return > true in this case. >So the current code is fine. >Are you seeing failures on mainline kernels? If so, what is the failure >mode? 
Yes it's right, thank you for your explanation. thanks Qiang > Thanx, Paul > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1846,6 +1846,25 @@ static bool rcu_gp_init(void) > return true; > } > > +static bool rcu_preempt_blocked_readers(void) > +{ > + struct rcu_node *rnp; > + unsigned long flags; > + bool ret = false; > + > + rcu_for_each_leaf_node(rnp) { > + raw_spin_lock_irqsave_rcu_node(rnp, flags); > + if (rcu_preempt_blocked_readers_cgp(rnp)) { > + ret = true; > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); > + break; > + } > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); > + } > + > + return ret; > +} > + > /* > * Helper function for swait_event_idle_exclusive() wakeup at > force-quiescent-state > * time. > @@ -1864,7 +1883,7 @@ static bool rcu_gp_fqs_check_wake(int *gfp) > return true; > > // The current grace period has completed. > - if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp)) > + if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers()) > return true; > > return false; > @@ -1927,7 +1946,7 @@ static void rcu_gp_fqs_loop(void) > /* Locking provides needed memory barriers. */ > /* If grace period done, leave loop. */ > if (!READ_ONCE(rnp->qsmask) && > - !rcu_preempt_blocked_readers_cgp(rnp)) > + !rcu_preempt_blocked_readers()) > break; > /* If time for quiescent-state forcing, do it. */ > if (!time_after(rcu_state.jiffies_force_qs, jiffies) || > -- > > > thanks > Qiang
Re: [PATCH v3] debugobjects: install CPU hotplug callback
From: Waiman Long Sent: 2020-09-09 2:23 To: Zhang, Qiang; t...@linutronix.de; mi...@kernel.org; el...@google.com Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] debugobjects: install CPU hotplug callback On 9/8/20 2:27 AM, qiang.zh...@windriver.com wrote: > From: Zqiang > > Due to CPU hotplug, a CPU may never come online again after it goes > offline, so some objects in its percpu pool are never freed. In order > to avoid this, install a CPU hotplug callback, and call this callback > func to free the objects in the percpu pool when the CPU goes offline. > > Signed-off-by: Zqiang > --- > v1->v2: > Modify submission information. > > v2->v3: > In the CPU hotplug callback func, add a clear of the percpu pool "obj_free" counter. > Capitalize 'CPU', and use a shorter preprocessor sequence. > > include/linux/cpuhotplug.h | 1 + > lib/debugobjects.c | 24 > 2 files changed, 25 insertions(+) > > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h > index 3215023d4852..0c39d57e5342 100644 > --- a/include/linux/cpuhotplug.h > +++ b/include/linux/cpuhotplug.h > @@ -36,6 +36,7 @@ enum cpuhp_state { > CPUHP_X86_MCE_DEAD, > CPUHP_VIRT_NET_DEAD, > CPUHP_SLUB_DEAD, > + CPUHP_DEBUG_OBJ_DEAD, > CPUHP_MM_WRITEBACK_DEAD, > CPUHP_MM_VMSTAT_DEAD, > CPUHP_SOFTIRQ_DEAD, > diff --git a/lib/debugobjects.c b/lib/debugobjects.c > index fe4557955d97..bb69a02c3e7b 100644 > --- a/lib/debugobjects.c > +++ b/lib/debugobjects.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #define ODEBUG_HASH_BITS 14 > #define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS) > @@ -433,6 +434,24 @@ static void free_object(struct debug_obj *obj) > } > } > > +#ifdef CONFIG_HOTPLUG_CPU > +static int object_cpu_offline(unsigned int cpu) > +{ > + struct debug_percpu_free *percpu_pool; > + struct hlist_node *tmp; > + struct debug_obj *obj; > + > + percpu_pool = per_cpu_ptr(&percpu_obj_pool, cpu); > + hlist_for_each_entry_safe(obj, tmp, &percpu_pool->free_objs, node) { > + hlist_del(&obj->node); > + kmem_cache_free(obj_cache, obj); > + } > + 
percpu_pool->obj_free = 0; >>For pointer, it is better to use NULL for clarity. >>Cheers, >>Longman Do you mean "->obj_free" variable ? this represents the number of free objects in percpu_pool . > + > + return 0; > +} > +#endif > + > /* >* We run out of memory. That means we probably have tons of objects >* allocated. > @@ -1367,6 +1386,11 @@ void __init debug_objects_mem_init(void) > } else > debug_objects_selftest(); > > +#ifdef CONFIG_HOTPLUG_CPU > + cpuhp_setup_state_nocalls(CPUHP_DEBUG_OBJ_DEAD, "object:offline", NULL, > + object_cpu_offline); > +#endif > + > /* >* Increase the thresholds for allocating and freeing objects >* according to the number of possible CPUs available in the system.
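A minimal userspace analog of the offline callback discussed above (plain malloc/free stand in for the kmem_cache, and the struct names are simplified): walk the per-CPU free list, free every node, and reset both the list head and the obj_free counter that v3 of the patch adds.

```c
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

struct obj {
	struct obj *next;
};

struct percpu_free_pool {
	struct obj *free_objs;	/* singly-linked free list */
	int obj_free;		/* number of objects on the list */
};

static void pool_add(struct percpu_free_pool *p)
{
	struct obj *o = malloc(sizeof(*o));

	o->next = p->free_objs;
	p->free_objs = o;
	p->obj_free++;
}

/* What object_cpu_offline() does for the dying CPU's pool. */
static int pool_cpu_offline(struct percpu_free_pool *p)
{
	struct obj *o = p->free_objs, *tmp;

	while (o) {
		tmp = o->next;
		free(o);	/* kmem_cache_free() in the kernel version */
		o = tmp;
	}
	p->free_objs = NULL;
	p->obj_free = 0;	/* the counter clear added in v3 */
	return 0;
}
```

In the kernel this is safe without locking because, as Thomas notes later in the thread, CPU hotplug is globally serialized, so nothing races with the offline CPU's pool.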
RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func
When preempt RCU is configured and there are multiple levels of nodes, a task preempted in an rcu read-side critical section is added to the "rnp->blkd_tasks" link list, and "rnp->gp_tasks" may be assigned a value; these rnp are leaf nodes in the RCU tree. But in the "rcu_gp_fqs_loop" func, we check for blocked readers on the root node. static void rcu_gp_fqs_loop(void) { . struct rcu_node *rnp = rcu_get_root(); . if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp)) --> rnp is root node break; } The root node's blkd_tasks list never has tasks added to it, and "rnp->gp_tasks" is never assigned a value, so this check is invalid. Should we check the leaf nodes like this --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1846,6 +1846,25 @@ static bool rcu_gp_init(void) return true; } +static bool rcu_preempt_blocked_readers(void) +{ + struct rcu_node *rnp; + unsigned long flags; + bool ret = false; + + rcu_for_each_leaf_node(rnp) { + raw_spin_lock_irqsave_rcu_node(rnp, flags); + if (rcu_preempt_blocked_readers_cgp(rnp)) { + ret = true; + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); + break; + } + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); + } + + return ret; +} + /* * Helper function for swait_event_idle_exclusive() wakeup at force-quiescent-state * time. @@ -1864,7 +1883,7 @@ static bool rcu_gp_fqs_check_wake(int *gfp) return true; // The current grace period has completed. - if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp)) + if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers()) return true; return false; @@ -1927,7 +1946,7 @@ static void rcu_gp_fqs_loop(void) /* Locking provides needed memory barriers. */ /* If grace period done, leave loop. */ if (!READ_ONCE(rnp->qsmask) && - !rcu_preempt_blocked_readers_cgp(rnp) + !rcu_preempt_blocked_readers()) break; /* If time for quiescent-state forcing, do it. */ if (!time_after(rcu_state.jiffies_force_qs, jiffies) || -- thanks Qiang
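The multi-level case Paul describes in his reply can be modeled in a few lines (a toy model, not kernel code): a leaf's bit in the parent's qsmask is cleared only once the leaf has no pending CPUs and no blocked readers, which is why root->qsmask == 0 already implies that no leaf is still holding the grace period.

```c
#include <assert.h>

/* Toy two-level model of rcu_node qsmask propagation: leaf i owns
 * bit (1 << i) in the root's qsmask. */
struct toy_node {
	unsigned int qsmask;	/* pending CPUs below this node */
	int blkd_readers;	/* readers blocked in this (leaf) node */
};

/* A leaf reports its quiescent states; the root bit clears only when
 * the leaf also has no blocked readers holding the grace period. */
static void leaf_report_qs(struct toy_node *root, struct toy_node *leaf,
			   unsigned int bit)
{
	leaf->qsmask = 0;
	if (!leaf->blkd_readers)
		root->qsmask &= ~bit;
}
```

A blocked reader in any leaf keeps that leaf's bit set in the root, so checking the root's qsmask alone is enough for the grace-period-done test.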
[no subject]
When preempt RCU is configured, if a task switch happens in an rcu read-side critical section, the current task is added to the "rnp->blkd_tasks" link list; these rnp are leaf nodes in the RCU tree. In the "rcu_gp_fqs_loop" func, static void rcu_gp_fqs_loop(void) { struct rcu_node *rnp = rcu_get_root(); }
Re: [PATCH v2] debugobjects: install cpu hotplug callback
tglx please review. Thanks Qiang From: linux-kernel-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com Sent: 2020-08-27 13:06 To: t...@linutronix.de; long...@redhat.com; el...@google.com Cc: linux-kernel@vger.kernel.org Subject: [PATCH v2] debugobjects: install cpu hotplug callback From: Zqiang Due to cpu hotplug, a cpu may never be online again after it goes offline, so some objects in the percpu pool are never freed. In order to avoid this, install a cpu hotplug callback, and call this callback func to free the objects in the percpu pool when the cpu goes offline. Signed-off-by: Zqiang --- v1->v2: Modify submission information. include/linux/cpuhotplug.h | 1 + lib/debugobjects.c | 23 +++ 2 files changed, 24 insertions(+) diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index a2710e654b64..2e77db655cfa 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -36,6 +36,7 @@ enum cpuhp_state { CPUHP_X86_MCE_DEAD, CPUHP_VIRT_NET_DEAD, CPUHP_SLUB_DEAD, + CPUHP_DEBUG_OBJ_DEAD, CPUHP_MM_WRITEBACK_DEAD, CPUHP_MM_VMSTAT_DEAD, CPUHP_SOFTIRQ_DEAD, diff --git a/lib/debugobjects.c b/lib/debugobjects.c index fe4557955d97..50e21ed0519e 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -19,6 +19,7 @@ #include #include #include +#include #define ODEBUG_HASH_BITS 14 #define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS) @@ -433,6 +434,23 @@ static void free_object(struct debug_obj *obj) } } +#if defined(CONFIG_HOTPLUG_CPU) +static int object_cpu_offline(unsigned int cpu) +{ + struct debug_percpu_free *percpu_pool; + struct hlist_node *tmp; + struct debug_obj *obj; + + percpu_pool = per_cpu_ptr(&percpu_obj_pool, cpu); + hlist_for_each_entry_safe(obj, tmp, &percpu_pool->free_objs, node) { + hlist_del(&obj->node); + kmem_cache_free(obj_cache, obj); + } + + return 0; +} +#endif + /* * We run out of memory. That means we probably have tons of objects * allocated. 
@@ -1367,6 +1385,11 @@ void __init debug_objects_mem_init(void) } else debug_objects_selftest(); +#if defined(CONFIG_HOTPLUG_CPU) + cpuhp_setup_state_nocalls(CPUHP_DEBUG_OBJ_DEAD, "object:offline", NULL, + object_cpu_offline); +#endif + /* * Increase the thresholds for allocating and freeing objects * according to the number of possible CPUs available in the system. -- 2.17.1
Re: Re: [PATCH] debugobjects: install cpu hotplug callback
From: linux-kernel-ow...@vger.kernel.org on behalf of Thomas Gleixner Sent: 2020-08-26 7:53 To: Waiman Long; Zhang, Qiang; el...@google.com Cc: linux-kernel@vger.kernel.org; a...@linux-foundation.org Subject: Re: Re: [PATCH] debugobjects: install cpu hotplug callback On Tue, Aug 25 2020 at 18:26, Waiman Long wrote: > On 8/25/20 12:53 AM, Zhang, Qiang wrote: >> >> When a cpu goes offline, we should free the objects in the "percpu_obj_pool" >> free_objs list corresponding to that cpu. > > The percpu free object pool is supposed to be accessed only by that > particular cpu without any lock. Trying to access it from another cpu > can cause a race condition unless one can make sure that the offline cpu > won't become online in the mean time. >It is actually safe because CPU hotplug is globally serialized and there >is no way that an offline CPU will come back from death valley >magically. If such a zombie ever surfaces then we have surely more >serious problems than accessing that pool :) > There shouldn't be too many free objects in the percpu pool. Is it > worth the effort to free them? >That's a really good question nevertheless. The only case where this >ever matters is physical hotplug. All other CPU hotplug stuff is >temporarily or in case of a late (post boottime) SMT disable it's going >to be a handful of free objects on that pool. As debugobjects is as the >name says a debug facility the benefit is questionable unless there is a >good reason to do so. There may not be too many objects in the percpu pool, but that doesn't mean they don't need to be freed: a CPU may never come online again after it goes offline, so some objects in the percpu pool are never freed. >Thanks, > tglx
Re: [PATCH] debugobjects: install cpu hotplug callback
From: linux-kernel-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com Sent: 2020-08-20 11:24 To: t...@linutronix.de; el...@google.com; long...@redhat.com Cc: linux-kernel@vger.kernel.org Subject: [PATCH] debugobjects: install cpu hotplug callback From: Zqiang When a cpu goes offline, we should free the objects in the "percpu_obj_pool" free_objs list corresponding to that cpu. Signed-off-by: Zqiang --- include/linux/cpuhotplug.h | 1 + lib/debugobjects.c | 23 +++ 2 files changed, 24 insertions(+) diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index a2710e654b64..2e77db655cfa 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -36,6 +36,7 @@ enum cpuhp_state { CPUHP_X86_MCE_DEAD, CPUHP_VIRT_NET_DEAD, CPUHP_SLUB_DEAD, + CPUHP_DEBUG_OBJ_DEAD, CPUHP_MM_WRITEBACK_DEAD, CPUHP_MM_VMSTAT_DEAD, CPUHP_SOFTIRQ_DEAD, diff --git a/lib/debugobjects.c b/lib/debugobjects.c index fe4557955d97..50e21ed0519e 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -19,6 +19,7 @@ #include #include #include +#include #define ODEBUG_HASH_BITS 14 #define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS) @@ -433,6 +434,23 @@ static void free_object(struct debug_obj *obj) } } +#if defined(CONFIG_HOTPLUG_CPU) +static int object_cpu_offline(unsigned int cpu) +{ + struct debug_percpu_free *percpu_pool; + struct hlist_node *tmp; + struct debug_obj *obj; + + percpu_pool = per_cpu_ptr(&percpu_obj_pool, cpu); + hlist_for_each_entry_safe(obj, tmp, &percpu_pool->free_objs, node) { + hlist_del(&obj->node); + kmem_cache_free(obj_cache, obj); + } + + return 0; +} +#endif + /* * We run out of memory. That means we probably have tons of objects * allocated. 
@@ -1367,6 +1385,11 @@ void __init debug_objects_mem_init(void) } else debug_objects_selftest(); +#if defined(CONFIG_HOTPLUG_CPU) + cpuhp_setup_state_nocalls(CPUHP_DEBUG_OBJ_DEAD, "object:offline", NULL, + object_cpu_offline); +#endif + /* * Increase the thresholds for allocating and freeing objects * according to the number of possible CPUs available in the system. -- 2.17.1
Re: [PATCH v2] libnvdimm: KASAN: global-out-of-bounds Read in internal_create_group
cc: Dan Williams Please review. From: linux-kernel-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com Sent: 2020-08-12 16:55 To: dan.j.willi...@intel.com; vishal.l.ve...@intel.com; dave.ji...@intel.com; ira.we...@intel.com Cc: linux-nvd...@lists.01.org; linux-kernel@vger.kernel.org Subject: [PATCH v2] libnvdimm: KASAN: global-out-of-bounds Read in internal_create_group From: Zqiang Because the last member of the "nvdimm_firmware_attributes" array was not assigned a NULL pointer, traversal of the "grp->attrs" array in the "create_files" func runs out of bounds. func: create_files: ->for (i = 0, attr = grp->attrs; *attr && !error; i++, attr++) -> BUG: KASAN: global-out-of-bounds in create_files fs/sysfs/group.c:43 [inline] BUG: KASAN: global-out-of-bounds in internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149 Read of size 8 at addr 8a2e4cf0 by task kworker/u17:10/959 CPU: 2 PID: 959 Comm: kworker/u17:10 Not tainted 5.8.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound async_run_entry_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 create_files fs/sysfs/group.c:43 [inline] internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149 internal_create_groups.part.0+0x90/0x140 fs/sysfs/group.c:189 internal_create_groups fs/sysfs/group.c:185 [inline] sysfs_create_groups+0x25/0x50 fs/sysfs/group.c:215 device_add_groups drivers/base/core.c:2024 [inline] device_add_attrs drivers/base/core.c:2178 [inline] device_add+0x7fd/0x1c40 drivers/base/core.c:2881 nd_async_device_register+0x12/0x80 drivers/nvdimm/bus.c:506 async_run_entry_fn+0x121/0x530 kernel/async.c:123 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 
kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 The buggy address belongs to the variable: nvdimm_firmware_attributes+0x10/0x40 Reported-by: syzbot+1cf0ffe61aecf46f5...@syzkaller.appspotmail.com Signed-off-by: Zqiang --- v1->v2: Modify the description of the error. drivers/nvdimm/dimm_devs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index 61374def5155..b59032e0859b 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -529,6 +529,7 @@ static DEVICE_ATTR_ADMIN_RW(activate); static struct attribute *nvdimm_firmware_attributes[] = { &dev_attr_activate.attr, &dev_attr_result.attr, + NULL, }; static umode_t nvdimm_firmware_visible(struct kobject *kobj, struct attribute *a, int n) -- 2.17.1
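The bug is easy to reproduce in miniature: sysfs walks an attribute array with a `for (attr = grp->attrs; *attr; ...)` loop, so the array must end with a NULL sentinel. A standalone sketch (simplified struct, not the real sysfs one):

```c
#include <assert.h>
#include <stddef.h>

struct attr {
	const char *name;
};

static struct attr activate = { "activate" };
static struct attr result   = { "result" };

/* With the NULL sentinel the walk terminates; without it, the loop
 * below reads past the end of the array, which is exactly what KASAN
 * reported for nvdimm_firmware_attributes. */
static struct attr *firmware_attrs[] = {
	&activate,
	&result,
	NULL,
};

static int count_attrs(struct attr **attrs)
{
	int n = 0;

	for (struct attr **a = attrs; *a; a++)
		n++;
	return n;
}
```

The one-line patch adds that sentinel, which is the convention every sysfs attribute array relies on.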
Re: [PATCH] rcu: shrink each possible cpu krcp
From: linux-kernel-ow...@vger.kernel.org on behalf of Joel Fernandes
Sent: August 19, 2020 8:04
To: Paul E. McKenney
Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b8ccd7b5af82..6decb9ad2421 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > {
> > 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > 	struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > +	struct kfree_rcu_cpu *krcp;
> >
> > 	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > 		return 0;
> >
> > +	/* Drain the kcrp of this CPU. IRQs should be disabled? */
> > +	krcp = this_cpu_ptr(&krc);
> > +	schedule_delayed_work(&krcp->monitor_work, 0);
> > +
> >
> > A cpu can be offlined and its krp will be stuck until a shrinker is
> > involved. Maybe never.
>
> Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> hard time getting too worried about it. ;-)

>Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
>on the cache reaper whose job is to flush the per-cpu caches. So I
>believe during CPU offlining, the per-cpu slab caches are flushed.
>
>thanks,
>
>- Joel

When a CPU goes offline, slab/slub only flushes the free objects in the offline CPU's cache, putting them on the node list or returning them to the buddy system; objects that are still in use stay in the offline CPU's cache. If we want to clean up the per-CPU "krcp" objects when a CPU goes offline, we should free the "krcp" cached objects in "rcutree_offline_cpu", which is called before the other RCU CPU-offline functions; "rcutree_offline_cpu" runs in the "cpuhp/%u" per-CPU thread.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..1812d4a1ac1b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
 	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
+	struct kfree_rcu_cpu *krcp;
 
 	rdp = per_cpu_ptr(&rcu_data, cpu);
 	rnp = rdp->mynode;
@@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
 	// nohz_full CPUs need the tick for stop-machine to work quickly
 	tick_dep_set(TICK_DEP_BIT_RCU);
+
+	krcp = per_cpu_ptr(&krc, cpu);
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	schedule_delayed_work(&krcp->monitor_work, 0);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	return 0;
 }

thanks,
Zqiang
Fwd: upstream test error: WARNING in do_epoll_wait
>From: linux-kernel-ow...@vger.kernel.org on behalf of syzbot
>Sent: August 5, 2020 15:19
>To: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; syzkaller-b...@googlegroups.com; v...@zeniv.linux.org.uk
>Subject: upstream test error: WARNING in do_epoll_wait
>
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit: 4f30a60a Merge tag 'close-range-v5.9' of git://git.kernel...
>git tree: upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=14c5a7da90
>kernel config: https://syzkaller.appspot.com/x/.config?x=8bdd9944dedf0f16
>dashboard link: https://syzkaller.appspot.com/bug?extid=4429670d8213f5f26352
>compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project.git ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)
>
>IMPORTANT: if you fix the issue, please add the following tag to the commit:
>Reported-by: syzbot+4429670d8213f5f26...@syzkaller.appspotmail.com
>
>[ cut here ]
>WARNING: CPU: 1 PID: 8728 at fs/eventpoll.c:1828 ep_poll fs/eventpoll.c:1828 [inline]
>WARNING: CPU: 1 PID: 8728 at fs/eventpoll.c:1828 do_epoll_wait+0x337/0x920 fs/eventpoll.c:2333
>Kernel panic - not syncing: panic_on_warn set ...
>CPU: 1 PID: 8728 Comm: syz-fuzzer Not tainted 5.8.0-syzkaller #0
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>Call Trace:
>__dump_stack lib/dump_stack.c:77 [inline]
>dump_stack+0x16e/0x25d lib/dump_stack.c:118
>panic+0x20c/0x69a kernel/panic.c:231
>__warn+0x211/0x240 kernel/panic.c:600
>report_bug+0x153/0x1d0 lib/bug.c:198
>handle_bug+0x4d/0x90 arch/x86/kernel/traps.c:235
>exc_invalid_op+0x16/0x70 arch/x86/kernel/traps.c:255
>asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:547
>RIP: 0010:ep_poll fs/eventpoll.c:1828 [inline]
>RIP: 0010:do_epoll_wait+0x337/0x920 fs/eventpoll.c:2333
>Code: 41 be 01 00 00 00 31 c0 48 89 44 24 20 45 31 e4 e9 7f 01 00 00 e8 59 ab c6 ff 41 bc f2 ff ff ff e9 c8 03 00 00 e8 49 ab c6 ff <0f> 0b e9 58 fe ff ff 49 bf ff ff ff ff ff ff ff 7f e9 f0 fe ff ff
>RSP: 0018:c9e1fe28 EFLAGS: 00010293
>RAX: 81856297 RBX: 888120fafa00 RCX: 88811e196400
>RDX: RSI: RDI:
>RBP: R08: 818560d8 R09: 88619eb7
>R10: R11: R12: 7000
>R13: 0080 R14: 0001 R15: 0003
>__do_sys_epoll_pwait fs/eventpoll.c:2364 [inline]
>__se_sys_epoll_pwait fs/eventpoll.c:2350 [inline]
>__x64_sys_epoll_pwait+0x92/0x150 fs/eventpoll.c:2350
>do_syscall_64+0x6a/0xe0 arch/x86/entry/common.c:384
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
>RIP: 0033:0x469240
>Code: 0f 05 89 44 24 20 c3 cc cc cc 8b 7c 24 08 48 8b 74 24 10 8b 54 24 18 44 8b 54 24 1c 49 c7 c0 00 00 00 00 b8 19 01 00 00 0f 05 <89> 44 24 20 c3 cc cc cc cc cc cc cc cc cc cc cc 8b 7c 24 08 48 c7
>RSP: 002b:00c4b7f0 EFLAGS: 0246 ORIG_RAX: 0119
>RAX: ffda RBX: 0001 RCX: 00469240
>RDX: 0080 RSI: 00c4b840 RDI: 0003
>RBP: 00c4be40 R08: R09:
>R10: 0001 R11: 0246 R12: 0003
>R13: 00c9cc00 R14: 00c00032c180 R15:
>Kernel Offset: disabled
>Rebooting in 86400 seconds..

In the "ep_poll" function, lockdep_assert_irqs_enabled() checks the lockdep-tracked interrupt state. Although interrupts are already enabled before entering "ep_poll", the lockdep irq-state update was missed.
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -76,6 +76,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
 	instrumentation_begin();
 	local_irq_enable();
+	lockdep_hardirqs_on(CALLER_ADDR0);
 	ti_work = READ_ONCE(current_thread_info()->flags);
 	if (ti_work & SYSCALL_ENTER_WORK)
 		syscall = syscall_trace_enter(regs, syscall, ti_work);
--

Should we add lockdep_hardirqs_on() here?

>---
>This report is generated by a bot. It may contain errors.
>See https://goo.gl/tpsmEJ for more information about syzbot.
>syzbot engineers can be reached at syzkal...@googlegroups.com.
>syzbot will keep track of this issue. See:
>https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
Re: Re: [PATCH] ALSA: seq: KASAN: use-after-free Read in delete_and_unsubscribe_port
From: Takashi Iwai
Sent: August 3, 2020 14:16
To: Zhang, Qiang
Cc: pe...@perex.cz; ti...@suse.com; alsa-de...@alsa-project.org; linux-kernel@vger.kernel.org
Subject: Re: Re: [PATCH] ALSA: seq: KASAN: use-after-free Read in delete_and_unsubscribe_port

On Mon, 03 Aug 2020 03:35:05 +0200, Zhang, Qiang wrote:
>
> >Thanks for the patch. But I'm afraid that this change would break the
> >existing behavior and might have a bad side-effect.
>
> >It's likely the same issue as reported in another syzkaller report
> >("KASAN: invalid-free in snd_seq_port_disconnect"), and Hillf's patch
> >below should cover this as well. Could you check whether it works?
>
> Yes, it should be the same issue. Adding the mutex lock in odev_ioctl
> ensures serialization; however, it should not be necessary to be
> mutually exclusive with open and close.

>>>That's a big-hammer approach indeed, but it should be more reasonable
>>>in this case. It makes the patch shorter and simpler, while the OSS
>>>sequencer is an ancient interface that wasn't considered much for
>>>concurrency, and this might also cover the case of access to another
>>>sequencer object that is about to be closed.

>>>So, it'd be great if you can confirm that the patch actually works.
>>>Then we can brush it up and merge it for 5.9-rc1.

Just like you said, this change is more reasonable. It makes the patch shorter and simpler.

>>>thanks,
>>>Takashi

> >---
> >--- a/sound/core/seq/oss/seq_oss.c
> >+++ b/sound/core/seq/oss/seq_oss.c
> >@@ -167,11 +167,17 @@ odev_write(struct file *file, const char
> > static long
> > odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > {
> >+	long rc;
> > 	struct seq_oss_devinfo *dp;
> >+
> >+	mutex_lock(&register_mutex);
> > 	dp = file->private_data;
> > 	if (snd_BUG_ON(!dp))
> >-		return -ENXIO;
> >-	return snd_seq_oss_ioctl(dp, cmd, arg);
> >+		rc = -ENXIO;
> >+	else
> >+		rc = snd_seq_oss_ioctl(dp, cmd, arg);
> >+	mutex_unlock(&register_mutex);
> >+	return rc;
> > }
>
> >#ifdef CONFIG_COMPAT
Re: [PATCH] ALSA: seq: KASAN: use-after-free Read in delete_and_unsubscribe_port
From: Takashi Iwai
Sent: August 1, 2020 17:39
To: Zhang, Qiang
Cc: pe...@perex.cz; ti...@suse.com; alsa-de...@alsa-project.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ALSA: seq: KASAN: use-after-free Read in delete_and_unsubscribe_port

On Sat, 01 Aug 2020 08:24:03 +0200, wrote:
>
> From: Zhang Qiang
>
> There is a potential race window: a task acquires the "src->list_mutex"
> write semaphore in snd_seq_port_disconnect(), traverses the linked list
> to find the "subs" object via the "info" parameter, and then releases
> the semaphore. Before this task acquires the write semaphore again, it
> may be acquired by another task, which finds the same "subs" object
> through the same "info"; a use-after-free can then happen later. A
> simple solution is to delete the object from the linked list as soon as
> it is found.
>
> BUG: KASAN: use-after-free in list_empty include/linux/list.h:282 [inline]
> BUG: KASAN: use-after-free in delete_and_unsubscribe_port+0x8b/0x450 sound/core/seq/seq_ports.c:530
> Read of size 8 at addr 888098523060 by task syz-executor.0/7202
>
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1f0/0x31e lib/dump_stack.c:118
> print_address_description+0x66/0x5a0 mm/kasan/report.c:383
> __kasan_report mm/kasan/report.c:513 [inline]
> kasan_report+0x132/0x1d0 mm/kasan/report.c:530
> list_empty include/linux/list.h:282 [inline]
> delete_and_unsubscribe_port+0x8b/0x450 sound/core/seq/seq_ports.c:530
> snd_seq_port_disconnect+0x568/0x610 sound/core/seq/seq_ports.c:612
> snd_seq_ioctl_unsubscribe_port+0x349/0x6c0 sound/core/seq/seq_clientmgr.c:1525
> snd_seq_oss_midi_close+0x397/0x620 sound/core/seq/oss/seq_oss_midi.c:405
> snd_seq_oss_synth_reset+0x335/0x8b0 sound/core/seq/oss/seq_oss_synth.c:406
> snd_seq_oss_reset+0x5b/0x250 sound/core/seq/oss/seq_oss_init.c:435
> snd_seq_oss_ioctl+0x5c2/0x1090 sound/core/seq/oss/seq_oss_ioctl.c:93
> odev_ioctl+0x51/0x70 sound/core/seq/oss/seq_oss.c:174
> vfs_ioctl fs/ioctl.c:48 [inline]
> ksys_ioctl fs/ioctl.c:753 [inline]
> __do_sys_ioctl fs/ioctl.c:762 [inline]
> __se_sys_ioctl+0xf9/0x160 fs/ioctl.c:760
> do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Allocated by task 7202:
> save_stack mm/kasan/common.c:48 [inline]
> set_track mm/kasan/common.c:56 [inline]
> __kasan_kmalloc+0x103/0x140 mm/kasan/common.c:494
> kmem_cache_alloc_trace+0x234/0x300 mm/slab.c:3551
> kmalloc include/linux/slab.h:555 [inline]
> kzalloc include/linux/slab.h:669 [inline]
> snd_seq_port_connect+0x66/0x460 sound/core/seq/seq_ports.c:553
> snd_seq_ioctl_subscribe_port+0x349/0x6c0 sound/core/seq/seq_clientmgr.c:1484
> snd_seq_oss_midi_open+0x4db/0x830 sound/core/seq/oss/seq_oss_midi.c:364
> snd_seq_oss_synth_setup_midi+0x108/0x510 sound/core/seq/oss/seq_oss_synth.c:269
> snd_seq_oss_open+0x899/0xe90 sound/core/seq/oss/seq_oss_init.c:261
> odev_open+0x5e/0x90 sound/core/seq/oss/seq_oss.c:125
> chrdev_open+0x498/0x580 fs/char_dev.c:414
> do_dentry_open+0x813/0x1070 fs/open.c:828
> do_open fs/namei.c:3243 [inline]
> path_openat+0x278d/0x37f0 fs/namei.c:3360
> do_filp_open+0x191/0x3a0 fs/namei.c:3387
> do_sys_openat2+0x463/0x770 fs/open.c:1179
> do_sys_open fs/open.c:1195 [inline]
> __do_sys_openat fs/open.c:1209 [inline]
> __se_sys_openat fs/open.c:1204 [inline]
> __x64_sys_openat+0x1c8/0x1f0 fs/open.c:1204
> do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Freed by task 7203:
> save_stack mm/kasan/common.c:48 [inline]
> set_track mm/kasan/common.c:56 [inline]
> kasan_set_free_info mm/kasan/common.c:316 [inline]
> __kasan_slab_free+0x114/0x170 mm/kasan/common.c:455
> __cache_free mm/slab.c:3426 [inline]
> kfree+0x10a/0x220 mm/slab.c:3757
> snd_seq_port_disconnect+0x570/0x610 sound/core/seq/seq_ports.c:614
> snd_seq_ioctl_unsubscribe_port+0x349/0x6c0 sound/core/seq/seq_clientmgr.c:1525
> snd_seq_oss_midi_close+0x397/0x620 sound/core/seq/oss/seq_oss_midi.c:405
> snd_seq_oss_synth_reset+0x335/0x8b0 sound/core/seq/oss/seq_oss_synth.c:406
> snd_seq_oss_reset+0x5b/0x250 sound/core/seq/oss/seq_oss_init.c:435
> snd_seq_oss_ioctl+0x5c2/0x1090 sound/core/seq/oss/seq_oss_ioctl.c:93
> odev_ioctl+0x51/0x70 sound/core/seq/oss/seq_oss.c:174
> vfs_ioctl fs/ioctl.c:48 [inline]
> ksys_ioctl fs/ioctl.c:753 [inline]
> __do_sys_ioctl fs/ioctl.c:762 [inline]
> __se_sys_ioctl+0xf9/0x160 fs/ioctl.c:760
> do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The buggy address belongs to the object at fff
Re: [PATCH v3] mm/slab.c: add node spinlock protect in __cache_free_alien

From: Zhang, Qiang
Sent: July 31, 2020 9:27
To: David Rientjes
Cc: c...@linux.com; penb...@kernel.org; iamjoonsoo@lge.com; a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/slab.c: add node spinlock protect in __cache_free_alien

From: David Rientjes
Sent: July 31, 2020 7:45
To: Zhang, Qiang
Cc: c...@linux.com; penb...@kernel.org; iamjoonsoo@lge.com; a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/slab.c: add node spinlock protect in __cache_free_alien

On Thu, 30 Jul 2020, qiang.zh...@windriver.com wrote:

> From: Zhang Qiang
>
> for example:
> node0
> cpu0                                    cpu1
> slab_dead_cpu
>   mutex_lock(&slab_mutex)
>   cpuup_canceled                        slab_dead_cpu
>     mask = cpumask_of_node(node)          mutex_lock(&slab_mutex)
>     n = get_node(cachep0, node0)
>     spin_lock_irq(&n->list_lock)
>     if (!cpumask_empty(mask)) == true
>     spin_unlock_irq(&n->list_lock)
>     goto free_slab
>   mutex_unlock(&slab_mutex)
>                                         cpuup_canceled
>                                           mask = cpumask_of_node(node)
> kmem_cache_free(cachep0)                  n = get_node(cachep0, node0)
>   __cache_free_alien(cachep0)             spin_lock_irq(&n->list_lock)
>     n = get_node(cachep0, node0)          if (!cpumask_empty(mask)) == false
>     if (n->alien && n->alien[page_node])    alien = n->alien
>       alien = n->alien[page_node]           n->alien = NULL
>                                           spin_unlock_irq(&n->list_lock)
>
>As mentioned in the review of v1 of this patch, we likely want to do a fix for cpuup_canceled() instead.

>I see, you mean the fix should be in the "cpuup_canceled" function?

I'm very sorry. Because cpu_down() holds the global "cpu_hotplug_lock" write lock, multiple CPU offlines are serialized, so the scenario I described above does not exist.
Re: KASAN: use-after-free Read in delete_and_unsubscribe_port (2)
From: linux-kernel-ow...@vger.kernel.org on behalf of syzbot
Sent: July 30, 2020 11:33
To: alsa-devel-ow...@alsa-project.org; alsa-de...@alsa-project.org; linux-kernel@vger.kernel.org; pe...@perex.cz; syzkaller-b...@googlegroups.com; ti...@suse.com
Subject: Re: KASAN: use-after-free Read in delete_and_unsubscribe_port (2)

syzbot has found a reproducer for the following issue on:

HEAD commit: d3590ebf Merge tag 'audit-pr-20200729' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1207e0b890
kernel config: https://syzkaller.appspot.com/x/.config?x=812bbfcb6ae2cd60
dashboard link: https://syzkaller.appspot.com/bug?extid=1a54a94bd32716796edd
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11b227f890

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1a54a94bd32716796...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in list_empty include/linux/list.h:282 [inline]
BUG: KASAN: use-after-free in delete_and_unsubscribe_port+0x8b/0x450 sound/core/seq/seq_ports.c:530
Read of size 8 at addr 888098523060 by task syz-executor.0/7202

CPU: 1 PID: 7202 Comm: syz-executor.0 Not tainted 5.8.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1f0/0x31e lib/dump_stack.c:118
print_address_description+0x66/0x5a0 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
list_empty include/linux/list.h:282 [inline]

It looks like a "subs->ref_count" problem.
delete_and_unsubscribe_port+0x8b/0x450 sound/core/seq/seq_ports.c:530 snd_seq_port_disconnect+0x568/0x610 sound/core/seq/seq_ports.c:612 snd_seq_ioctl_unsubscribe_port+0x349/0x6c0 sound/core/seq/seq_clientmgr.c:1525 snd_seq_oss_midi_close+0x397/0x620 sound/core/seq/oss/seq_oss_midi.c:405 snd_seq_oss_synth_reset+0x335/0x8b0 sound/core/seq/oss/seq_oss_synth.c:406 snd_seq_oss_reset+0x5b/0x250 sound/core/seq/oss/seq_oss_init.c:435 snd_seq_oss_ioctl+0x5c2/0x1090 sound/core/seq/oss/seq_oss_ioctl.c:93 odev_ioctl+0x51/0x70 sound/core/seq/oss/seq_oss.c:174 vfs_ioctl fs/ioctl.c:48 [inline] ksys_ioctl fs/ioctl.c:753 [inline] __do_sys_ioctl fs/ioctl.c:762 [inline] __se_sys_ioctl+0xf9/0x160 fs/ioctl.c:760 do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c429 Code: 8d b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7f6e48930c78 EFLAGS: 0246 ORIG_RAX: 0010 RAX: ffda RBX: 000154c0 RCX: 0045c429 RDX: RSI: 5100 RDI: 0003 RBP: 0078bf38 R08: R09: R10: R11: 0246 R12: 0078bf0c R13: 7ffe51b9d10f R14: 7f6e489319c0 R15: 0078bf0c Allocated by task 7202: save_stack mm/kasan/common.c:48 [inline] set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0x103/0x140 mm/kasan/common.c:494 kmem_cache_alloc_trace+0x234/0x300 mm/slab.c:3551 kmalloc include/linux/slab.h:555 [inline] kzalloc include/linux/slab.h:669 [inline] snd_seq_port_connect+0x66/0x460 sound/core/seq/seq_ports.c:553 snd_seq_ioctl_subscribe_port+0x349/0x6c0 sound/core/seq/seq_clientmgr.c:1484 snd_seq_oss_midi_open+0x4db/0x830 sound/core/seq/oss/seq_oss_midi.c:364 snd_seq_oss_synth_setup_midi+0x108/0x510 sound/core/seq/oss/seq_oss_synth.c:269 snd_seq_oss_open+0x899/0xe90 sound/core/seq/oss/seq_oss_init.c:261 odev_open+0x5e/0x90 sound/core/seq/oss/seq_oss.c:125 chrdev_open+0x498/0x580 fs/char_dev.c:414 
do_dentry_open+0x813/0x1070 fs/open.c:828 do_open fs/namei.c:3243 [inline] path_openat+0x278d/0x37f0 fs/namei.c:3360 do_filp_open+0x191/0x3a0 fs/namei.c:3387 do_sys_openat2+0x463/0x770 fs/open.c:1179 do_sys_open fs/open.c:1195 [inline] __do_sys_openat fs/open.c:1209 [inline] __se_sys_openat fs/open.c:1204 [inline] __x64_sys_openat+0x1c8/0x1f0 fs/open.c:1204 do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 7203: save_stack mm/kasan/common.c:48 [inline] set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x114/0x170 mm/kasan/common.c:455 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x220 mm/slab.c:3757 snd_seq_port_disconnect+0x570/0x610 sound/core/seq/seq_ports.c:614 snd_seq_ioctl_unsubscribe_port+0x349/0x6c0 sound/core/seq/seq_clientmgr.c:1525
Re: INFO: rcu detected stall in tc_modify_qdisc
From: linux-kernel-ow...@vger.kernel.org on behalf of syzbot
Sent: July 29, 2020 13:53
To: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; net...@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com
Subject: INFO: rcu detected stall in tc_modify_qdisc

Hello,

syzbot found the following issue on:

HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15b2d79090

The issue was bisected to:

commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22
Author: Vinicius Costa Gomes
Date: Sat Sep 29 00:59:43 2018 +

    tc: Add support for configuring the taprio scheduler

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac90
console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9f78d5c664a8c33f4...@syzkaller.appspotmail.com
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (1 GPs behind) idle=6f6/1/0x4000 softirq=10195/10196 fqs=1 (t=27930 jiffies g=9233 q=413)
rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump: rcu_preempt R running task2911210 2 0x4000 Call Trace: context_switch kernel/sched/core.c:3458 [inline] __schedule+0x8ea/0x2210 kernel/sched/core.c:4219 schedule+0xd0/0x2a0 kernel/sched/core.c:4294 schedule_timeout+0x148/0x250 kernel/time/timer.c:1908 rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline] rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 NMI backtrace for cpu 1 CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320 print_cpu_stall kernel/rcu/tree_stall.h:553 [inline] check_cpu_stall kernel/rcu/tree_stall.h:627 [inline] rcu_pending kernel/rcu/tree.c:3489 [inline] rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504 update_process_times+0x25/0x60 kernel/time/timer.c:1737 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176 tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191 Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78 RSP: 0018:c900016672c0 EFLAGS: 0282 RAX: 11369c11 RBX: 0282 RCX: 0002 RDX: dc00 RSI: RDI: 0282 RBP: 888093a052e8 R08: R09: R10: 0001 R11: R12: 0282 R13: 0078100c35c3 R14: 888093a05000 R15: spin_unlock_irqrestore include/linux/spinlock.h:408 [inline] taprio_change+0x1fdc/0x2960 net/sched/sch_taprio.c:1557 It looks
Re: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien
From: David Rientjes
Sent: July 29, 2020 3:46
To: Zhang, Qiang
Cc: c...@linux.com; penb...@kernel.org; iamjoonsoo@lge.com; a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien

On Tue, 28 Jul 2020, qiang.zh...@windriver.com wrote:

> From: Zhang Qiang
>
> We should add the node spinlock to protect "n->alien", which may be
> assigned NULL in the cpuup_canceled() function and cause an invalid
> address access.
>
>Hi, do you have an example NULL pointer dereference where you have hit this?

>This rather looks like something to fix up in cpuup_canceled() since it's currently manipulating the alien cache for the canceled cpu's node.

Yes, it should be fixed for cpuup_canceled(): it currently manipulates the alien cache for the canceled CPU's node, which may be the same node being operated on in the __cache_free_alien() function.

void cpuup_canceled()
{
	n = get_node(cachep, node);
	spin_lock_irq(&n->list_lock);
	...
	n->alien = NULL;
	spin_unlock_irq(&n->list_lock);
}

> Fixes: 18bf854117c6 ("slab: use get_node() and kmem_cache_node() functions")
> Signed-off-by: Zhang Qiang
> ---
> mm/slab.c | 7 +--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index a89633603b2d..290523c90b4e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -759,8 +759,10 @@ static int __cache_free_alien(struct kmem_cache *cachep, void *objp,
>
> 	n = get_node(cachep, node);
> 	STATS_INC_NODEFREES(cachep);
> +	spin_lock(&n->list_lock);
> 	if (n->alien && n->alien[page_node]) {
> 		alien = n->alien[page_node];
> +		spin_unlock(&n->list_lock);
> 		ac = &alien->ac;
> 		spin_lock(&ac->lock);
> 		if (unlikely(ac->avail == ac->limit)) {
> @@ -769,14 +771,15 @@ static int __cache_free_alien(struct kmem_cache *cachep, void *objp,
> 		}
> 		ac->entry[ac->avail++] = objp;
> 		spin_unlock(&ac->lock);
> -		slabs_destroy(cachep, &list);
> 	} else {
> +		spin_unlock(&n->list_lock);
> 		n = get_node(cachep, page_node);
> 		spin_lock(&n->list_lock);
> 		free_block(cachep, &objp, 1, page_node, &list);
> 		spin_unlock(&n->list_lock);
> -		slabs_destroy(cachep, &list);
> 	}
> +
> +	slabs_destroy(cachep, &list);
> 	return 1;
> }
>
> --
> 2.26.2
Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
From: Zhang, Qiang
Sent: July 15, 2020 13:27
To: Tuong Tong Lien; Eric Dumazet; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; Xue, Ying
Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code

From: Tuong Tong Lien
Sent: July 15, 2020 11:53
To: Zhang, Qiang; Eric Dumazet; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; Xue, Ying
Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: RE: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code

> -----Original Message-----
> From: Zhang, Qiang
> Sent: Wednesday, July 15, 2020 9:13 AM
> To: Eric Dumazet ; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; Tuong Tong Lien ; Xue, Ying
> Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
>
> From: Eric Dumazet
> Sent: July 14, 2020 22:15
> To: Zhang, Qiang; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; tuong.t.l...@dektech.com.au; eric.duma...@gmail.com; Xue, Ying
> Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
>
> On 7/14/20 1:05 AM, qiang.zh...@windriver.com wrote:
> > From: Zhang Qiang
> >
> > CPU: 0 PID: 6801 Comm: syz-executor201 Not tainted 5.8.0-rc4-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> >
> > Fixes: fc1b6d6de2208 ("tipc: introduce TIPC encryption & authentication")
> > Reported-by: syzbot+263f8c0d007dc09b2...@syzkaller.appspotmail.com
> > Signed-off-by: Zhang Qiang
> > ---
> > v1->v2:
> > add fixes tags.
> >
> > net/tipc/crypto.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
> > index 8c47ded2edb6..520af0afe1b3 100644
> > --- a/net/tipc/crypto.c
> > +++ b/net/tipc/crypto.c
> > @@ -399,9 +399,10 @@ static void tipc_aead_users_set(struct tipc_aead __rcu *aead, int val)
> >  */
> > static struct crypto_aead *tipc_aead_tfm_next(struct tipc_aead *aead)
> > {
> > -	struct tipc_tfm **tfm_entry = this_cpu_ptr(aead->tfm_entry);
> > +	struct tipc_tfm **tfm_entry = get_cpu_ptr(aead->tfm_entry);
> >
> > 	*tfm_entry = list_next_entry(*tfm_entry, list);
> > +	put_cpu_ptr(tfm_entry);
> > 	return (*tfm_entry)->tfm;
> > }
>
> > You have not explained why this was safe.
> >
> > This seems to hide a real bug.
> >
> > Presumably callers of this function should have disabled preemption, and maybe interrupts as well.
> >
> > Right after put_cpu_ptr(tfm_entry), this thread could migrate to another cpu, and still access data owned by the old cpu.
>
> Thanks for your suggestion, I will check the code again.

>Actually, last week I sent a similar patch to tipc-discussion which covers the case as well (there is also another place causing the same issue...). If you don't mind, you can take a look at below (just copied/pasted).
>BR/Tuong

>Hi Tuong Tong Lien
>The tipc_aead_free is an RCU callback; this function is called in softirq context, where preemption is disabled, so we should not add preempt_disable/enable.
>thanks
>Zhang Qiang

Sorry, there are some mistakes in my reply: the tipc_aead_free() function may also be called in thread context if CONFIG_RCU_BOOST is enabled.

>-----Original Message-----
>From: Tuong Tong Lien
>Sent: Friday, July 10, 2020 5:11 PM
>To: jma...@redhat.com; ma...@donjonn.com; ying@windriver.com; tipc-discuss...@lists.sourceforge.net
>Cc: tipc-dek
>Subject: [PATCH RFC 1/5] tipc: fix using smp_processor_id() in preemptible

The 'this_cpu_ptr()' is used to obtain the AEAD key's TFM on the current CPU for encryption; however, the execution can be preemptible since it's actually user-space context, so 'using smp_processor_id() in preemptible' has been observed. We fix the issue by using the 'get/put_cpu_ptr()' API, which includes a 'preempt_disable()' instead.

Signed-off-by: Tuong Lien
---
net/tipc/crypto.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
index c8c47fc72653..
Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
From: Tuong Tong Lien
Sent: July 15, 2020 11:53
To: Zhang, Qiang; Eric Dumazet; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; Xue, Ying
Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: RE: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code

> -----Original Message-----
> From: Zhang, Qiang
> Sent: Wednesday, July 15, 2020 9:13 AM
> To: Eric Dumazet; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; Tuong Tong Lien; Xue, Ying
> Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
>
> > From: Eric Dumazet
> > Sent: July 14, 2020 22:15
> > To: Zhang, Qiang; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; tuong.t.l...@dektech.com.au; eric.duma...@gmail.com; Xue, Ying
> > Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
> >
> > > On 7/14/20 1:05 AM, qiang.zh...@windriver.com wrote:
> > > > From: Zhang Qiang
> > > >
> > > > CPU: 0 PID: 6801 Comm: syz-executor201 Not tainted 5.8.0-rc4-syzkaller #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > >
> > > > Fixes: fc1b6d6de2208 ("tipc: introduce TIPC encryption & authentication")
> > > > Reported-by: syzbot+263f8c0d007dc09b2...@syzkaller.appspotmail.com
> > > > Signed-off-by: Zhang Qiang
> > > > ---
> > > > v1->v2:
> > > > add fixes tags.
> > > > net/tipc/crypto.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
> > > > index 8c47ded2edb6..520af0afe1b3 100644
> > > > --- a/net/tipc/crypto.c
> > > > +++ b/net/tipc/crypto.c
> > > > @@ -399,9 +399,10 @@ static void tipc_aead_users_set(struct tipc_aead __rcu *aead, int val)
> > > >  */
> > > > static struct crypto_aead *tipc_aead_tfm_next(struct tipc_aead *aead)
> > > > {
> > > > -	struct tipc_tfm **tfm_entry = this_cpu_ptr(aead->tfm_entry);
> > > > +	struct tipc_tfm **tfm_entry = get_cpu_ptr(aead->tfm_entry);
> > > >
> > > >  	*tfm_entry = list_next_entry(*tfm_entry, list);
> > > > +	put_cpu_ptr(tfm_entry);
> > > >  	return (*tfm_entry)->tfm;
> > > > }
> > >
> > > You have not explained why this was safe.
> > >
> > > This seems to hide a real bug.
> > >
> > > Presumably callers of this function should have disabled preemption, and maybe interrupts as well.
> > >
> > > Right after put_cpu_ptr(tfm_entry), this thread could migrate to another cpu, and still access data owned by the old cpu.
> >
> > Thanks for your suggestion, I will check the code again.
>
> Actually, last week I sent a similar patch to tipc-discussion which covers the case as well (there is also another place causing the same issue...). If you don't mind, you can take a look at below (just copied/pasted).
> BR/Tuong

Hi Tuong Tong Lien,
tipc_aead_free() is an RCU callback, so this function is called in softirq context, where preemption is already disabled; it should not add preempt_disable()/preempt_enable().

thanks
Zhang Qiang

> -----Original Message-----
> From: Tuong Tong Lien
> Sent: Friday, July 10, 2020 5:11 PM
> To: jma...@redhat.com; ma...@donjonn.com; ying@windriver.com; tipc-discuss...@lists.sourceforge.net
> Cc: tipc-dek
> Subject: [PATCH RFC 1/5] tipc: fix using smp_processor_id() in preemptible
>
> 'this_cpu_ptr()' is used to obtain the AEAD key's TFM on the current CPU for encryption, however the execution can be preemptible since it's actually user-space context, so the 'using smp_processor_id() in preemptible' warning has been observed. We fix the issue by using the 'get/put_cpu_ptr()' API which consists of a 'preempt_disable()' instead.
>
> Signed-off-by: Tuong Lien
> ---
> net/tipc/crypto.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
> index c8c47fc72653..1827ce4fac5d 100644
> --- a/net/tipc/crypto.c
> +++ b/net/tipc/crypto.c
> @@ -326,7 +326,8 @@ static void tipc_aead_free(struct rcu_head *rp)
>  	if (aead->cloned) {
>  		tipc_aead_put(aead->cloned);
>  	} else {
> -		head = *this_cpu_ptr(aead->tfm_entry);
> +		head = *get_cpu_ptr(aead->tfm_entry);
> +		put_cpu_ptr(aead->tfm_entry);
>  		list_for_each_entry_safe(tfm_entry, tmp, &head->list, list) {
>  			crypto_free_aead(tfm_entry->tfm);
Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code
From: Eric Dumazet
Sent: July 14, 2020 22:15
To: Zhang, Qiang; jma...@redhat.com; da...@davemloft.net; k...@kernel.org; tuong.t.l...@dektech.com.au; eric.duma...@gmail.com; Xue, Ying
Cc: net...@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] tipc: Don't using smp_processor_id() in preemptible code

> On 7/14/20 1:05 AM, qiang.zh...@windriver.com wrote:
> > From: Zhang Qiang
> >
> > CPU: 0 PID: 6801 Comm: syz-executor201 Not tainted 5.8.0-rc4-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> >
> > Fixes: fc1b6d6de2208 ("tipc: introduce TIPC encryption & authentication")
> > Reported-by: syzbot+263f8c0d007dc09b2...@syzkaller.appspotmail.com
> > Signed-off-by: Zhang Qiang
> > ---
> > v1->v2:
> > add fixes tags.
> >
> > net/tipc/crypto.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
> > index 8c47ded2edb6..520af0afe1b3 100644
> > --- a/net/tipc/crypto.c
> > +++ b/net/tipc/crypto.c
> > @@ -399,9 +399,10 @@ static void tipc_aead_users_set(struct tipc_aead __rcu *aead, int val)
> >  */
> > static struct crypto_aead *tipc_aead_tfm_next(struct tipc_aead *aead)
> > {
> > -	struct tipc_tfm **tfm_entry = this_cpu_ptr(aead->tfm_entry);
> > +	struct tipc_tfm **tfm_entry = get_cpu_ptr(aead->tfm_entry);
> >
> >  	*tfm_entry = list_next_entry(*tfm_entry, list);
> > +	put_cpu_ptr(tfm_entry);
> >  	return (*tfm_entry)->tfm;
> > }
>
> You have not explained why this was safe.
>
> This seems to hide a real bug.
>
> Presumably callers of this function should have disabled preemption, and maybe interrupts as well.
>
> Right after put_cpu_ptr(tfm_entry), this thread could migrate to another cpu, and still access data owned by the old cpu.

Thanks for your suggestion, I will check the code again.
Re: WARNING in submit_audio_out_urb/usb_submit_urb
From: linux-kernel-ow...@vger.kernel.org on behalf of syzbot
Sent: July 9, 2020 21:34
To: andreyk...@google.com; gre...@linuxfoundation.org; ingras...@epigenesys.com; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; syzkaller-b...@googlegroups.com
Subject: WARNING in submit_audio_out_urb/usb_submit_urb

Hello,

syzbot found the following crash on:

HEAD commit:    768a0741 usb: dwc2: gadget: Remove assigned but never used..
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
console output: https://syzkaller.appspot.com/x/log.txt?x=1568d11f10
kernel config:  https://syzkaller.appspot.com/x/.config?x=999be4eb2478ffa5
dashboard link: https://syzkaller.appspot.com/bug?extid=c190f6858a04ea7fbc52
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=123aa2fb10

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c190f6858a04ea7fb...@syzkaller.appspotmail.com

usb 1-1: send failed (error -32)
snd_usb_toneport 1-1:0.0: Line 6 TonePort GX now attached
------------[ cut here ]------------
usb 1-1: BOGUS urb xfer, pipe 0 != type 3
WARNING: CPU: 0 PID: 12 at drivers/usb/core/urb.c:478 usb_submit_urb+0xa17/0x13e0 drivers/usb/core/urb.c:478
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.8.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events line6_startup_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xf6/0x16e lib/dump_stack.c:118
 panic+0x2aa/0x6e1 kernel/panic.c:231
 __warn.cold+0x20/0x50 kernel/panic.c:600
 report_bug+0x1bd/0x210 lib/bug.c:198
 handle_bug+0x41/0x80 arch/x86/kernel/traps.c:235
 exc_invalid_op+0x13/0x40 arch/x86/kernel/traps.c:255
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:563
RIP: 0010:usb_submit_urb+0xa17/0x13e0 drivers/usb/core/urb.c:478
Code: 84 e7 04 00 00 e8 a9 10 ca fd 4c 89 ef e8 41 79 12 ff 41 89 d8 44 89 e1 4c 89 f2 48 89 c6 48 c7 c7 80 a0 5d 86 e8 db 77 9e fd <0f> 0b e8 82 10 ca fd 0f b6 6c 24 08 48 c7 c6 e0 a1 5d 86 48 89 ef
RSP: 0018:8881da227b10 EFLAGS: 00010086
RAX: RBX: 0003 RCX:
RDX: 8881da211900 RSI: 8129b4e3 RDI: ed103b444f54
RBP: 0030 R08: 0001 R09: 8881db21fe8b
R10: R11: 0004 R12:
R13: 8881d6ecd0a0 R14: 8881d3d8c690 R15: 8881d54c4000
 submit_audio_out_urb+0x6d6/0x1a00 sound/usb/line6/playback.c:271
 line6_submit_audio_out_all_urbs+0xc9/0x120 sound/usb/line6/playback.c:291
 line6_stream_start+0x187/0x230 sound/usb/line6/pcm.c:195
 line6_pcm_acquire+0x137/0x210 sound/usb/line6/pcm.c:318
 line6_startup_work+0x42/0x50 sound/usb/line6/driver.c:734
 process_one_work+0x94c/0x15f0 kernel/workqueue.c:2269
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
 kthread+0x392/0x470 kernel/kthread.c:291
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

This looks like Alan Stern's reply to the email titled "KASAN: use-after-free Read in line6_submit_audio_in_all_urbs"; it also looks like a problem with asynchronous operations. Can we replace cancel_delayed_work() with cancel_delayed_work_sync() in the line6_disconnect() func?

Zhang Qiang

Kernel Offset: disabled
Rebooting in 86400 seconds..

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. syzbot can test patches for this bug, for details see: https://goo.gl/tpsmEJ#testing-patches
Re: Re: [kthread] a90477f0c9: WARNING:at_kernel/kthread.c:#kthread_queue_work
Thanks for your reply, Petr. If the patch is added, as you said, there is more work to do, so the patch needs to be removed from the -mm tree.

Best Regards,
Zhang Qiang

From: Petr Mladek
Sent: July 7, 2020 17:47
To: Zhang, Qiang
Cc: Andrew Morton; linux-kernel@vger.kernel.org
Subject: Re: Re: [kthread] a90477f0c9: WARNING:at_kernel/kthread.c:#kthread_queue_work

On Mon 2020-07-06 10:17:31, Zhang, Qiang wrote:
> Hi, Petr Mladek
> There are some questions about the "Work could not be queued when worker being destroyed" patch.
>
> In the "spi_init_queue" func:
> "kthread_init_worker(&ctlr->kworker);  (worker->task = NULL)
>  ctlr->kworker_task = kthread_run(kthread_worker_fn, &ctlr->kworker, "%s", dev_name(&ctlr->dev));"

I see. I have missed that there are some kthread_worker users that start the worker this way. They rely on the fact that worker->task is set also by kthread_worker_fn.

The proper solution is to start the worker using either kthread_create_worker() or kthread_create_worker_on_cpu(). They set worker->task immediately.

It means that more work is needed:

1. Convert all users that start the kthread_worker via kthread_worker_fn to use either kthread_create_worker() or kthread_create_worker_on_cpu().

2. Remove the kthread_worker_fn declaration from include/linux/kthread.h to prevent starting the worker the temporary way. In the same patch, also the assignment to worker->task and the FIXME might get removed from kthread_worker_fn().

3. Finally, it should be safe to add the WARN_ON() into queuing_blocked().

Best Regards,
Petr

> In the "spi_start_queue" func:
> "kthread_queue_work(&ctlr->kworker, &ctlr->pump_messages);"
>
> Because kthread_worker_fn has not begun running, if work is queued to the worker, "!worker->task" is true, which triggers the WARN.
>
> Do we need to add the check "test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags) && WARN_ON(!worker->task)" in the queuing_blocked func?
> Zhang Qiang
>
> From: kernel test robot
> Sent: July 6, 2020 17:38
> To: Zhang, Qiang
> Cc: l...@lists.01.org
> Subject: [kthread] a90477f0c9: WARNING:at_kernel/kthread.c:#kthread_queue_work
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: a90477f0c956621eb0dd69f0abfb6066ad8fbef7 ("kthread: work could not be queued when worker being destroyed")
> https://github.com/hnaz/linux-mm master
>
> in testcase: trinity
> with following parameters:
>
> 	runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
> +-------------------------------------------------+------------+------------+
> |                                                 | 85c6127e06 | a90477f0c9 |
> +-------------------------------------------------+------------+------------+
> | boot_successes                                  | 6          | 0          |
> | boot_failures                                   | 0          | 30         |
> | WARNING:at_kernel/kthread.c:#kthread_queue_work | 0          | 30         |
> | EIP:kthread_queue_work                          | 0          | 30         |
> | BUG:kernel_hang_in_test_stage                   | 0          | 2          |
> +-------------------------------------------------+------------+------------+
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot
>
> [5.554282] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:817 kthread_queue_work+0xf8/0x120
> [5.556204] Modules linked in:
> [5.556204] CPU: 0 PID: 1 Comm: swapper Tainted: G S 5.8.0-rc3-00014-ga90477f0c9566 #1
> [5.556204] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [5.556204] EIP: kthread_queue_work+0xf8/0x120
> [5.556204] Code: 00 59 e9 67 ff ff ff 8d 76 00 8b 4e 10 85 c9 75 a6 8d 4b 28 89 f2 89 d8 bf 01 00 00 00 e8 f0 f5 ff ff eb 93 8d b6 00 00 00 00 <0f> 0b 6a 00 31 c9 ba 01 00 00 00 b8 08 6c 64 c3 e8 f3 04 0b 00 5b
> [5.556204] EAX: EBX: eeff538c ECX: EDX: 0001
> [5.556204] ESI: eeff53d0 EDI: EBP: f5edde70 ESP: f5edde5c
> [5.556204] DS: 007b ES: 007b FS: GS: SS: 0068 EFLAGS: 00010046
> [5.556204] CR0: 80050033 CR2: b7eda844 CR3: 038ae000 CR4: 000406d0
> [5.556204] Ca
Re: [PATCH v4] kthread: Work could not be queued when worker being destroyed
Sorry, Petr Mladek, this is my mistake; please ignore this change.

From: linux-kernel-ow...@vger.kernel.org on behalf of Petr Mladek
Sent: July 7, 2020 17:06
To: Zhang, Qiang
Cc: ben.do...@codethink.co.uk; bfie...@redhat.com; c...@rock-chips.com; pet...@infradead.org; t...@kernel.org; a...@linux-foundation.org; naresh.kamb...@linaro.org; mm-comm...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] kthread: Work could not be queued when worker being destroyed

On Mon 2020-07-06 13:46:47, qiang.zh...@windriver.com wrote:
> From: Zhang Qiang
>
> Before the work is put into the queue of the worker thread, the state of the worker thread needs to be detected, because the worker thread may be in the destruction state at this time.
>
> Signed-off-by: Zhang Qiang
> Suggested-by: Petr Mladek
> Reviewed-by: Petr Mladek

This patch is completely different from the one that I suggested or acked. Please, never keep acks when doing major rework and people did not agree with it.

For this patch:

Nacked-by: Petr Mladek

See below why.

> ---
> v1->v2:
> Add warning information for condition "!worker->task".
> v2->v3:
> Modify submission information and add "Reviewed-by" tags.
> v3->v4:
> Fix the warning triggered during spi controller registration:
> when a spi controller registers, a "kthread_worker_fn" worker is created through "kthread_run" instead of "kthread_create_worker", which would initialize "worker->task"; then the "spi_start_queue" func queues a work to the worker queue. At this time, if the worker has not begun running, "!worker->task" is true, so a warning is triggered.
> kernel/kthread.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index bfbfa481be3a..825bd4dcdb95 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -791,6 +791,11 @@ static inline bool queuing_blocked(struct kthread_worker *worker,
> {
>  	lockdep_assert_held(&worker->lock);
>
> +	if (kthread_should_stop()) {

This does not make much sense. kthread_should_stop() checks a flag set for the "current" process. It works only when called from inside the kthread worker thread.

queuing_blocked() is called from kthread_queue_work() or kthread_queue_delayed_work(). These are typically called from another process. The only exception is when they get re-queued from inside the work.

Best Regards,
Petr

> +		WARN_ON(1);
> +		return true;
> +	}
> +
>  	return !list_empty(&work->node) || work->canceling;
> }
>
> --
> 2.24.1
Re: [PATCH v3] usb: gadget: function: fix missing spinlock in f_uac1_legacy
Hi Greg KH,

In the early submission, commit c6994e6f067cf0fc4c6cca3d164018b1150916f8, which added the USB Audio Gadget driver, the "audio->play_queue" was protected by the "audio->lock" spinlock in the "playback_work" func, but in the "f_audio_out_ep_complete" func there is no protection for the operations on this "audio->play_queue"; the spinlock is missing there. Should the Fixes tag point at that commit?

________
From: Greg KH
Sent: July 7, 2020 3:55
To: Zhang, Qiang
Cc: ba...@kernel.org; colin.k...@canonical.com; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; sta...@vger.kernel.org
Subject: Re: [PATCH v3] usb: gadget: function: fix missing spinlock in f_uac1_legacy

On Sun, Jul 05, 2020 at 08:40:27PM +0800, qiang.zh...@windriver.com wrote:
> From: Zhang Qiang
>
> Add a missing spinlock protection for play_queue, because
> the play_queue may be destroyed when the "playback_work"
> work func and "f_audio_out_ep_complete" callback func
> operate this paly_queue at the same time.

"play_queue", right?

>
> Cc: stable
> Signed-off-by: Zhang Qiang

Because you do not have a Fixes: tag in here, how far back do you want the stable patch to go to? That's why, if you can, it's always good to have a "Fixes:" tag in there to show what commit caused the problem you are fixing here.

So, what commit caused this?

thanks,

greg k-h
Re: [PATCH v3] kthread: Work could not be queued when worker being destroyed
I'm very sorry that there are some problems with my change. as follows: [1.203300] loop: module loaded [1.204599] megasas: 07.714.04.00-rc1 [1.211124] spi_qup 78b7000.spi: IN:block:16, fifo:64, OUT:block:16, fifo:64 [1.211509] [ cut here ] [1.217238] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:819 kthread_queue_work+0x90/0xa0 [1.221832] Modules linked in: [1.229554] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-next-20200706 #1 [1.232683] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [1.240237] pstate: 4085 (nZcv daIf -PAN -UAO BTYPE=--) [1.246918] pc : kthread_queue_work+0x90/0xa0 [1.252211] lr : kthread_queue_work+0x2c/0xa0 [1.256722] sp : 80001002ba50 [1.261061] x29: 80001002ba50 x28: 3b868000 [1.264363] x27: 3fcf63c0 x26: 3b868680 [1.269744] x25: 3b868400 x24: 3d116810 [1.275039] x23: 800012025304 x22: 3b8683bc [1.280335] x21: x20: 3b8683f8 [1.285630] x19: 3b8683b8 x18: [1.290925] x17: x16: 800011167420 [1.296220] x15: 0eb90480 x14: 0267 [1.301515] x13: 0004 x12: [1.306810] x11: x10: 0003 [1.312105] x9 : 3fcbac10 x8 : 3fcba240 [1.317400] x7 : 3bc3c800 x6 : 0003 [1.322696] x5 : x4 : [1.327991] x3 : 3b8683bc x2 : 0001 [1.333285] x1 : x0 : [1.338583] Call trace: [1.343875] kthread_queue_work+0x90/0xa0 [1.346050] spi_start_queue+0x50/0x78 [1.350213] spi_register_controller+0x458/0x820 [1.353860] devm_spi_register_controller+0x44/0xa0 [1.358638] spi_qup_probe+0x5d8/0x638 [1.363235] platform_drv_probe+0x54/0xa8 [1.367053] really_probe+0xd8/0x320 [1.371133] driver_probe_device+0x58/0xb8 [1.374779] device_driver_attach+0x74/0x80 [1.378685] __driver_attach+0x58/0xe0 [1.382766] bus_for_each_dev+0x70/0xc0 [1.386583] driver_attach+0x24/0x30 [1.390317] bus_add_driver+0x14c/0x1f0 [1.394137] driver_register+0x64/0x120 [1.397696] __platform_driver_register+0x48/0x58 [1.401519] spi_qup_driver_init+0x1c/0x28 [1.406378] do_one_initcall+0x54/0x1a0 [1.410372] kernel_init_freeable+0x1d4/0x254 [1.414106] kernel_init+0x14/0x110 [1.418616] ret_from_fork+0x10/0x34 
[1.421918] ---[ end trace 4b59f327623c9e10 ]---
[1.426526] spi_qup 78b9000.spi: IN:block:16, fifo:64, OUT:block:16, fifo:64
[1.430721] ------------[ cut here ]------------
[1.437374] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:819

When in the "spi_init_queue" func:
"kthread_init_worker(&ctlr->kworker);  (worker->task = NULL)
 ctlr->kworker_task = kthread_run(kthread_worker_fn, &ctlr->kworker, "%s", dev_name(&ctlr->dev));"

In the "spi_start_queue" func:
"kthread_queue_work(&ctlr->kworker, &ctlr->pump_messages);"

Because kthread_worker_fn has not begun running, if work is queued to the worker, "!worker->task" is true, which triggers the WARN.

________
From: Tejun Heo on behalf of Tejun Heo
Sent: July 6, 2020 22:59
To: Zhang, Qiang
Cc: ben.do...@codethink.co.uk; bfie...@redhat.com; c...@rock-chips.com; pet...@infradead.org; pmla...@suse.com; a...@linux-foundation.org; mm-comm...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] kthread: Work could not be queued when worker being destroyed

On Sun, Jul 05, 2020 at 09:30:18AM +0800, qiang.zh...@windriver.com wrote:
> From: Zhang Qiang
>
> Before the work is put into the queue of the worker thread, the state of the worker thread needs to be detected, because the worker thread may be in the destruction state at this time.
>
> Signed-off-by: Zhang Qiang
> Suggested-by: Petr Mladek
> Reviewed-by: Petr Mladek

Andrew already brought this up but can you please provide some context on why you're making this change?

Thanks.

--
tejun
Re: [PATCH v2] usb: gadget: function: fix missing spinlock in f_uac1_legacy
Thanks for your suggestion, Greg KH. I think there is no need for a Fixes tag. I will resend.

thanks,
Zhang Qiang

From: Greg KH
Sent: July 6, 2020 18:31
To: Zhang, Qiang
Cc: ba...@kernel.org; colin.k...@canonical.com; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] usb: gadget: function: fix missing spinlock in f_uac1_legacy

On Sun, Jul 05, 2020 at 02:16:16PM +0800, qiang.zh...@windriver.com wrote:
> From: Zhang Qiang
>
> Add a missing spinlock protection to the add operation of the "audio->play_queue" in "f_audio_out_ep_complete" function.

That says _what_ you did, but not _why_ you did that.

Why is a lock needed here? What does this protect?

What kernel commit does this "fix"? Put that in the "Fixes:" line, and probably you need a "cc: stable" in that area too, right?

thanks,

greg k-h
Re: [kthread] a90477f0c9: WARNING:at_kernel/kthread.c:#kthread_queue_work
From: Zhang, Qiang
Sent: July 6, 2020 18:17
To: Petr Mladek
Cc: Andrew Morton; linux-kernel@vger.kernel.org
Subject: Re: [kthread] a90477f0c9: WARNING:at_kernel/kthread.c:#kthread_queue_work

Hi, Petr Mladek

There are some questions about the "Work could not be queued when worker being destroyed" patch.

In the "spi_init_queue" func:
"kthread_init_worker(&ctlr->kworker);  (worker->task = NULL)
 ctlr->kworker_task = kthread_run(kthread_worker_fn, &ctlr->kworker, "%s", dev_name(&ctlr->dev));"

In the "spi_start_queue" func:
"kthread_queue_work(&ctlr->kworker, &ctlr->pump_messages);"

Because kthread_worker_fn has not begun running, if work is queued to the worker, "!worker->task" is true, which triggers the WARN.

Do we need to replace "WARN_ON(!worker->task)" with "test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags)" in the queuing_blocked func, or replace "kthread_run(kthread_worker_fn, ...)" with "kthread_create_worker" in the spi_init_queue func (because in kthread_create_worker, "worker->task" will be assigned a value)?

Zhang Qiang

From: kernel test robot
Sent: July 6, 2020 17:38
To: Zhang, Qiang
Cc: l...@lists.01.org
Subject: [kthread] a90477f0c9: WARNING:at_kernel/kthread.c:#kthread_queue_work

Greeting,

FYI, we noticed the following commit (built with gcc-7):

commit: a90477f0c956621eb0dd69f0abfb6066ad8fbef7 ("kthread: work could not be queued when worker being destroyed")
https://github.com/hnaz/linux-mm master

in testcase: trinity
with following parameters:

	runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/

on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

+-------------------------------------------------+------------+------------+
|                                                 | 85c6127e06 | a90477f0c9 |
+-------------------------------------------------+------------+------------+
| boot_successes                                  | 6          | 0          |
| boot_failures                                   | 0          | 30         |
| WARNING:at_kernel/kthread.c:#kthread_queue_work | 0          | 30         |
| EIP:kthread_queue_work                          | 0          | 30         |
| BUG:kernel_hang_in_test_stage                   | 0          | 2          |
+-------------------------------------------------+------------+------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

[5.554282] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:817 kthread_queue_work+0xf8/0x120
[5.556204] Modules linked in:
[5.556204] CPU: 0 PID: 1 Comm: swapper Tainted: G S 5.8.0-rc3-00014-ga90477f0c9566 #1
[5.556204] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[5.556204] EIP: kthread_queue_work+0xf8/0x120
[5.556204] Code: 00 59 e9 67 ff ff ff 8d 76 00 8b 4e 10 85 c9 75 a6 8d 4b 28 89 f2 89 d8 bf 01 00 00 00 e8 f0 f5 ff ff eb 93 8d b6 00 00 00 00 <0f> 0b 6a 00 31 c9 ba 01 00 00 00 b8 08 6c 64 c3 e8 f3 04 0b 00 5b
[5.556204] EAX: EBX: eeff538c ECX: EDX: 0001
[5.556204] ESI: eeff53d0 EDI: EBP: f5edde70 ESP: f5edde5c
[5.556204] DS: 007b ES: 007b FS: GS: SS: 0068 EFLAGS: 00010046
[5.556204] CR0: 80050033 CR2: b7eda844 CR3: 038ae000 CR4: 000406d0
[5.556204] Call Trace:
[5.556204]  spi_start_queue+0x50/0x70
[5.556204]  spi_register_controller+0x642/0xa80
[5.556204]  ? kobject_get+0x54/0xf0
[5.556204]  ? parport_pc_platform_probe+0x10/0x10
[5.556204]  spi_bitbang_start+0x2f/0x70
[5.556204]  ? parport_pc_platform_probe+0x10/0x10
[5.556204]  butterfly_attach+0x164/0x2c0
[5.556204]  ? driver_detach+0x30/0x30
[5.556204]  port_check+0x1c/0x30
[5.556204]  bus_for_each_dev+0x5a/0x90
[5.556204]  __parport_register_driver+0x76/0xa0
[5.556204]  ? driver_detach+0x30/0x30
[5.556204]  ? spi_engine_driver_init+0x16/0x16
[5.556204]  butterfly_init+0x19/0x1b
[5.556204]  do_one_initcall+0x79/0x310
[5.556204]  ? parse_args+0x70/0x420
[5.556204]  ? rcu_read_lock_sched_held+0x2f/0x50
[5.556204]  ? trace_initcall_level+0x95/0xc7
[5.556204]  ? kernel_init_freeable+0x129/0x19f
[5.556204]  kernel_init_freeable+0x148/0x19f
[5.556204]  ? rest_init+0x100/0x100
[5.556204]  kernel_init+0xd/0xf0
[5.556204]  ret_from_fork+0x1c/0x28
[5.556204] irq event stamp: 8620410
[5.556204] hardirqs last enabled at (8620409): [] _raw_spin_unlock_irqrestore+0x2a/0x50
[5.556204] hardirqs last disabled at (8620410): [] _raw_spin_lock_irqsave+0
Re: [PATCH] usb: gadget: function: fix missing spinlock in f_uac1_legacy
Sorry, I will add a changelog and resend.

Zhang Qiang

From: Greg KH
Sent: July 6, 2020 15:40
To: Zhang, Qiang
Cc: ba...@kernel.org; colin.k...@canonical.com; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] usb: gadget: function: fix missing spinlock in f_uac1_legacy

On Sun, Jul 05, 2020 at 09:59:41AM +0800, qiang.zh...@windriver.com wrote:
> From: Zhang Qiang
>
> Signed-off-by: Zhang Qiang

No changelog text? I can't take patches like that, sorry.

greg k-h
Re: [PATCH] usb: gadget: function: fix missing spinlock in f_uac1_legacy
From: linux-kernel-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com
Sent: July 5, 2020 9:59
To: ba...@kernel.org
Cc: gre...@linuxfoundation.org; colin.k...@canonical.com; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] usb: gadget: function: fix missing spinlock in f_uac1_legacy

From: Zhang Qiang

Signed-off-by: Zhang Qiang
---
 drivers/usb/gadget/function/f_uac1_legacy.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/gadget/function/f_uac1_legacy.c b/drivers/usb/gadget/function/f_uac1_legacy.c
index 349deae7cabd..e2d7f69128a0 100644
--- a/drivers/usb/gadget/function/f_uac1_legacy.c
+++ b/drivers/usb/gadget/function/f_uac1_legacy.c
@@ -336,7 +336,9 @@ static int f_audio_out_ep_complete(struct usb_ep *ep, struct usb_request *req)
 	/* Copy buffer is full, add it to the play_queue */
 	if (audio_buf_size - copy_buf->actual < req->actual) {
+		spin_lock_irq(&audio->lock);
 		list_add_tail(&copy_buf->list, &audio->play_queue);
+		spin_unlock_irq(&audio->lock);
 		schedule_work(&audio->playback_work);
 		copy_buf = f_audio_buffer_alloc(audio_buf_size);
 		if (IS_ERR(copy_buf))
--
2.24.1
Re: [PATCH] kthread: Don't cancel a work that is being cancelled
Thank you for your advice. Adding kthread_cancel_work() without the "_sync" would be dangerous, but I think it is unnecessary to cancel a work that is already being cancelled.

From: linux-kernel-ow...@vger.kernel.org on behalf of Petr Mladek
Sent: July 3, 2020 15:28
To: Zhang, Qiang
Cc: ben.do...@codethink.co.uk; bfie...@redhat.com; c...@rock-chips.com; pet...@infradead.org; t...@kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] kthread: Don't cancel a work that is being cancelled

On Thu 2020-07-02 12:43:24, qiang.zh...@windriver.com wrote:
> From: Zhang Qiang
>
> When canceling a work, if it is found that the work is in
> the cancelling state, we should directly exit the cancelled
> operation.

No, the function guarantees that the work is no longer running when it returns. This is why it has the suffix "_sync" in the name.

We would need to add kthread_cancel_work() without the "_sync" wrappers that would not wait for the work in progress. But it might be dangerous. The API users usually want to make sure that the work is no longer running to avoid races.

What is the use case for the non-sync behavior, please?

Best Regards,
Petr
Re: [PATCH] usb: gadget: function: printer: The device interface is reset and should return error code
Hi Felipe,

Please review this patch and make suggestions.

Thanks
Zqiang

From: linux-usb-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com
Sent: June 28, 2020 9:57
To: felipe.ba...@linux.intel.com
Cc: gre...@linuxfoundation.org; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] usb: gadget: function: printer: The device interface is reset and should return error code

From: Zqiang

After the device is disconnected from the host side, the interface of the device is reset. If the userspace operates the device again, an error code should be returned.

Signed-off-by: Zqiang
---
 drivers/usb/gadget/function/f_printer.c | 36 ++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/usb/gadget/function/f_printer.c b/drivers/usb/gadget/function/f_printer.c
index 9c7ed2539ff7..2b45a61e4213 100644
--- a/drivers/usb/gadget/function/f_printer.c
+++ b/drivers/usb/gadget/function/f_printer.c
@@ -338,6 +338,11 @@ printer_open(struct inode *inode, struct file *fd)
 	spin_lock_irqsave(&dev->lock, flags);

+	if (dev->interface < 0) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		return -ENODEV;
+	}
+
 	if (!dev->printer_cdev_open) {
 		dev->printer_cdev_open = 1;
 		fd->private_data = dev;
@@ -430,6 +435,12 @@ printer_read(struct file *fd, char __user *buf, size_t len, loff_t *ptr)
 	mutex_lock(&dev->lock_printer_io);
 	spin_lock_irqsave(&dev->lock, flags);

+	if (dev->interface < 0) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		mutex_unlock(&dev->lock_printer_io);
+		return -ENODEV;
+	}
+
 	/* We will use this flag later to check if a printer reset happened
 	 * after we turn interrupts back on.
 	 */
@@ -561,6 +572,12 @@ printer_write(struct file *fd, const char __user *buf, size_t len, loff_t *ptr)
 	mutex_lock(&dev->lock_printer_io);
 	spin_lock_irqsave(&dev->lock, flags);

+	if (dev->interface < 0) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		mutex_unlock(&dev->lock_printer_io);
+		return -ENODEV;
+	}
+
 	/* Check if a printer reset happens while we have interrupts on */
 	dev->reset_printer = 0;
@@ -667,6 +684,13 @@ printer_fsync(struct file *fd, loff_t start, loff_t end, int datasync)
 	inode_lock(inode);
 	spin_lock_irqsave(&dev->lock, flags);
+
+	if (dev->interface < 0) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		inode_unlock(inode);
+		return -ENODEV;
+	}
+
 	tx_list_empty = (likely(list_empty(&dev->tx_reqs)));
 	spin_unlock_irqrestore(&dev->lock, flags);
@@ -689,6 +713,13 @@ printer_poll(struct file *fd, poll_table *wait)
 	mutex_lock(&dev->lock_printer_io);
 	spin_lock_irqsave(&dev->lock, flags);
+
+	if (dev->interface < 0) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		mutex_unlock(&dev->lock_printer_io);
+		return EPOLLERR | EPOLLHUP;
+	}
+
 	setup_rx_reqs(dev);
 	spin_unlock_irqrestore(&dev->lock, flags);
 	mutex_unlock(&dev->lock_printer_io);
@@ -722,6 +753,11 @@ printer_ioctl(struct file *fd, unsigned int code, unsigned long arg)
 	spin_lock_irqsave(&dev->lock, flags);

+	if (dev->interface < 0) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		return -ENODEV;
+	}
+
 	switch (code) {
 	case GADGET_GET_PRINTER_STATUS:
 		status = (int)dev->printer_status;
--
2.24.1
Re: [PATCH] usb: gadget: function: printer: Add gadget dev interface status judgment
From: linux-usb-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com
Sent: June 15, 2020 17:46
To: ba...@kernel.org
Cc: gre...@linuxfoundation.org; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] usb: gadget: function: printer: Add gadget dev interface status judgment

From: Zqiang

After the interface of the gadget printer device was disabled, we should not continue to operate the device.

Signed-off-by: Zqiang
---
 drivers/usb/gadget/function/f_printer.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
--
2.24.1
Re: [PATCH] usb: gadget: function: printer: Add gadget dev interface status judgment
Hello Greg KH,

Could you please review the patch? Thanks,
Zqiang

From: linux-usb-ow...@vger.kernel.org on behalf of qiang.zh...@windriver.com
Sent: June 15, 2020 17:46
To: ba...@kernel.org
Cc: gre...@linuxfoundation.org; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] usb: gadget: function: printer: Add gadget dev interface status judgment

From: Zqiang

After the interface of the gadget printer device was disabled, we should not continue to operate the device.

Signed-off-by: Zqiang
---
 drivers/usb/gadget/function/f_printer.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
--
2.24.1
Re: [PATCH] usb: gadget: function: printer: fix use-after-free in __lock_acquire
I cannot find a reference count for this structure (printer_dev). The scenario is: while the character device is still open, if you operate the device through configfs and execute commands like unlink, the resources allocated when the device was bound (printer_dev) are released. After that, if you perform an ioctl operation again, a use-after-free occurs. Adding a kref makes it explicit that this resource (printer_dev) is still in use until the character device is closed. Similar problems can occur in f_hid.c. As for the other gadget drivers, I haven't looked at their implementation, so I'm not sure.

Thanks,
Zqiang

On 6/18/20 4:30 PM, Greg KH wrote:
On Fri, Jun 05, 2020 at 11:56:52AM +0800, qiang.zh...@windriver.com wrote:

From: Zqiang

Fix this by increasing the object reference count.

BUG: KASAN: use-after-free in __lock_acquire+0x3fd4/0x4180 kernel/locking/lockdep.c:3831
Read of size 8 at addr 8880683b0018 by task syz-executor.0/3377

CPU: 1 PID: 3377 Comm: syz-executor.0 Not tainted 5.6.11 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:641
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
 __lock_acquire+0x3fd4/0x4180 kernel/locking/lockdep.c:3831
 lock_acquire+0x127/0x350 kernel/locking/lockdep.c:4488
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
 printer_ioctl+0x4a/0x110 drivers/usb/gadget/function/f_printer.c:723
 vfs_ioctl fs/ioctl.c:47 [inline]
 ksys_ioctl+0xfb/0x130 fs/ioctl.c:763
 __do_sys_ioctl fs/ioctl.c:772 [inline]
 __se_sys_ioctl fs/ioctl.c:770 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:770
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4531a9
Code: ed 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7fd14ad72c78 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 0073bfa8 RCX: 004531a9
RDX: fff9 RSI: 009e RDI: 0003
RBP: 0003 R08: R09:
R10: R11: 0246 R12: 004bbd61
R13: 004d0a98 R14: 7fd14ad736d4 R15:

Allocated by task 2393:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc.constprop.3+0xa7/0xd0 mm/kasan/common.c:515
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
 kmem_cache_alloc_trace+0xfa/0x2d0 mm/slub.c:2813
 kmalloc include/linux/slab.h:555 [inline]
 kzalloc include/linux/slab.h:669 [inline]
 gprinter_alloc+0xa1/0x870 drivers/usb/gadget/function/f_printer.c:1416
 usb_get_function+0x58/0xc0 drivers/usb/gadget/functions.c:61
 config_usb_cfg_link+0x1ed/0x3e0 drivers/usb/gadget/configfs.c:444
 configfs_symlink+0x527/0x11d0 fs/configfs/symlink.c:202
 vfs_symlink+0x33d/0x5b0 fs/namei.c:4201
 do_symlinkat+0x11b/0x1d0 fs/namei.c:4228
 __do_sys_symlinkat fs/namei.c:4242 [inline]
 __se_sys_symlinkat fs/namei.c:4239 [inline]
 __x64_sys_symlinkat+0x73/0xb0 fs/namei.c:4239
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3368:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:337 [inline]
 __kasan_slab_free+0x135/0x190 mm/kasan/common.c:476
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
 slab_free_hook mm/slub.c:1444 [inline]
 slab_free_freelist_hook mm/slub.c:1477 [inline]
 slab_free mm/slub.c:3034 [inline]
 kfree+0xf7/0x410 mm/slub.c:3995
 gprinter_free+0x49/0xd0 drivers/usb/gadget/function/f_printer.c:1353
 usb_put_function+0x38/0x50 drivers/usb/gadget/functions.c:87
 config_usb_cfg_unlink+0x2db/0x3b0 drivers/usb/gadget/configfs.c:485
 configfs_unlink+0x3b9/0x7f0 fs/configfs/symlink.c:250
 vfs_unlink+0x287/0x570 fs/namei.c:4073
 do_unlinkat+0x4f9/0x620 fs/namei.c:4137
 __do_sys_unlink fs/namei.c:4184 [inline]
 __se_sys_unlink fs/namei.c:4182 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4182
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at 8880683b
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
1024-byte region [8880683b, 8880683b0400)
The buggy address belongs to the page:
page:ea0001a0ec00 refcount:1 mapcount:0 mapping:88806c00e300 index:0x8880683b1800 compound_mapcount: 0
flags:
Re: Re: [PATCH v2] usb: gadget: function: printer: fix use-after-free in __lock_acquire
The cdev object reference count and the "struct printer_dev" object reference count (kref) are different things; these two reference counts do not conflict. The file usb-skeleton.c also uses a similar method: "struct usb_skel" contains a kref member.

Thanks,
Zqiang

From: Greg KH
Sent: June 9, 2020 17:48
To: Zhang, Qiang
Cc: ba...@kernel.org; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: Re: [PATCH v2] usb: gadget: function: printer: fix use-after-free in __lock_acquire

A: http://en.wikipedia.org/wiki/Top_post
Q: Were do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Tue, Jun 09, 2020 at 09:35:11AM +0000, Zhang, Qiang wrote:
> Thank you for your suggestion
> two referenced counted objects in the same exact structure. another
> referenced is "dev->printer_cdev_open"?

Maybe, I don't know, but a cdev does have a reference count already, right? Why do you need printer_cdev_open as well?

thanks,

greg k-h
Re: [PATCH v2] usb: gadget: function: printer: fix use-after-free in __lock_acquire
Thank you for your suggestion. There are two reference-counted objects in the same exact structure; is the other reference "dev->printer_cdev_open"?

From: Greg KH
Sent: June 8, 2020 15:33
To: Zhang, Qiang
Cc: ba...@kernel.org; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] usb: gadget: function: printer: fix use-after-free in __lock_acquire

On Mon, Jun 08, 2020 at 03:16:22PM +0800, qiang.zh...@windriver.com wrote:
> From: Zqiang
>
> Increase the reference count of the printer dev through kref to avoid
> being released by other tasks when in use.
>
> BUG: KASAN: use-after-free in __lock_acquire+0x3fd4/0x4180
> kernel/locking/lockdep.c:3831
> Read of size 8 at addr 8880683b0018 by task syz-executor.0/3377
> [...]
Re: [PATCH] usb: gadget: function: printer: Fix use-after-free in __lock_acquire()
Hi Markus,

I don't think I need to add a Fixes tag; please review the code.

From: Markus Elfring
Sent: June 5, 2020 16:57
To: Zhang, Qiang; linux-...@vger.kernel.org
Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org; Alan Stern; Felipe Balbi; Greg Kroah-Hartman; Kyungtae Kim
Subject: Re: [PATCH] usb: gadget: function: printer: Fix use-after-free in __lock_acquire()

> Fix this by increase object reference count.

I find this description incomplete according to the proposed changes.

Would you like to add the tag "Fixes" to the commit message?

Regards,
Markus
Re: [PATCH v5] workqueue: Remove unnecessary kfree() call in rcu_free_wq()
Thanks for your guidance. I will try to improve the weak wording.

From: Zhang, Qiang
Sent: May 28, 2020 9:41
To: Markus Elfring; Tejun Heo; Lai Jiangshan
Cc: linux-kernel@vger.kernel.org; kernel-janit...@vger.kernel.org
Subject: Re: [PATCH v5] workqueue: Remove unnecessary kfree() call in rcu_free_wq()

Thanks for your guidance. I have tried to improve the weak wording.

From: linux-kernel-ow...@vger.kernel.org on behalf of Markus Elfring
Sent: May 27, 2020 16:20
To: Zhang, Qiang; Tejun Heo; Lai Jiangshan
Cc: linux-kernel@vger.kernel.org; kernel-janit...@vger.kernel.org
Subject: Re: [PATCH v5] workqueue: Remove unnecessary kfree() call in rcu_free_wq()

> Thus delete this function call which became unnecessary with the referenced
> software update.
…
> Suggested-by: Markus Elfring

Would the tag "Co-developed-by" be more appropriate according to the patch review to achieve a more pleasing commit message?

> v1->v2->v3->v4->v5:
> Modify weakly submitted information.

Now I wonder about your wording choice "weakly".

Regards,
Markus
Re: [PATCH v3] workqueue: Fix double kfree for rescuer
Thank you for your reply. There is something wrong with my description; is it feasible to describe it as follows?

The rescuer is already freed in destroy_workqueue() and "wq->rescuer = NULL" was executed, but in rcu_free_wq() it is freed again (equivalent to kfree(NULL)). This is unnecessary, so the call should be removed.

On 5/26/20 4:56 PM, Lai Jiangshan wrote:
> On Mon, May 25, 2020 at 5:22 PM wrote:
>>
>> From: Zhang Qiang
>>
>> The callback function "rcu_free_wq" could be called after memory
>> was released for "rescuer" already, Thus delete a misplaced call
>> of the function "kfree".
>
> Hello
>
> wq->rescuer is guaranteed to be NULL in rcu_free_wq()
> since def98c84b6cd
> ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")
>
> And the rescuer is already freed in destroy_workqueue()
> since 8efe1223d73c
> ("workqueue: Fix missing kfree(rescuer) in destroy_workqueue()")
>
> The patch is a cleanup to remove a "kfree(NULL);".
> But the changelog is misleading.
>
>>
>> Fixes: 6ba94429c8e7 ("workqueue: Reorder sysfs code")
>
> It is totally unrelated.
>
>> Signed-off-by: Zhang Qiang
>> ---
>> v1->v2->v3:
>> Only commit information modification.
>> kernel/workqueue.c | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 891ccad5f271..a2451cdcd503 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3491,7 +3491,6 @@ static void rcu_free_wq(struct rcu_head *rcu)
>>         else
>>                 free_workqueue_attrs(wq->unbound_attrs);
>>
>> -       kfree(wq->rescuer);
>>         kfree(wq);
>>  }
>>
>> --
>> 2.24.1