Re: Warning in irq_work_queue_on()
On Fri, Sep 04, 2015 at 05:11:54PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 03, 2015 at 09:58:40AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 03, 2015 at 02:03:51AM +0200, Frederic Weisbecker wrote:
> > > On Thu, Sep 03, 2015 at 12:24:27AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > > > > > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
> > > > >
> > > > > It happens in nohz full, but I'm not sure the culprit is nohz full.
> > > > >
> > > > > The problem here is that wake_up_nohz_cpu() selects a CPU that is
> > > > > offline.
> > > >
> > > > wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
> > > > logic live?
> > >
> > > Err, got confused with get_nohz_timer_target(). But yeah,
> > > wake_up_nohz_cpu() is called with a CPU that is chosen by
> > > mod_timer() -> get_nohz_timer_target().
> > >
> > > > > But this shouldn't happen. Either it selects a CPU that is in the
> > > > > domain tree, and I suspect offline CPUs aren't supposed to be there,
> > > > > or it selects the current CPU. And if the CPU is offlined, it
> > > > > shouldn't be running some kthread...
> > > >
> > > > Do not assume things like that. Always check with the active mask.
> > >
> > > Hmm, so perhaps we need something like this (makes me realize that
> > > the is_housekeeping_cpu() call passes the wrong argument; no issue in
> > > practice since nohz full CPUs aren't in the domain tree, but I still
> > > need to fix that as well).
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 0902e4d..2c10a69 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -628,7 +628,7 @@ int get_nohz_timer_target(void)
> > >
> > >          rcu_read_lock();
> > >          for_each_domain(cpu, sd) {
> > > -                for_each_cpu(i, sched_domain_span(sd)) {
> > > +                for_each_cpu_and(i, sched_domain_span(sd), cpu_online_mask) {
> >
> > cpu_active_mask: we clear that when we start killing the CPU. The online
> > mask only gets cleared once the CPU is actually dead.
>
> So, after our discussion on IRC, I checked how domains are rebuilt on
> hotplug operations, and it appears that partition_sched_domains() is
> called on CPU_DOWN_PREPARE only. The CPU shouldn't be in the domain tree
> after that.
>
> (Correct me if I'm wrong, I really am not an expert in the domain handling
> code. As you said that we can't guarantee that a CPU in the domain tree is
> in the cpu_online_mask, I'm likely wrong somewhere.)
>
> This is then followed by synchronize_sched(), which means that after that,
> the new version of the CPU domains (with the offlining CPU excluded) is
> visible everywhere while the CPU is still in cpu_online_mask.
>
> And finally stop_machine runs and the CPU is cleared out of
> cpu_online_mask. So I'm probably missing something; otherwise, how could
> we find a CPU in the domain tree that is not in cpu_online_mask?

OK, I have to ask...

Should I be trying Frederic's patch? At the current failure rate, I will
need to be running it for about a year to give any reasonable conclusion. :-/

                                                        Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
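Paul's concern about test duration follows from simple arithmetic on rare failures. As a rough sketch, assume (this is not stated in the thread) that the warning arrives as a Poisson process with mean time T between failures on an unpatched kernel. If the patch had no effect, the probability of a test run of length t finishing without a warning, and the run length needed before a clean run would be surprising at significance level alpha, are:

```latex
P(\text{no failure in time } t) = e^{-t/T},
\qquad
e^{-t/T} \le \alpha \;\Longleftrightarrow\; t \ge T \ln\frac{1}{\alpha},
\qquad
\alpha = 0.05 \;\Rightarrow\; t \ge T \ln 20 \approx 3.0\,T .
```

Under this assumption the required failure-free run time grows linearly with T, so a failure that shows up only rarely makes any candidate fix slow to validate.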
Re: Warning in irq_work_queue_on()
On Thu, Sep 03, 2015 at 09:58:40AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 03, 2015 at 02:03:51AM +0200, Frederic Weisbecker wrote:
> > On Thu, Sep 03, 2015 at 12:24:27AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > > > > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
> > > >
> > > > It happens in nohz full, but I'm not sure the culprit is nohz full.
> > > >
> > > > The problem here is that wake_up_nohz_cpu() selects a CPU that is
> > > > offline.
> > >
> > > wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
> > > logic live?
> >
> > Err, got confused with get_nohz_timer_target(). But yeah, wake_up_nohz_cpu()
> > is called with a CPU that is chosen by mod_timer() -> get_nohz_timer_target().
> >
> > > > But this shouldn't happen. Either it selects a CPU that is in the
> > > > domain tree, and I suspect offline CPUs aren't supposed to be there, or
> > > > it selects the current CPU. And if the CPU is offlined, it shouldn't be
> > > > running some kthread...
> > >
> > > Do not assume things like that. Always check with the active mask.
> >
> > Hmm, so perhaps we need something like this (makes me realize that
> > the is_housekeeping_cpu() call passes the wrong argument; no issue in
> > practice since nohz full CPUs aren't in the domain tree, but I still need
> > to fix that as well).
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 0902e4d..2c10a69 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -628,7 +628,7 @@ int get_nohz_timer_target(void)
> >
> >          rcu_read_lock();
> >          for_each_domain(cpu, sd) {
> > -                for_each_cpu(i, sched_domain_span(sd)) {
> > +                for_each_cpu_and(i, sched_domain_span(sd), cpu_online_mask) {
>
> cpu_active_mask: we clear that when we start killing the CPU. The online
> mask only gets cleared once the CPU is actually dead.

So, after our discussion on IRC, I checked how domains are rebuilt on hotplug
operations, and it appears that partition_sched_domains() is called on
CPU_DOWN_PREPARE only. The CPU shouldn't be in the domain tree after that.

(Correct me if I'm wrong, I really am not an expert in the domain handling
code. As you said that we can't guarantee that a CPU in the domain tree is in
the cpu_online_mask, I'm likely wrong somewhere.)

This is then followed by synchronize_sched(), which means that after that, the
new version of the CPU domains (with the offlining CPU excluded) is visible
everywhere while the CPU is still in cpu_online_mask.

And finally stop_machine runs and the CPU is cleared out of cpu_online_mask.
So I'm probably missing something; otherwise, how could we find a CPU in the
domain tree that is not in cpu_online_mask?
Re: Warning in irq_work_queue_on()
On Thu, Sep 03, 2015 at 02:03:51AM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 03, 2015 at 12:24:27AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > > > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
> > >
> > > It happens in nohz full, but I'm not sure the culprit is nohz full.
> > >
> > > The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.
> >
> > wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
> > logic live?
>
> Err, got confused with get_nohz_timer_target(). But yeah, wake_up_nohz_cpu() is
> called with a CPU that is chosen by mod_timer() -> get_nohz_timer_target().
>
> > > But this shouldn't happen. Either it selects a CPU that is in the domain
> > > tree, and I suspect offline CPUs aren't supposed to be there, or it
> > > selects the current CPU. And if the CPU is offlined, it shouldn't be
> > > running some kthread...
> >
> > Do not assume things like that. Always check with the active mask.
>
> Hmm, so perhaps we need something like this (makes me realize that
> the is_housekeeping_cpu() call passes the wrong argument; no issue in practice
> since nohz full CPUs aren't in the domain tree, but I still need to fix that
> as well).
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0902e4d..2c10a69 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -628,7 +628,7 @@ int get_nohz_timer_target(void)
>
>          rcu_read_lock();
>          for_each_domain(cpu, sd) {
> -                for_each_cpu(i, sched_domain_span(sd)) {
> +                for_each_cpu_and(i, sched_domain_span(sd), cpu_online_mask) {

cpu_active_mask: we clear that when we start killing the CPU. The online mask
only gets cleared once the CPU is actually dead.

>                          if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
>                                  cpu = i;
>                                  goto unlock;
Re: Warning in irq_work_queue_on()
On Thu, Sep 03, 2015 at 12:24:27AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
> >
> > It happens in nohz full, but I'm not sure the culprit is nohz full.
> >
> > The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.
>
> wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
> logic live?

Err, got confused with get_nohz_timer_target(). But yeah, wake_up_nohz_cpu() is
called with a CPU that is chosen by mod_timer() -> get_nohz_timer_target().

> > But this shouldn't happen. Either it selects a CPU that is in the domain
> > tree, and I suspect offline CPUs aren't supposed to be there, or it selects
> > the current CPU. And if the CPU is offlined, it shouldn't be running some
> > kthread...
>
> Do not assume things like that. Always check with the active mask.

Hmm, so perhaps we need something like this (makes me realize that the
is_housekeeping_cpu() call passes the wrong argument; no issue in practice
since nohz full CPUs aren't in the domain tree, but I still need to fix that
as well).

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0902e4d..2c10a69 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,7 +628,7 @@ int get_nohz_timer_target(void)

         rcu_read_lock();
         for_each_domain(cpu, sd) {
-                for_each_cpu(i, sched_domain_span(sd)) {
+                for_each_cpu_and(i, sched_domain_span(sd), cpu_online_mask) {
                         if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
                                 cpu = i;
                                 goto unlock;
Re: Warning in irq_work_queue_on()
On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
>
> It happens in nohz full, but I'm not sure the culprit is nohz full.
>
> The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.

wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
logic live?

> But this shouldn't happen. Either it selects a CPU that is in the domain tree,
> and I suspect offline CPUs aren't supposed to be there, or it selects the
> current CPU. And if the CPU is offlined, it shouldn't be running some kthread...

Do not assume things like that. Always check with the active mask.
Re: Warning in irq_work_queue_on()
On Wed, Sep 02, 2015 at 03:44:05PM -0400, Tejun Heo wrote:
> (cc'ing peterz)
>
> Ooh, this is from irq_work which doesn't have much to do with
> workqueue. Peter?
>
> On Mon, Aug 24, 2015 at 05:16:11PM -0700, Paul E. McKenney wrote:
> > Hello, Tejun,
> >
> > As discussed last week, I am getting an occasional warning out of
> > irq_work_queue_on() WARN_ON_ONCE(cpu_is_offline(cpu)). The repeat-by
> > seems to be a week or so of rcutorture runs on 16-CPU KVM instances
> > on x86. So please see below on the off-chance that this is of use.
> > I have also attached a .config file.
> >
> > Thoughts?
> >
> >                                                         Thanx, Paul
> >
> > [ 875.702254] [ cut here ]
> > [ 875.703111] WARNING: CPU: 0 PID: 768 at /home/paulmck/public_git/bisect-linux-rcu/kernel/irq_work.c:69 irq_work_queue_on+0xd4/0x110()
> > [ 875.703227] Modules linked in:
> > [ 875.703227] CPU: 0 PID: 768 Comm: rcu_torture_rea Tainted: GW 4.1.0-rc4+ #1
> > [ 875.703227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 875.703227] 81baadd8 88001dc5fce8 81895418 00aa
> > [ 875.703227] 88001dc5fd28 810517d5 00015bc0
> > [ 875.703227] 0004 0004 88001fc8f980 88001fc8d500
> > [ 875.703227] Call Trace:
> > [ 875.703227] [] dump_stack+0x45/0x57
> > [ 875.703227] [] warn_slowpath_common+0x85/0xc0
> > [ 875.703227] [] warn_slowpath_null+0x15/0x20
> > [ 875.703227] [] irq_work_queue_on+0xd4/0x110
> > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50

It happens in nohz full, but I'm not sure the culprit is nohz full.

The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.

But this shouldn't happen. Either it selects a CPU that is in the domain tree,
and I suspect offline CPUs aren't supposed to be there, or it selects the
current CPU. And if the CPU is offlined, it shouldn't be running some kthread...

> > [ 875.703227] [] wake_up_nohz_cpu+0xb4/0x100
> > [ 875.703227] [] internal_add_timer+0x86/0xa0
> > [ 875.703227] [] mod_timer+0xf1/0x1e0
> > [ 875.703227] [] rcu_torture_reader+0x2a4/0x2e0
> > [ 875.703227] [] ? rcu_torture_reader+0x2e0/0x2e0
> > [ 875.703227] [] ? rcutorture_trace_dump.part.10+0x20/0x20
> > [ 875.703227] [] kthread+0xcd/0xf0
> > [ 875.703227] [] ? kthread_create_on_node+0x180/0x180
> > [ 875.703227] [] ret_from_fork+0x42/0x70
> > [ 875.703227] [] ? kthread_create_on_node+0x180/0x180
> > [ 875.703227] ---[ end trace 74175128740d0113 ]---
Re: Warning in irq_work_queue_on()
(cc'ing peterz)

Ooh, this is from irq_work which doesn't have much to do with
workqueue. Peter?

On Mon, Aug 24, 2015 at 05:16:11PM -0700, Paul E. McKenney wrote:
> Hello, Tejun,
>
> As discussed last week, I am getting an occasional warning out of
> irq_work_queue_on() WARN_ON_ONCE(cpu_is_offline(cpu)). The repeat-by
> seems to be a week or so of rcutorture runs on 16-CPU KVM instances
> on x86. So please see below on the off-chance that this is of use.
> I have also attached a .config file.
>
> Thoughts?
>
>                                                         Thanx, Paul
>
> [ 875.702254] [ cut here ]
> [ 875.703111] WARNING: CPU: 0 PID: 768 at /home/paulmck/public_git/bisect-linux-rcu/kernel/irq_work.c:69 irq_work_queue_on+0xd4/0x110()
> [ 875.703227] Modules linked in:
> [ 875.703227] CPU: 0 PID: 768 Comm: rcu_torture_rea Tainted: GW 4.1.0-rc4+ #1
> [ 875.703227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [ 875.703227] 81baadd8 88001dc5fce8 81895418 00aa
> [ 875.703227] 88001dc5fd28 810517d5 00015bc0
> [ 875.703227] 0004 0004 88001fc8f980 88001fc8d500
> [ 875.703227] Call Trace:
> [ 875.703227] [] dump_stack+0x45/0x57
> [ 875.703227] [] warn_slowpath_common+0x85/0xc0
> [ 875.703227] [] warn_slowpath_null+0x15/0x20
> [ 875.703227] [] irq_work_queue_on+0xd4/0x110
> [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
> [ 875.703227] [] wake_up_nohz_cpu+0xb4/0x100
> [ 875.703227] [] internal_add_timer+0x86/0xa0
> [ 875.703227] [] mod_timer+0xf1/0x1e0
> [ 875.703227] [] rcu_torture_reader+0x2a4/0x2e0
> [ 875.703227] [] ? rcu_torture_reader+0x2e0/0x2e0
> [ 875.703227] [] ? rcutorture_trace_dump.part.10+0x20/0x20
> [ 875.703227] [] kthread+0xcd/0xf0
> [ 875.703227] [] ? kthread_create_on_node+0x180/0x180
> [ 875.703227] [] ret_from_fork+0x42/0x70
> [ 875.703227] [] ? kthread_create_on_node+0x180/0x180
> [ 875.703227] ---[ end trace 74175128740d0113 ]---
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 4.1.0-rc4 Kernel Configuration
> #
> CONFIG_64BIT=y
> CONFIG_X86_64=y
> CONFIG_X86=y
> CONFIG_INSTRUCTION_DECODER=y
> CONFIG_PERF_EVENTS_INTEL_UNCORE=y
> CONFIG_OUTPUT_FORMAT="elf64-x86-64"
> CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_LATENCYTOP_SUPPORT=y
> CONFIG_MMU=y
> CONFIG_NEED_DMA_MAP_STATE=y
> CONFIG_NEED_SG_DMA_LENGTH=y
> CONFIG_GENERIC_ISA_DMA=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> CONFIG_GENERIC_CALIBRATE_DELAY=y
> CONFIG_ARCH_HAS_CPU_RELAX=y
> CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
> CONFIG_HAVE_SETUP_PER_CPU_AREA=y
> CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
> CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
> CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> CONFIG_ARCH_SUSPEND_POSSIBLE=y
> CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
> CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
> CONFIG_ZONE_DMA32=y
> CONFIG_AUDIT_ARCH=y
> CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> CONFIG_HAVE_INTEL_TXT=y
> CONFIG_X86_64_SMP=y
> CONFIG_X86_HT=y
> CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> CONFIG_FIX_EARLYCON_MEM=y
> CONFIG_PGTABLE_LEVELS=4
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> CONFIG_IRQ_WORK=y
> CONFIG_BUILDTIME_EXTABLE_SORT=y
>
> #
> # General setup
> #
> CONFIG_INIT_ENV_ARG_LIMIT=32
> CONFIG_CROSS_COMPILE=""
> # CONFIG_COMPILE_TEST is not set
> CONFIG_LOCALVERSION=""
> # CONFIG_LOCALVERSION_AUTO is not set
> CONFIG_HAVE_KERNEL_GZIP=y
> CONFIG_HAVE_KERNEL_BZIP2=y
> CONFIG_HAVE_KERNEL_LZMA=y
> CONFIG_HAVE_KERNEL_XZ=y
> CONFIG_HAVE_KERNEL_LZO=y
> CONFIG_HAVE_KERNEL_LZ4=y
> CONFIG_KERNEL_GZIP=y
> # CONFIG_KERNEL_BZIP2 is not set
> # CONFIG_KERNEL_LZMA is not set
> # CONFIG_KERNEL_XZ is not set
> # CONFIG_KERNEL_LZO is not set
> # CONFIG_KERNEL_LZ4 is not set
> CONFIG_DEFAULT_HOSTNAME="(none)"
> CONFIG_SWAP=y
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_POSIX_MQUEUE=y
> CONFIG_POSIX_MQUEUE_SYSCTL=y
> CONFIG_CROSS_MEMORY_ATTACH=y
> CONFIG_FHANDLE=y
> CONFIG_USELIB=y
> CONFIG_AUDIT=y
> CONFIG_HAVE_ARCH_AUDITSYSCALL=y
> CONFIG_AUDITSYSCALL=y
> CONFIG_AUDIT_WATCH=y
> CONFIG_AUDIT_TREE=y
>
> #
> # IRQ subsystem
> #
> CONFIG_GENERIC_IRQ_PROBE=y
> CONFIG_GENERIC_IRQ_SHOW=y
> CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
> CONFIG_GENERIC_PENDING_IRQ=y
> CONFIG_IRQ_DOMAIN=y
> CONFIG_GENERIC_MSI_IRQ=y
> # CONFIG_IRQ_DOMAIN_DEBUG is not set
> CONFIG_IRQ_FORCED_THREADING=y
> CONFIG_SPARSE_IRQ=y
> CONFIG_C