Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
On Sun, Mar 30, 2014 at 08:41:18PM +0800, Fengguang Wu wrote:
> This fix is not yet in linux-next, can anyone help merge it? Thanks!

sorry for the late reply, I was out last week,
I'll resend this with a proper changelog

jirka
Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
This fix is not yet in linux-next, can anyone help merge it? Thanks!

On Tue, Mar 11, 2014 at 08:14:56PM +0800, Fengguang Wu wrote:
> Jiri,
>
> It works, thank you!
>
> Tested-by: Fengguang Wu
>
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > [...]
Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
Jiri,

It works, thank you!

Tested-by: Fengguang Wu

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> [...]
Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
Hi Jiri,

On Mon, Mar 10, 2014 at 01:53:19PM +0100, Jiri Olsa wrote:
> On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> >
> > Hi all,
> >
> > This is a very old WARNING, too old to be bisectable. The below 3 different
> > back traces show that it's always triggered by trinity at system reboot
> > time.
> > Any ideas to quiet it? Thank you!
>
> hi,
> is there cpu hotplug involved? like writing to:
> /sys/devices/system/cpu/cpu*/online

Yeah, we do run random CPU hotplug tests in the background.

Thanks,
Fengguang
Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
On Mon, Mar 10, 2014 at 11:40:23PM +0100, Jiri Olsa wrote:
> On Mon, Mar 10, 2014 at 01:53:19PM +0100, Jiri Olsa wrote:
> > On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> > >
> > > Hi all,
> > >
> > > This is a very old WARNING, too old to be bisectable. The below 3
> > > different
> > > back traces show that it's always triggered by trinity at system reboot
> > > time.
> > > Any ideas to quiet it? Thank you!
> >
> > hi,
> > is there cpu hotplug involved? like writing to:
> > /sys/devices/system/cpu/cpu*/online
> >
>
> I think there's race with hotplug code,
> I can reproduce this with:
>
> $ ./perf record -e faults ./perf bench sched pipe
>
> and put one of the cpus offline:
>
> [root@krava cpu]# pwd
> /sys/devices/system/cpu
> [root@krava cpu]# echo 0 > cpu1/online

the perf cpu offline callback takes down all cpu context
events and releases swhash->swevent_hlist

this could race with task context software events being just
scheduled in on this cpu via perf_swevent_add

(note only cpu ctx events are terminated in the hotplug code)

the race happens in the gap between the cpu notifier code
and the cpu being actually taken down (and becoming un-sched-able)

I wonder what we should do:

- terminate task ctx events on hotplug-ed cpu (same as for cpu ctx)
  this seems too much..

- schedule out task ctx events on hotplug-ed cpu
  we might race again with other events being scheduled in (during the race gap)
  (if this could be prevented, this would be the best option i think)

- don't release that 'struct swevent_hlist' at all..
  it's about 2KB size per cpu

- remove the warning ;-) or make it omit the hotplug-ed cpu case,
  so we don't lose a potential bug warning, please check attached patch

thoughts?
jirka

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 661951a..a53857e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5423,6 +5423,8 @@ struct swevent_htable {

 	/* Recursion avoidance in each contexts */
 	int recursion[PERF_NR_CONTEXTS];
+
+	bool offline;
 };

 static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
@@ -5669,8 +5671,10 @@ static int perf_swevent_add(struct perf_event *event, int flags)
 	hwc->state = !(flags & PERF_EF_START);

 	head = find_swevent_head(swhash, event);
-	if (WARN_ON_ONCE(!head))
+	if (!head) {
+		WARN_ON_ONCE(!swhash->offline);
 		return -EINVAL;
+	}

 	hlist_add_head_rcu(&event->hlist_entry, head);

@@ -7850,6 +7854,7 @@ static void perf_event_init_cpu(int cpu)
 	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);

 	mutex_lock(&swhash->hlist_mutex);
+	swhash->offline = false;
 	if (swhash->hlist_refcount > 0) {
 		struct swevent_hlist *hlist;

@@ -7907,6 +7912,7 @@ static void perf_event_exit_cpu(int cpu)
 	perf_event_exit_cpu_context(cpu);

 	mutex_lock(&swhash->hlist_mutex);
+	swhash->offline = true;
 	swevent_hlist_release(swhash);
 	mutex_unlock(&swhash->hlist_mutex);
 }
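A userspace sketch of the check that the patch above introduces may help when reading the hunks. It is not kernel code: `model_swhash`, `model_cpu_online`, `model_cpu_offline` and `model_swevent_add` are hypothetical names invented for illustration, the hash list is reduced to a single pointer, locking and RCU are left out, and the warning is counted on every hit rather than only once as WARN_ON_ONCE would do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the per-CPU swevent_htable. */
struct model_swhash {
	void *hlist;	/* NULL once the hash list has been released */
	bool offline;	/* flag added by the patch */
	int warnings;	/* how many times the model "would have warned" */
};

/* Rough model of perf_event_init_cpu(): the CPU comes (back) online.
 * The caller supplies the hlist; the real code allocates it only when
 * hlist_refcount > 0, which this sketch does not model. */
static void model_cpu_online(struct model_swhash *sw, void *hlist)
{
	assert(sw != NULL);
	sw->offline = false;
	sw->hlist = hlist;
}

/* Rough model of perf_event_exit_cpu(): mark offline, then drop hlist. */
static void model_cpu_offline(struct model_swhash *sw)
{
	assert(sw != NULL);
	sw->offline = true;
	sw->hlist = NULL;
}

/* Rough model of the patched check in perf_swevent_add(): a missing
 * hlist always fails with -EINVAL, but only counts as a warning when
 * the CPU is not marked offline. */
static int model_swevent_add(struct model_swhash *sw)
{
	assert(sw != NULL);
	if (sw->hlist == NULL) {
		if (!sw->offline) {
			sw->warnings++;
			fprintf(stderr, "WARNING: no hlist on an online CPU\n");
		}
		return -EINVAL;
	}
	return 0;
}
```

In this model, an add that lands on an offlined CPU returns -EINVAL without counting a warning, while a missing hlist on a CPU still marked online counts one. That is the distinction the patch aims to keep.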
Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
On Mon, Mar 10, 2014 at 01:53:19PM +0100, Jiri Olsa wrote:
> On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> >
> > Hi all,
> >
> > This is a very old WARNING, too old to be bisectable. The below 3 different
> > back traces show that it's always triggered by trinity at system reboot
> > time.
> > Any ideas to quiet it? Thank you!
>
> hi,
> is there cpu hotplug involved? like writing to:
> /sys/devices/system/cpu/cpu*/online
>

I think there's race with hotplug code,
I can reproduce this with:

$ ./perf record -e faults ./perf bench sched pipe

and put one of the cpus offline:

[root@krava cpu]# pwd
/sys/devices/system/cpu
[root@krava cpu]# echo 0 > cpu1/online

working on fix ;-)

thanks,
jirka

---
[ 133.726229] [ cut here ]
[ 133.726236] WARNING: CPU: 1 PID: 1194 at kernel/events/core.c:5640 perf_swevent_add+0x112/0x120()
[ 133.726237] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc ccm nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack snd_hda_codec_hdmi snd_hda_codec_conexant uvcvideo x86_pkg_temp_thermal coretemp snd_hda_intel kvm_intel arc4 iwldvm videobuf2_vmalloc kvm snd_hda_codec videobuf2_memops snd_hwdep videobuf2_core mac80211 videodev sdhci_pci iTCO_wdt iTCO_vendor_support sdhci lpc_ich microcode iwlwifi media snd_seq snd_seq_device snd_pcm i2c_i801 serio_raw cfg80211 mfd_core mmc_core btusb mei_me shpchp snd_page_alloc bluetooth snd_timer nfsd e1000e mei ptp auth_rpcgss pps_core thinkpad_acpi nfs_acl lockd snd soundcore rfkill sunrpc wmi binfmt_misc dm_crypt i915 crct10dif_pclmul crc32_pclmul
[ 133.726266] i2c_algo_bit drm_kms_helper crc32c_intel drm i2c_core ghash_clmulni_intel video
[ 133.726270] CPU: 1 PID: 1194 Comm: sched-pipe Not tainted 3.13.5-103.fc19.x86_64 #1
[ 133.726271] Hardware name: LENOVO 4291EJ3/4291EJ3, BIOS 8DET56WW (1.26 ) 12/01/2011
[ 133.726272] 0009 8802091c5b18 81680604
[ 133.726274] 8802091c5b50 8106d35d 880209084c00 8800d015ec00
[ 133.726276] 88021e257748 88021e25774c 227a1324 8802091c5b60
[ 133.726277] Call Trace:
[ 133.726282] [] dump_stack+0x45/0x56
[ 133.726285] [] warn_slowpath_common+0x7d/0xa0
[ 133.726287] [] warn_slowpath_null+0x1a/0x20
[ 133.726289] [] perf_swevent_add+0x112/0x120
[ 133.726291] [] event_sched_in.isra.78+0x90/0x1d0
[ 133.726293] [] group_sched_in+0x6a/0x1e0
[ 133.726296] [] ? native_sched_clock+0x13/0x80
[ 133.726297] [] ? sched_clock+0x9/0x10
[ 133.726299] [] ctx_sched_in+0x10e/0x1d0
[ 133.726300] [] perf_event_sched_in+0x60/0x90
[ 133.726302] [] perf_event_context_sched_in+0x78/0xc0
[ 133.726303] [] __perf_event_task_sched_in+0x182/0x1a0
[ 133.726306] [] finish_task_switch+0xa8/0xf0
[ 133.726308] [] __schedule+0x2e2/0x740
[ 133.726310] [] schedule+0x29/0x70
[ 133.726313] [] pipe_wait+0x61/0xa0
[ 133.726315] [] ? abort_exclusive_wait+0xb0/0xb0
[ 133.726316] [] pipe_read+0x2fd/0x4f0
[ 133.726319] [] do_sync_read+0x5a/0x90
[ 133.726321] [] vfs_read+0x95/0x160
[ 133.726322] [] SyS_read+0x49/0xa0
[ 133.726325] [] system_call_fastpath+0x16/0x1b
[ 133.726326] ---[ end trace 75f3a06e52d51e52 ]---
[ 133.731360] kvm: disabling virtualization on CPU1
[ 133.731367] smpboot: CPU 1 is now offline
Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()
On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
>
> Hi all,
>
> This is a very old WARNING, too old to be bisectable. The below 3 different
> back traces show that it's always triggered by trinity at system reboot time.
> Any ideas to quiet it? Thank you!

hi,
is there cpu hotplug involved? like writing to:
/sys/devices/system/cpu/cpu*/online

thanks,
jirka