Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-04-06 Thread Jiri Olsa
On Sun, Mar 30, 2014 at 08:41:18PM +0800, Fengguang Wu wrote:
> This fix is not yet in linux-next, can anyone help merge it? Thanks!

sorry for the late reply, I was out last week,
I'll resend this with a proper changelog

jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-03-30 Thread Fengguang Wu
This fix is not yet in linux-next, can anyone help merge it? Thanks!

On Tue, Mar 11, 2014 at 08:14:56PM +0800, Fengguang Wu wrote:
> Jiri,
> 
> It works, thank you!
> 
> Tested-by: Fengguang Wu 
> 
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 661951a..a53857e 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5423,6 +5423,8 @@ struct swevent_htable {
> >  
> >  	/* Recursion avoidance in each contexts */
> >  	int recursion[PERF_NR_CONTEXTS];
> > +
> > +	bool offline;
> >  };
> >  
> >  static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
> > @@ -5669,8 +5671,10 @@ static int perf_swevent_add(struct perf_event *event, int flags)
> >  	hwc->state = !(flags & PERF_EF_START);
> >  
> >  	head = find_swevent_head(swhash, event);
> > -	if (WARN_ON_ONCE(!head))
> > +	if (!head) {
> > +		WARN_ON_ONCE(!swhash->offline);
> >  		return -EINVAL;
> > +	}
> >  
> >  	hlist_add_head_rcu(&event->hlist_entry, head);
> >  
> > @@ -7850,6 +7854,7 @@ static void perf_event_init_cpu(int cpu)
> >  	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
> >  
> >  	mutex_lock(&swhash->hlist_mutex);
> > +	swhash->offline = false;
> >  	if (swhash->hlist_refcount > 0) {
> >  		struct swevent_hlist *hlist;
> >  
> > @@ -7907,6 +7912,7 @@ static void perf_event_exit_cpu(int cpu)
> >  	perf_event_exit_cpu_context(cpu);
> >  
> >  	mutex_lock(&swhash->hlist_mutex);
> > +	swhash->offline = true;
> >  	swevent_hlist_release(swhash);
> >  	mutex_unlock(&swhash->hlist_mutex);
> >  }


Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-03-11 Thread Fengguang Wu
Jiri,

It works, thank you!

Tested-by: Fengguang Wu 

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 661951a..a53857e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5423,6 +5423,8 @@ struct swevent_htable {
>  
>  	/* Recursion avoidance in each contexts */
>  	int recursion[PERF_NR_CONTEXTS];
> +
> +	bool offline;
>  };
>  
>  static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
> @@ -5669,8 +5671,10 @@ static int perf_swevent_add(struct perf_event *event, int flags)
>  	hwc->state = !(flags & PERF_EF_START);
>  
>  	head = find_swevent_head(swhash, event);
> -	if (WARN_ON_ONCE(!head))
> +	if (!head) {
> +		WARN_ON_ONCE(!swhash->offline);
>  		return -EINVAL;
> +	}
>  
>  	hlist_add_head_rcu(&event->hlist_entry, head);
>  
> @@ -7850,6 +7854,7 @@ static void perf_event_init_cpu(int cpu)
>  	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
>  
>  	mutex_lock(&swhash->hlist_mutex);
> +	swhash->offline = false;
>  	if (swhash->hlist_refcount > 0) {
>  		struct swevent_hlist *hlist;
>  
> @@ -7907,6 +7912,7 @@ static void perf_event_exit_cpu(int cpu)
>  	perf_event_exit_cpu_context(cpu);
>  
>  	mutex_lock(&swhash->hlist_mutex);
> +	swhash->offline = true;
>  	swevent_hlist_release(swhash);
>  	mutex_unlock(&swhash->hlist_mutex);
>  }


Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-03-10 Thread Fengguang Wu
Hi Jiri,

On Mon, Mar 10, 2014 at 01:53:19PM +0100, Jiri Olsa wrote:
> On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> > 
> > Hi all,
> > 
> > This is a very old WARNING, too old to be bisectable. The below 3 different
> > back traces show that it's always triggered by trinity at system reboot time.
> > Any ideas to quiet it? Thank you!
> 
> hi,
> is there cpu hotplug involved? like writing to:
>   /sys/devices/system/cpu/cpu*/online

Yeah, we do run random CPU hotplug tests in the background.

Thanks,
Fengguang


Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-03-10 Thread Jiri Olsa
On Mon, Mar 10, 2014 at 11:40:23PM +0100, Jiri Olsa wrote:
> On Mon, Mar 10, 2014 at 01:53:19PM +0100, Jiri Olsa wrote:
> > On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> > > 
> > > Hi all,
> > > 
> > > > This is a very old WARNING, too old to be bisectable. The below 3 different
> > > > back traces show that it's always triggered by trinity at system reboot time.
> > > Any ideas to quiet it? Thank you!
> > 
> > hi,
> > is there cpu hotplug involved? like writing to:
> >   /sys/devices/system/cpu/cpu*/online
> > 
> 
> I think there's a race with the hotplug code,
> I can reproduce this with:
> 
>   $ ./perf record -e faults ./perf bench sched pipe
> 
> and put one of the cpus offline:
> 
>   [root@krava cpu]# pwd
>   /sys/devices/system/cpu
>   [root@krava cpu]# echo 0 > cpu1/online 

the perf cpu offline callback takes down all cpu context events
and releases swhash->swevent_hlist

this can race with task context software events that are
just being scheduled in on this cpu via perf_swevent_add
(note that only cpu ctx events are terminated in the hotplug code)

the race happens in the gap between the cpu notifier code running
and the cpu actually being taken down (and becoming un-sched-able)

I wonder what we should do:

- terminate task ctx events on the hotplug-ed cpu (same as for cpu ctx)
  this seems too much..

- schedule out task ctx events on the hotplug-ed cpu
  we might race again with other events being scheduled in (during the race gap)
  (if this could be prevented, this would be the best option i think)

- don't release that 'struct swevent_hlist' at all.. it's about 2KB per cpu

- remove the warning ;-) or make it skip the hotplug-ed cpu case, so
  we don't lose a potential bug warning, please check the attached patch
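
For reference, the "about 2KB" figure in the third option can be checked with
a small user-space sketch. It assumes the hash has 256 buckets (8 hash bits,
as in kernels of that era), that a bucket head is a single pointer, and that
the trailing rcu_head is two pointers. The model_* names are hypothetical
stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Assumption: 8 hash bits, i.e. 256 buckets, as in kernels of that era. */
#define MODEL_SWEVENT_HLIST_BITS 8
#define MODEL_SWEVENT_HLIST_SIZE (1 << MODEL_SWEVENT_HLIST_BITS)

/* Stand-in for struct hlist_head: a single pointer to the first node. */
struct model_hlist_head {
	void *first;
};

/* Stand-in for struct rcu_head: a next pointer plus a callback pointer. */
struct model_rcu_head {
	void *next;
	void (*func)(struct model_rcu_head *head);
};

/* Stand-in for struct swevent_hlist: the buckets plus an rcu_head. */
struct model_swevent_hlist {
	struct model_hlist_head heads[MODEL_SWEVENT_HLIST_SIZE];
	struct model_rcu_head rcu_head;
};

/* Bytes that would stay allocated per cpu if the hlist were never released. */
static size_t model_swevent_hlist_bytes(void)
{
	return sizeof(struct model_swevent_hlist);
}
```

With 8-byte pointers this works out to 2048 bytes of buckets plus 16 bytes
for the rcu_head, which matches the rough 2KB estimate.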

thoughts?
jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 661951a..a53857e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5423,6 +5423,8 @@ struct swevent_htable {
 
 	/* Recursion avoidance in each contexts */
 	int recursion[PERF_NR_CONTEXTS];
+
+	bool offline;
 };
 
 static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
@@ -5669,8 +5671,10 @@ static int perf_swevent_add(struct perf_event *event, int flags)
 	hwc->state = !(flags & PERF_EF_START);
 
 	head = find_swevent_head(swhash, event);
-	if (WARN_ON_ONCE(!head))
+	if (!head) {
+		WARN_ON_ONCE(!swhash->offline);
 		return -EINVAL;
+	}
 
 	hlist_add_head_rcu(&event->hlist_entry, head);
 
@@ -7850,6 +7854,7 @@ static void perf_event_init_cpu(int cpu)
 	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
 
 	mutex_lock(&swhash->hlist_mutex);
+	swhash->offline = false;
 	if (swhash->hlist_refcount > 0) {
 		struct swevent_hlist *hlist;
 
@@ -7907,6 +7912,7 @@ static void perf_event_exit_cpu(int cpu)
 	perf_event_exit_cpu_context(cpu);
 
 	mutex_lock(&swhash->hlist_mutex);
+	swhash->offline = true;
 	swevent_hlist_release(swhash);
 	mutex_unlock(&swhash->hlist_mutex);
 }
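
The intent of the patch can be modeled in user space. This is a minimal
sketch, not kernel code: the model_* names are hypothetical, the warnings
counter stands in for WARN_ON_ONCE (which prints only once in the kernel),
locking is left out, and the init step is simplified to always install an
hlist, whereas the kernel only allocates one when hlist_refcount > 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the per-cpu struct swevent_htable. */
struct model_swhash {
	void *hlist;   /* NULL once the hotplug callback has released it */
	bool offline;  /* set by the cpu-exit callback, cleared on init */
	int warnings;  /* how often the warning condition was hit */
};

/* Models perf_event_init_cpu(): the cpu comes (back) online. */
static void model_init_cpu(struct model_swhash *sw, void *hlist)
{
	sw->offline = false;
	sw->hlist = hlist;
}

/* Models perf_event_exit_cpu(): mark the cpu offline, then drop the hlist. */
static void model_exit_cpu(struct model_swhash *sw)
{
	sw->offline = true;
	sw->hlist = NULL;
}

/* Models the patched check in perf_swevent_add(). */
static int model_swevent_add(struct model_swhash *sw)
{
	if (!sw->hlist) {
		/* Warn only when the missing hlist is not explained by hotplug. */
		if (!sw->offline)
			sw->warnings++;
		return -EINVAL;
	}
	return 0;
}
```

An add that lands in the race gap after model_exit_cpu() still fails with
-EINVAL but stays silent, while a missing hlist on a cpu that was never
marked offline is still counted as a warning.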


Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-03-10 Thread Jiri Olsa
On Mon, Mar 10, 2014 at 01:53:19PM +0100, Jiri Olsa wrote:
> On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> > 
> > Hi all,
> > 
> > This is a very old WARNING, too old to be bisectable. The below 3 different
> > back traces show that it's always triggered by trinity at system reboot time.
> > Any ideas to quiet it? Thank you!
> 
> hi,
> is there cpu hotplug involved? like writing to:
>   /sys/devices/system/cpu/cpu*/online
> 

I think there's a race with the hotplug code,
I can reproduce this with:

  $ ./perf record -e faults ./perf bench sched pipe

and put one of the cpus offline:

  [root@krava cpu]# pwd
  /sys/devices/system/cpu
  [root@krava cpu]# echo 0 > cpu1/online 
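
The offline step can also be driven from a small C helper, for instance when
looping the hotplug toggle from a test program. This is only a sketch under
stated assumptions: cpu_online_path() and set_cpu_online() are hypothetical
names, the base directory is passed in by the caller, and writing to the real
sysfs file needs root.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "<base>/cpu<N>/online"; returns 0 on success, -1 if buf is too small. */
static int cpu_online_path(char *buf, size_t len, const char *base,
			   unsigned int cpu)
{
	int n = snprintf(buf, len, "%s/cpu%u/online", base, cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Writes "0" or "1" to the cpu's online file; returns 0 on success. */
static int set_cpu_online(const char *base, unsigned int cpu, int online)
{
	char path[256];
	FILE *f;
	int ok;

	if (cpu_online_path(path, sizeof(path), base, cpu) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(online ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling set_cpu_online("/sys/devices/system/cpu", 1, 0) while the perf
workload is running would correspond to the echo command shown here.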

working on fix ;-)

thanks,
jirka

---
[  133.726229] ------------[ cut here ]------------
[  133.726236] WARNING: CPU: 1 PID: 1194 at kernel/events/core.c:5640 perf_swevent_add+0x112/0x120()
[  133.726237] Modules linked in: ip6table_filter ip6_tables ebtable_nat 
ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM 
iptable_mangle tun bridge stp llc ccm nf_conntrack_ipv4 nf_defrag_ipv4 
xt_conntrack nf_conntrack snd_hda_codec_hdmi snd_hda_codec_conexant uvcvideo 
x86_pkg_temp_thermal coretemp snd_hda_intel kvm_intel arc4 iwldvm 
videobuf2_vmalloc kvm snd_hda_codec videobuf2_memops snd_hwdep videobuf2_core 
mac80211 videodev sdhci_pci iTCO_wdt iTCO_vendor_support sdhci lpc_ich 
microcode iwlwifi media snd_seq snd_seq_device snd_pcm i2c_i801 serio_raw 
cfg80211 mfd_core mmc_core btusb mei_me shpchp snd_page_alloc bluetooth 
snd_timer nfsd e1000e mei ptp auth_rpcgss pps_core thinkpad_acpi nfs_acl lockd 
snd soundcore rfkill sunrpc wmi binfmt_misc dm_crypt i915 crct10dif_pclmul 
crc32_pclmul
[  133.726266]  i2c_algo_bit drm_kms_helper crc32c_intel drm i2c_core 
ghash_clmulni_intel video
[  133.726270] CPU: 1 PID: 1194 Comm: sched-pipe Not tainted 3.13.5-103.fc19.x86_64 #1
[  133.726271] Hardware name: LENOVO 4291EJ3/4291EJ3, BIOS 8DET56WW (1.26 ) 12/01/2011
[  133.726272]  0009 8802091c5b18 81680604 

[  133.726274]  8802091c5b50 8106d35d 880209084c00 
8800d015ec00
[  133.726276]  88021e257748 88021e25774c 227a1324 
8802091c5b60
[  133.726277] Call Trace:
[  133.726282]  [] dump_stack+0x45/0x56
[  133.726285]  [] warn_slowpath_common+0x7d/0xa0
[  133.726287]  [] warn_slowpath_null+0x1a/0x20
[  133.726289]  [] perf_swevent_add+0x112/0x120
[  133.726291]  [] event_sched_in.isra.78+0x90/0x1d0
[  133.726293]  [] group_sched_in+0x6a/0x1e0
[  133.726296]  [] ? native_sched_clock+0x13/0x80
[  133.726297]  [] ? sched_clock+0x9/0x10
[  133.726299]  [] ctx_sched_in+0x10e/0x1d0
[  133.726300]  [] perf_event_sched_in+0x60/0x90
[  133.726302]  [] perf_event_context_sched_in+0x78/0xc0
[  133.726303]  [] __perf_event_task_sched_in+0x182/0x1a0
[  133.726306]  [] finish_task_switch+0xa8/0xf0
[  133.726308]  [] __schedule+0x2e2/0x740
[  133.726310]  [] schedule+0x29/0x70
[  133.726313]  [] pipe_wait+0x61/0xa0
[  133.726315]  [] ? abort_exclusive_wait+0xb0/0xb0
[  133.726316]  [] pipe_read+0x2fd/0x4f0
[  133.726319]  [] do_sync_read+0x5a/0x90
[  133.726321]  [] vfs_read+0x95/0x160
[  133.726322]  [] SyS_read+0x49/0xa0
[  133.726325]  [] system_call_fastpath+0x16/0x1b
[  133.726326] ---[ end trace 75f3a06e52d51e52 ]---
[  133.731360] kvm: disabling virtualization on CPU1
[  133.731367] smpboot: CPU 1 is now offline



Re: [reboot] WARNING: CPU: 0 PID: 112 at kernel/events/core.c:5655 perf_swevent_add()

2014-03-10 Thread Jiri Olsa
On Sat, Mar 08, 2014 at 02:51:53PM +0800, Fengguang Wu wrote:
> 
> Hi all,
> 
> This is a very old WARNING, too old to be bisectable. The below 3 different
> back traces show that it's always triggered by trinity at system reboot time.
> Any ideas to quiet it? Thank you!

hi,
is there cpu hotplug involved? like writing to:
  /sys/devices/system/cpu/cpu*/online

thanks,
jirka