Re: perf: another perf_fuzzer generated lockup

2015-05-27 Thread Vince Weaver
On Wed, 27 May 2015, Peter Zijlstra wrote:

> On Fri, May 08, 2015 at 12:40:48AM -0400, Vince Weaver wrote:
> > 
> > This is a new one I think, I hit it on the haswell machine running 
> > 4.1-rc2.
> > 
> 
> Vince, should I still be running fast_repro99.sh? Or is there another
> preferred way to start the beast?

Yes, that's still probably the best way to start it.

I've recently been overhauling the fuzzer, including a massive re-work of
the mmap() code (to try to exercise the AUX codepath), so if it starts
segfaulting on you, that's probably why.
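
For reference, the AUX path is the two-stage mmap() protocol that went in
for 4.1.  A minimal sketch of the happy path (helper name is illustrative,
not the fuzzer's code; error handling omitted, and a real caller has to
check both mmap() returns and pick page counts the PMU accepts):

    #include <linux/perf_event.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Sketch: map the data area of an already-open perf event fd,
     * then map its AUX area via a second mmap() on the same fd. */
    static void *map_aux_area(int fd, size_t data_pages, size_t aux_pages)
    {
            size_t page = sysconf(_SC_PAGESIZE);
            struct perf_event_mmap_page *mp;

            /* Stage 1: user page plus data buffer (1 + 2^n pages). */
            mp = mmap(NULL, (data_pages + 1) * page,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

            /* Stage 2: advertise where the AUX buffer should live,
             * then map that region from the same fd. */
            mp->aux_offset = (data_pages + 1) * page;
            mp->aux_size   = aux_pages * page;

            return mmap(NULL, mp->aux_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, mp->aux_offset);
    }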

Vince


Re: perf: another perf_fuzzer generated lockup

2015-05-27 Thread Peter Zijlstra
On Fri, May 08, 2015 at 12:40:48AM -0400, Vince Weaver wrote:
> 
> This is a new one I think, I hit it on the haswell machine running 
> 4.1-rc2.
> 

Vince, should I still be running fast_repro99.sh? Or is there another
preferred way to start the beast?


Re: perf: another perf_fuzzer generated lockup

2015-05-11 Thread Vince Weaver
On Fri, 8 May 2015, Stephane Eranian wrote:

> Vince,
> 
> On Thu, May 7, 2015 at 9:40 PM, Vince Weaver  wrote:
> >
> >
> > This is a new one I think, I hit it on the haswell machine running
> > 4.1-rc2.
> >
> > The backtrace is complex enough I'm not really sure what's going on here.
> >
> > The fuzzer has been having weird issues where it's been getting
> > overflow signals from invalid fds.  This seems to happen
> > when an overflow signal interrupts the fuzzer mid-fork.
> > The fuzzer code doesn't handle this well: it attempts to call exit()
> > and/or kill the child from the signal handler that interrupted the
> > fork(), and that doesn't always go well.  I'm not sure if this is related,
> > just that some of those actions seem to appear in the backtrace.
> >
> >
> Is there a way to figure out how the fuzzer had programmed the PMU
> to get there (besides adding a PMU state dump to the kernel crashdump)?

Not easily.  In theory the fuzzer can regenerate state from the random
seed, but some of these bugs seem to be timing-related or race conditions,
so they don't always replicate.
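
To be concrete about what regenerating from the seed means: everything the
fuzzer does is derived from a single PRNG stream, so logging the seed is
enough to replay the same syscall sequence, just not the interrupt timing.
A rough sketch (illustrative names, not the fuzzer's real interface):

    #include <stdio.h>
    #include <stdlib.h>

    /* Same seed => same decision stream => same syscall sequence.
     * What it cannot replay is when the NMIs and signals arrive. */
    static void fuzz_run(unsigned int seed, int iterations)
    {
            int i;

            fprintf(stderr, "seed: %u\n", seed);    /* log for replay */
            srand(seed);

            for (i = 0; i < iterations; i++) {
                    switch (rand() % 3) {
                    case 0: /* random perf_event_open() */       break;
                    case 1: /* random mmap() on an open event */ break;
                    case 2: /* random ioctl()/read()/close() */  break;
                    }
            }
    }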

Also I can make the fuzzer dump the state, but often it has 100+ events
active, with no way of knowing which ones are currently scheduled onto the
CPU.

Dumping the PMU state might help, but at the same time there are all the
other events going on, such as software and tracepoint events, and they
might all be contributing.

This particular bug almost replicates; the system definitely pauses for a 
bit even if not long enough to trigger the watchdog.

I've been meaning to work on it more, but we just finished with finals, so
I'm stuck doing more or less nothing but grading this next week.  After
that I should have some time to work on this issue, plus a couple of other
warnings the fuzzer has been showing.

Vince


Re: perf: another perf_fuzzer generated lockup

2015-05-08 Thread Stephane Eranian
Vince,

On Thu, May 7, 2015 at 9:40 PM, Vince Weaver  wrote:
>
>
> This is a new one I think, I hit it on the haswell machine running
> 4.1-rc2.
>
> The backtrace is complex enough I'm not really sure what's going on here.
>
> The fuzzer has been having weird issues where it's been getting
> overflow signals from invalid fds.  This seems to happen
> when an overflow signal interrupts the fuzzer mid-fork.
> The fuzzer code doesn't handle this well: it attempts to call exit()
> and/or kill the child from the signal handler that interrupted the
> fork(), and that doesn't always go well.  I'm not sure if this is related,
> just that some of those actions seem to appear in the backtrace.
>
>
Is there a way to figure out how the fuzzer had programmed the PMU
to get there (besides adding a PMU state dump to the kernel crashdump)?


>
> [33864.529861] [ cut here ]
> [33864.534824] WARNING: CPU: 1 PID: 9852 at kernel/watchdog.c:302 watchdog_overflow_callback+0x92/0xc0()
> [33864.544682] Watchdog detected hard LOCKUP on cpu 1
> [33864.549635] Modules linked in:
> [33864.552943]  fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl
> iosf_mbi coretemp snd_hda_codec_realtek snd_hda_codec_hdmi
> snd_hda_codec_generic kvm snd_hda_intel snd_hda_controller snd_hda_codec
> snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul ghash_clmulni_intel
> snd_pcm aesni_intel aes_x86_64 lrw gf128mul evdev i915 iTCO_wdt
> iTCO_vendor_support glue_helper snd_timer ppdev ablk_helper psmouse
> drm_kms_helper cryptd snd drm pcspkr serio_raw lpc_ich soundcore parport_pc
> xhci_pci battery video processor i2c_i801 mei_me mei wmi i2c_algo_bit tpm_tis
> mfd_core xhci_hcd tpm parport button sg sr_mod sd_mod cdrom ehci_pci ahci
> ehci_hcd libahci libata e1000e ptp usbcore crc32c_intel scsi_mod fan
> usb_common pps_core thermal thermal_sys
> [33864.622413] CPU: 1 PID: 9852 Comm: perf_fuzzer Not tainted 4.1.0-rc2+ #142
> [33864.629776] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [33864.637685]  81a209b5 88011ea45aa0 816d51d3
> [33864.645709]  88011ea45af0 88011ea45ae0 81072dfa 88011ea45ac0
> [33864.653731]  880119b8f800  88011ea45c40 88011ea45ef8
> [33864.661783] Call Trace:
> [33864.664409]  NMI  [816d51d3] dump_stack+0x45/0x57
> [33864.670618]  [81072dfa] warn_slowpath_common+0x8a/0xc0
> [33864.677071]  [81072e76] warn_slowpath_fmt+0x46/0x50
> [33864.683202]  [8104db64] ? native_apic_wait_icr_idle+0x24/0x30
> [33864.690280]  [81124f22] watchdog_overflow_callback+0x92/0xc0
> [33864.697294]  [811673b1] __perf_event_overflow+0x91/0x270
> [33864.703916]  [811673f9] ? __perf_event_overflow+0xd9/0x270
> [33864.710696]  [8102b60a] ? x86_perf_event_set_period+0xda/0x180
> [33864.717842]  [81167f89] perf_event_overflow+0x19/0x20
> [33864.724195]  [810338f2] intel_pmu_handle_irq+0x1e2/0x450
> [33864.730840]  [8102a41b] perf_event_nmi_handler+0x2b/0x50
> [33864.737436]  [810185d0] nmi_handle+0xa0/0x150
> [33864.743025]  [81018535] ? nmi_handle+0x5/0x150
> [33864.748733]  [8101887a] default_do_nmi+0x4a/0x140
> [33864.754705]  [81018a08] do_nmi+0x98/0xe0
> [33864.759858]  [816df42f] end_repeat_nmi+0x1e/0x2e
> [33864.765746]  [810bc1bb] ? check_chain_key+0xdb/0x1e0
> [33864.772004]  [810bc1bb] ? check_chain_key+0xdb/0x1e0
> [33864.778253]  [810bc1bb] ? check_chain_key+0xdb/0x1e0
> [33864.784498]  EOE  IRQ  [810bdd09] __lock_acquire.isra.31+0x3b9/0x1000
> [33864.792950]  [810bdd09] ? __lock_acquire.isra.31+0x3b9/0x1000
> [33864.800045]  [810bf2e5] lock_acquire+0xa5/0x130
> [33864.805817]  [81080afe] ? __lock_task_sighand+0x6e/0x110
> [33864.812468]  [81080aaa] ? __lock_task_sighand+0x1a/0x110
> [33864.819084]  [816dc721] _raw_spin_lock+0x31/0x40
> [33864.824979]  [81080afe] ? __lock_task_sighand+0x6e/0x110
> [33864.831623]  [81080afe] __lock_task_sighand+0x6e/0x110
> [33864.838096]  [81080aaa] ? __lock_task_sighand+0x1a/0x110
> [33864.845314]  [81080bcc] do_send_sig_info+0x2c/0x80
> [33864.851949]  [81167807] ? perf_swevent_event+0x67/0x90
> [33864.858980]  [8120092f] send_sigio_to_task+0x12f/0x1a0
> [33864.866005]  [81200805] ? send_sigio_to_task+0x5/0x1a0
> [33864.873047]  [812010c6] ? send_sigio+0x56/0x100
> [33864.879411]  [8120111e] send_sigio+0xae/0x100
> [33864.885564]  [81201207] kill_fasync+0x97/0xf0
> [33864.891713]  [8120117f] ? kill_fasync+0xf/0xf0
> [33864.897983]  [811667d4] perf_event_wakeup+0xd4/0xf0
> [33864.904662]  [81166705] ? perf_event_wakeup+0x5/0xf0
> [33864.911490]  [811668d0] ? perf_pending_event+0xe0/0x110
> [33864.918580]  [811668d0] perf_pending_event+0xe0/0x110
> [33864.925494]  [81156eac] irq_work_run_list+0x4c/0x80
> [33864.932197]  [81156ef8] irq_work_run+0x18/0x40
> [33864.938469]  [8101953f] smp_trace_irq_work_interrupt+0x3f/0xc0
> [33864.946263]  [816dea3e] trace_irq_work_interrupt+0x6e/0x80
> [33864.953646]  EOI  [811a1b47] ? copy_page_range+0x527/0x9a0
> [33864.961287]  [] ? copy_page_range+0x502/0x9a0
> [33864.968265]  [] copy_process.part.23+0xc92/0x1b80
> [33864.975589]  [] ? SYSC_kill+0x8e/0x230
> [33864.981879]  [] do_fork+0xd8/0x420
> [33864.987807]  [] ? f_setown+0x83/0xa0
> [33864.993953]  [] ? SyS_fcntl+0x310/0x650
> [33865.000348]  [] ? lockdep_sys_exit_thunk+0x12/0x14
> [33865.007781]  [] SyS_clone+0x16/0x20


perf: another perf_fuzzer generated lockup

2015-05-07 Thread Vince Weaver

This is a new one I think, I hit it on the haswell machine running 
4.1-rc2.

The backtrace is complex enough I'm not really sure what's going on here.

The fuzzer has been having weird issues where it's been getting
overflow signals from invalid fds.  This seems to happen
when an overflow signal interrupts the fuzzer mid-fork.
The fuzzer code doesn't handle this well: it attempts to call exit()
and/or kill the child from the signal handler that interrupted the
fork(), and that doesn't always go well.  I'm not sure if this is related,
just that some of those actions seem to appear in the backtrace.
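
The shape of the problem, as a contrived sketch (not the fuzzer's actual
code): the overflow signal is armed via F_SETOWN/O_ASYNC, and the handler
acts on a child pid that fork() may not have finished producing yet.

    #include <fcntl.h>
    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static volatile pid_t child = -1;

    static void overflow_handler(int sig)
    {
            /* If this interrupts fork(), 'child' is stale or -1;
             * calling kill()/exit() from here is what goes wrong. */
            if (child > 0)
                    kill(child, SIGKILL);
    }

    static void fuzz_loop(int perf_fd)
    {
            struct sigaction sa = { .sa_handler = overflow_handler };

            sigaction(SIGIO, &sa, NULL);
            fcntl(perf_fd, F_SETOWN, getpid());
            fcntl(perf_fd, F_SETFL, fcntl(perf_fd, F_GETFL) | O_ASYNC);

            for (;;) {
                    child = fork();         /* SIGIO can land in here */
                    if (child == 0)
                            _exit(0);
                    waitpid(child, NULL, 0);
            }
    }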


[33864.529861] [ cut here ]
[33864.534824] WARNING: CPU: 1 PID: 9852 at kernel/watchdog.c:302 watchdog_overflow_callback+0x92/0xc0()
[33864.544682] Watchdog detected hard LOCKUP on cpu 1
[33864.549635] Modules linked in:
[33864.552943]  fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi
coretemp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm
snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core crct10dif_pclmul
snd_hwdep crc32_pclmul ghash_clmulni_intel snd_pcm aesni_intel aes_x86_64 lrw
gf128mul evdev i915 iTCO_wdt iTCO_vendor_support glue_helper snd_timer ppdev
ablk_helper psmouse drm_kms_helper cryptd snd drm pcspkr serio_raw lpc_ich
soundcore parport_pc xhci_pci battery video processor i2c_i801 mei_me mei wmi
i2c_algo_bit tpm_tis mfd_core xhci_hcd tpm parport button sg sr_mod sd_mod
cdrom ehci_pci ahci ehci_hcd libahci libata e1000e ptp usbcore crc32c_intel
scsi_mod fan usb_common pps_core thermal thermal_sys
[33864.622413] CPU: 1 PID: 9852 Comm: perf_fuzzer Not tainted 4.1.0-rc2+ #142
[33864.629776] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[33864.637685]  81a209b5 88011ea45aa0 816d51d3
[33864.645709]  88011ea45af0 88011ea45ae0 81072dfa 88011ea45ac0
[33864.653731]  880119b8f800  88011ea45c40 88011ea45ef8
[33864.661783] Call Trace:
[33864.664409]  NMI  [816d51d3] dump_stack+0x45/0x57
[33864.670618]  [81072dfa] warn_slowpath_common+0x8a/0xc0
[33864.677071]  [81072e76] warn_slowpath_fmt+0x46/0x50
[33864.683202]  [8104db64] ? native_apic_wait_icr_idle+0x24/0x30
[33864.690280]  [81124f22] watchdog_overflow_callback+0x92/0xc0
[33864.697294]  [811673b1] __perf_event_overflow+0x91/0x270
[33864.703916]  [811673f9] ? __perf_event_overflow+0xd9/0x270
[33864.710696]  [8102b60a] ? x86_perf_event_set_period+0xda/0x180
[33864.717842]  [81167f89] perf_event_overflow+0x19/0x20
[33864.724195]  [810338f2] intel_pmu_handle_irq+0x1e2/0x450
[33864.730840]  [8102a41b] perf_event_nmi_handler+0x2b/0x50
[33864.737436]  [810185d0] nmi_handle+0xa0/0x150
[33864.743025]  [81018535] ? nmi_handle+0x5/0x150
[33864.748733]  [8101887a] default_do_nmi+0x4a/0x140
[33864.754705]  [81018a08] do_nmi+0x98/0xe0
[33864.759858]  [816df42f] end_repeat_nmi+0x1e/0x2e
[33864.765746]  [810bc1bb] ? check_chain_key+0xdb/0x1e0
[33864.772004]  [810bc1bb] ? check_chain_key+0xdb/0x1e0
[33864.778253]  [810bc1bb] ? check_chain_key+0xdb/0x1e0
[33864.784498]  EOE  IRQ  [810bdd09] __lock_acquire.isra.31+0x3b9/0x1000
[33864.792950]  [810bdd09] ? __lock_acquire.isra.31+0x3b9/0x1000
[33864.800045]  [810bf2e5] lock_acquire+0xa5/0x130
[33864.805817]  [81080afe] ? __lock_task_sighand+0x6e/0x110
[33864.812468]  [81080aaa] ? __lock_task_sighand+0x1a/0x110
[33864.819084]  [816dc721] _raw_spin_lock+0x31/0x40
[33864.824979]  [81080afe] ? __lock_task_sighand+0x6e/0x110
[33864.831623]  [81080afe] __lock_task_sighand+0x6e/0x110
[33864.838096]  [81080aaa] ? __lock_task_sighand+0x1a/0x110
[33864.845314]  [81080bcc] do_send_sig_info+0x2c/0x80
[33864.851949]  [81167807] ? perf_swevent_event+0x67/0x90
[33864.858980]  [8120092f] send_sigio_to_task+0x12f/0x1a0
[33864.866005]  [81200805] ? send_sigio_to_task+0x5/0x1a0
[33864.873047]  [812010c6] ? send_sigio+0x56/0x100
[33864.879411]  [8120111e] send_sigio+0xae/0x100
[33864.885564]  [81201207] kill_fasync+0x97/0xf0
[33864.891713]  [8120117f] ? kill_fasync+0xf/0xf0
[33864.897983]  [811667d4] perf_event_wakeup+0xd4/0xf0
[33864.904662]  [81166705] ? perf_event_wakeup+0x5/0xf0
[33864.911490]  [811668d0] ? perf_pending_event+0xe0/0x110
[33864.918580]  [811668d0] perf_pending_event+0xe0/0x110
[33864.925494]  [81156eac] irq_work_run_list+0x4c/0x80
[33864.932197]  [81156ef8] irq_work_run+0x18/0x40
[33864.938469]  [8101953f] smp_trace_irq_work_interrupt+0x3f/0xc0
[33864.946263]  [816dea3e] trace_irq_work_interrupt+0x6e/0x80
[33864.953646]  EOI  [811a1b47] ? copy_page_range+0x527/0x9a0
[33864.961287]  [] ? copy_page_range+0x502/0x9a0
[33864.968265]  [] copy_process.part.23+0xc92/0x1b80
[33864.975589]  [] ? SYSC_kill+0x8e/0x230
[33864.981879]  [] do_fork+0xd8/0x420
[33864.987807]  [] ? f_setown+0x83/0xa0
[33864.993953]  [] ? SyS_fcntl+0x310/0x650
[33865.000348]  [] ? lockdep_sys_exit_thunk+0x12/0x14
[33865.007781]  [] SyS_clone+0x16/0x20
[33865.013830]  [] system_call_fastpath+0x16/0x7a
[33865.020843] ---[ end trace d3bd7d73656f3cba ]---
[33865.026418] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 496.487 msecs
[33865.035874] perf interrupt took too long (3879951 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
