Peter Zijlstra <pet...@infradead.org> writes:

> Alex, any clue?

Let me look into it. Definitely haven't seen anything like that in my
tests.

> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>> 
>> > and while I was trying to cut and paste that, the locked haswell just took 
>> > down the network switch so I can't get the rest until tomorrow.
>> 
>> here are the full dumps if anyone is interested
>> 
>> the warning is reproducible, the spinlock disaster is not.
>> 
>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>> [36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: 
>> perf_fuzzer/3410, .owner_cpu: 4
>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       
>> 4.1.0-rc7+ #155
>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 
>> 01/26/2014
>> [36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 
>> 0000000000000054
>> [36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 
>> ffff88011edf7cd0
>> [36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 
>> ffff88011edf7cd0
>> [36299.046338] Call Trace:
>> [36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
>> [36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
>> [36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
>> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
>> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0

This one goes to free aux pages from nmi context, looks like aux buffer
was unmapped while the event was running, so here it dropped the last
reference.

>> [36299.128034]  [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
>> [36299.134793]  [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
>> [36299.141543]  [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
>> [36299.148482]  [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
>> [36299.155249]  [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
>> [36299.162273]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150

[snip]

>> 
>> and
>> 
>> [27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 
>> perf_aux_output_begin+0x1ce/0x1f0()
>> [27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp 
>> intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek 
>> snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller 
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel 
>> snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm 
>> iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd 
>> cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper 
>> parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 
>> i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd 
>> ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel 
>> usbcore pps_core usb_common thermal fan thermal_sys
>> [27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G        W       
>> 4.1.0-rc7+ #155
>> [27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 
>> 01/26/2014
>> [27716.891168]  ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 
>> 0000000000000000
>> [27716.899782]  0000000000000000 ffff8800cf9afca8 ffffffff81072eba 
>> 000000000000cf80
>> [27716.908372]  ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 
>> 000000000000cf80
>> [27716.916981] Call Trace:
>> [27716.920180]  [<ffffffff816d7229>] dump_stack+0x45/0x57
>> [27716.926250]  [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
>> [27716.933244]  [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
>> [27716.939999]  [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
>> [27716.947225]  [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
>> [27716.954875]  [<ffffffff8103767d>] bts_event_add+0x6d/0x150
>> [27716.961259]  [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
>> [27716.968495]  [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
>> [27716.974957]  [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
>> [27716.981888]  [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
>> [27716.988274]  [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
>> [27716.995082]  [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
>> [27717.002714]  [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
>> [27717.010160]  [<ffffffff8109897d>] finish_task_switch+0xed/0x140
>> [27717.016988]  [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
>> [27717.023400]  [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
>> [27717.029652]  [<ffffffff816df57f>] ret_from_fork+0xf/0x70
>> [27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---
>> 
>> which corresponds to 
>>         /*
>>          * Nesting is not supported for AUX area, make sure nested
>>          * writers are caught early
>>          */
>>         if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
>>                 goto err_put;
>> 

This really shouldn't happen, I'd need to check with the fuzzer.

Thanks,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to