On Sunday, November 24, 2013 10:39:20 AM Francis Moreau wrote:
> Hello Thomas
>
> On 11/22/2013 11:27 PM, Thomas Gleixner wrote:
> > On Fri, 22 Nov 2013, Rafael J. Wysocki wrote:
> >> On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
> >>> Ok, I've finally managed to find out the bad commit:
> >>> ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
> >>> over system PM transitions
> >>>
> >>> I verified that the parent commit doesn't have the problem.
> >>
> >> Interesting.
> >>
> >>> Rafael, you're the man now ;)
> >>
> >> I kind of don't see how that commit may result in behavior that you
> >> described earlier in the thread.
> >>
> >> You get a memory corruption that seems to have started to happen because
> >> we're holding an additional lock over suspend resume now. Something's
> >> fishy on that machine and we need to figure out what it is.
> >
> > The hickup happens in the timer softirq.
> >
> > @Francis: Did you try to enable DEBUG_OBJECTS.*. If not please give it
> > a try.
>
> This looks like it was a good idea.
>
> The kernel now outputs the following traces after resuming.
>
> [ 26.973928] WARNING: CPU: 0 PID: 4 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
> [ 26.973932] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
> [ 26.973972] Modules linked in: x86_pkg_temp_thermal intel_powerclamp
> rtsx_pci_ms coretemp memstick kvm_intel i2c_i801 iTCO_wdt
> iTCO_vendor_support i915 i2c_algo_bit intel_agp intel_gtt drm_kms_helper
> r8169 drm kvm mii agpgart i2c_core lpc_ich ac shpchp crc32c_intel
> battery thermal wmi evdev mei_me video mei button mperf processor
> serio_raw microcode ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod
> usb_storage rtsx_pci_sdmmc mmc_core ahci libahci libata ehci_pci
> ehci_hcd xhci_hcd scsi_mod rtsx_pci usbcore usb_common
> [ 26.974013] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.11.0-rc2-ARCH #64
> [ 26.974014] Hardware name: CLEVO CO. W55xEU /W55xEU , BIOS 4.6.5 03/05/2013
> [ 26.974019] Workqueue: kacpi_hotplug hotplug_event_work
> [ 26.974020] 0000000000000009 ffff880407d0da18 ffffffff81459fe9 ffff880407d0da60
> [ 26.974023] ffff880407d0da50 ffffffff8104dc7d ffff880407fad488 ffffffff81836fc0
> [ 26.974025] ffffffff81701358 ffffffff81afef70 0000000000000003 ffff880407d0dab0
> [ 26.974027] Call Trace:
> [ 26.974031] [<ffffffff81459fe9>] dump_stack+0x54/0x8d
> [ 26.974043] [<ffffffff8104dc7d>] warn_slowpath_common+0x7d/0xa0
> [ 26.974044] [<ffffffff8104dcec>] warn_slowpath_fmt+0x4c/0x50
> [ 26.974047] [<ffffffff81261433>] debug_print_object+0x83/0xa0
> [ 26.974050] [<ffffffff8106b820>] ? queue_work_on+0x50/0x50
> [ 26.974053] [<ffffffff81261c2b>] __debug_check_no_obj_freed+0x1fb/0x240
> [ 26.974059] [<ffffffffa008e959>] ? rtsx_pci_remove+0x119/0x1d0 [rtsx_pci]

So a device driven by rtsx_pcr.c is removed after resume. Without the
commit you've bisected, it is removed as well, but that happens during
resume, so rtsx_pci_resume() is likely not called in that case.

I bet that there's a bug either in rtsx_pci_remove() or in
rtsx_pci_resume(). The latter definitely should check if the device is
actually still present before scheduling the delayed work, but Boris'
patch should take care of that anyway.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
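
A rough userspace model of the two guards discussed above may help make the ordering concrete. It is a sketch under stated assumptions, not the rtsx_pci code: the struct and function names (fake_pcr, fake_resume, fake_remove, fake_free) are hypothetical, and a plain "timer armed" flag stands in for the timer inside a struct delayed_work. In the kernel, the real counterparts would be schedule_delayed_work() on the resume side and cancel_delayed_work_sync() on the remove side. fake_free() mimics the debugobjects "free active object ... timer_list" check by refusing to free memory whose timer is still armed.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for driver private data that embeds a delayed
 * work item. In the kernel this would be a struct delayed_work; here a
 * flag models whether its timer is armed. */
struct fake_pcr {
	bool present;     /* cleared once the device has been removed */
	bool timer_armed; /* models the timer_list inside the delayed work */
};

/* Models the debugobjects check behind "ODEBUG: free active ...
 * object type: timer_list": freeing memory that still holds an
 * armed timer is reported and refused (the memory is not freed). */
static int fake_free(struct fake_pcr *pcr)
{
	if (pcr->timer_armed) {
		fprintf(stderr, "free active object: timer still armed\n");
		return -1;
	}
	free(pcr);
	return 0;
}

/* Resume path: re-arm the delayed work only if the device is still
 * there (kernel counterpart: schedule_delayed_work()). */
static void fake_resume(struct fake_pcr *pcr)
{
	if (!pcr->present)
		return;
	pcr->timer_armed = true;
}

/* Remove path: disarm the delayed work before the memory goes away
 * (kernel counterpart: cancel_delayed_work_sync()), then free it. */
static int fake_remove(struct fake_pcr *pcr)
{
	pcr->present = false;
	pcr->timer_armed = false;
	return fake_free(pcr);
}
```

In this model the presence check in fake_resume() covers removal that happens before resume runs, while the disarm step in fake_remove() covers removal that happens after resume has already armed the timer, which is the ordering the bisected commit appears to produce.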