Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On 04/17/2013 08:27 PM, Sarah Sharp wrote: -- snip -- [Sarah reworked this patch to cover the case where the xHCI restore register operation fails, and (temp STS_SRE) is true (and we re-init the host, including re-init for the compliance mode), but hibernate is false. The original patch would have caused list corruption in this case.] Signed-off-by: Tony Camuso tcam...@redhat.com Acked-by: Don Zickus dzic...@redhat.com Signed-off-by: Sarah Sharp sarah.a.sh...@linux.intel.com --- Tony, will you test this version? Thanks! Sarah, With ioatdma blacklisted, I was able to test this patch on an hp z820, which is the system family on which the original problem was detected by hp. The amended patch works correctly. State is correctly restored from both hibernate and suspend. -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On 04/23/2013 05:14 PM, Sarah Sharp wrote: On Tue, Apr 23, 2013 at 04:18:00PM -0400, Tony Camuso wrote: It hangs in both suspend and resume with vanilla 3.9-rc7 and rc8. Neither the new patch nor the old patch helps. It might be unrelated to USB then. Let's see what the bisect shows. It does appear to be unrelated to usb. Now we need to determine why this should cause problems with suspend/resume. - # git bisect bad 4dec23d7718e6f1f5e1773698d112025169e7d49 is the first bad commit commit 4dec23d7718e6f1f5e1773698d112025169e7d49 Author: Dave Jiang dave.ji...@intel.com Date: Thu Feb 7 14:38:32 2013 -0700 ioatdma: fix race between updating ioat-head and IOAT_COMPLETION_PENDING There is a race that can hit during __cleanup() when the ioat-head pointer is incremented during descriptor submission. The __cleanup() can clear the PENDING flag when it does not see any active descriptors. This causes new submitted descriptors to be ignored because the COMPLETION_PENDING flag is cleared. This was introduced when code was adapted from ioatdma v1 to ioatdma v2. For v2 and v3, IOAT_COMPLETION_PENDING flag will be abandoned and a new flag IOAT_CHAN_ACTIVE will be utilized. This flag will also be protected under the prep_lock when being modified in order to avoid the race. Signed-off-by: Dave Jiang dave.ji...@intel.com Reviewed-by: Dan Williams d...@fb.com Signed-off-by: Vinod Koul vinod.k...@intel.com :04 04 6a99537c215da68179b7431d165c4aa88c2f569d b339a473063ad04cd985399e43e2e11c9a2a7026 M drivers - Here is a Call Trace when resuming from suspend with the above patch. - [ 96.324588] ioatdma :40:04.0: ioat2_timer_event: Channel halted (1000) [ 96.324659] [ cut here ] [ 96.324661] kernel BUG at drivers/dma/ioat/dma_v2.c:317! [ 96.324665] invalid opcode: [#1] SMP [ 96.324709] Modules linked in: ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables kvm_intel snd_hda_codec_hdmi coretemp snd_hda_codec_realtek kvm snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd hp_wmi crc32c_intel sb_edac ioatdma sparse_keymap iTCO_wdt soundcore edac_core ghash_clmulni_intel rfkill e1000e iTCO_vendor_support mei microcode i2c_i801 lpc_ich pcspkr serio_raw dca mfd_core nouveau isci video mxm_wmi i2c_algo_bit drm_kms_helper ttm libsas drm mpt2sas firewire_ohci firewire_core ata_generic raid_class scsi_transport_sas i2c_core pata_acpi crc_itu_t wmi [ 96.324718] CPU 0 [ 96.324718] Pid: 0, comm: swapper/0 Not tainted 3.8.0-rc2+ #10 Hewlett-Packard HP Z820 Workstation/158B [ 96.324731] RIP: 0010:[a0275466] [a0275466] ioat2_timer_event+0x1b6/0x2c0 [ioatdma] [ 96.324733] RSP: 0018:8801a9c03df0 EFLAGS: 00010206 [ 96.324734] RAX: 0023 RBX: 8801a6880228 RCX: 00ae [ 96.324735] RDX: RSI: 0086 RDI: 0246 [ 96.324737] RBP: 8801a9c03e20 R08: 81e4b4e0 R09: 81e7faf0 [ 96.324738] R10: ba64 R11: 0004 R12: 1000 [ 96.324740] R13: 0100 R14: a02752b0 R15: 8801a6880228 [ 96.324742] FS: () GS:8801a9c0() knlGS: [ 96.324744] CS: 0010 DS: ES: CR0: 80050033 [ 96.324745] CR2: 7f8ce419 CR3: 01c0c000 CR4: 000407f0 [ 96.324747] DR0: DR1: DR2: [ 96.324748] DR3: DR6: 0ff0 DR7: 0400 [ 96.324751] Process swapper/0 (pid: 0, threadinfo 81c0, task 81c14440) [ 96.324751] Stack: [ 96.324758] 00019e050740 00019e050740 81e8c500 8801a6880290 [ 96.324763] 0100 a02752b0 8801a9c03e60 8106cb9a [ 96.324768] 8103d3d4 81e8c500 8801a6880290 a02752b0 [ 96.324769] Call Trace: [ 96.324774] IRQ [ 96.324780] [a02752b0] ? reshape_ring+0x330/0x330 [ioatdma] [ 96.324788] [8106cb9a] call_timer_fn+0x3a/0x120 [ 96.324794] [8103d3d4] ? native_apic_msr_eoi_write+0x14/0x20 [ 96.324801] [a02752b0] ? reshape_ring+0x330/0x330 [ioatdma] [ 96.324807] [81501ab0] ? cpufreq_p4_target+0x120/0x120 [ 96.324811] [8106e92e] run_timer_softirq+0x1fe/0x2b0 [ 96.324815] [81501ab0] ? cpufreq_p4_target+0x120/0x120 [
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On Mon, Apr 22, 2013 at 02:38:57PM -0400, Tony Camuso wrote: On 04/18/2013 05:12 PM, Sarah Sharp wrote: On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote: Sarah, This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7. I can attach the S3_Suspend_message.log, if you like, but here are the last few lines. Is it hanging on suspend or on resume? Does S3 work on 3.9-rc7 with your original patch (without the new comp_timer_running variable)? Does S3 work on the effected system with vanilla 3.9-rc7? Sarah, Just to let you know I haven't forgotten you ... I have been sidetracked by hardware or firmware problems. I need to try a different system. All right, thanks for letting me know. Sarah Sharp -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On 04/18/2013 05:12 PM, Sarah Sharp wrote: On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote: Sarah, This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7. = snip = Is it hanging on suspend or on resume? 3.9.0-rc8 hangs in both suspend and resume on a hp z620 running F18, with or without the old and new patch. Meanwhile, 3.8.0 has the problem for which we originally devised this patch... ... BUT your revised patch fixes the problem in 3.8.0. If you like, I can bisect between 3.8.0 and 3.9.0-rc7 (where I first detected the new problem). Does S3 work on 3.9-rc7 with your original patch (without the new comp_timer_running variable)? Does S3 work on the effected system with vanilla 3.9-rc7? It hangs in both suspend and resume with vanilla 3.9-rc7 and rc8. Neither the new patch nor the old patch helps. The full log with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING would be helpful. The logs are large, so I've posted them here. http://people.redhat.com/tcamuso/xhci-logs/ Logs were created with 3.9.0-rc8 commit 824282ca7d250bd7c301f221c3cd902ce906d731 Merge: f83b293 3b5e50e Author: Linus Torvalds torva...@linux-foundation.org Date: Mon Apr 22 15:00:59 2013 -0700 -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On Tue, Apr 23, 2013 at 04:18:00PM -0400, Tony Camuso wrote: On 04/18/2013 05:12 PM, Sarah Sharp wrote: On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote: Sarah, This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7. = snip = Is it hanging on suspend or on resume? 3.9.0-rc8 hangs in both suspend and resume on a hp z620 running F18, with or without the old and new patch. Meanwhile, 3.8.0 has the problem for which we originally devised this patch... ... BUT your revised patch fixes the problem in 3.8.0. If you like, I can bisect between 3.8.0 and 3.9.0-rc7 (where I first detected the new problem). That would be great! Does S3 work on 3.9-rc7 with your original patch (without the new comp_timer_running variable)? Does S3 work on the effected system with vanilla 3.9-rc7? It hangs in both suspend and resume with vanilla 3.9-rc7 and rc8. Neither the new patch nor the old patch helps. It might be unrelated to USB then. Let's see what the bisect shows. The full log with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING would be helpful. The logs are large, so I've posted them here. http://people.redhat.com/tcamuso/xhci-logs/ Logs were created with 3.9.0-rc8 Ok, I'll take a look at those tomorrow. Sarah Sharp -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On 04/18/2013 05:12 PM, Sarah Sharp wrote: On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote: Sarah, This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7. I can attach the S3_Suspend_message.log, if you like, but here are the last few lines. Is it hanging on suspend or on resume? Does S3 work on 3.9-rc7 with your original patch (without the new comp_timer_running variable)? Does S3 work on the effected system with vanilla 3.9-rc7? Sarah, Just to let you know I haven't forgotten you ... I have been sidetracked by hardware or firmware problems. I need to try a different system. -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On 04/17/2013 08:27 PM, Sarah Sharp wrote: [Sarah reworked this patch to cover the case where the xHCI restore register operation fails, and (temp STS_SRE) is true (and we re-init the host, including re-init for the compliance mode), but hibernate is false. The original patch would have caused list corruption in this case.] Sarah, This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7. I can attach the S3_Suspend_message.log, if you like, but here are the last few lines. # tail /var/tmp/S3_Suspend_message.log Bridge firewalling registered device virbr0-nic entered promiscuous mode device virbr0-nic left promiscuous mode virbr0: port 1(virbr0-nic) entered disabled state Ebtables v2.0 registered SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev proc, type proc), uses genfs_contexts SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev proc, type proc), uses genfs_contexts usb 2-1.1: USB disconnect, device number 3 -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote: On 04/17/2013 08:27 PM, Sarah Sharp wrote: [Sarah reworked this patch to cover the case where the xHCI restore register operation fails, and (temp STS_SRE) is true (and we re-init the host, including re-init for the compliance mode), but hibernate is false. The original patch would have caused list corruption in this case.] Sarah, This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7. I can attach the S3_Suspend_message.log, if you like, but here are the last few lines. Is it hanging on suspend or on resume? Does S3 work on 3.9-rc7 with your original patch (without the new comp_timer_running variable)? Does S3 work on the effected system with vanilla 3.9-rc7? # tail /var/tmp/S3_Suspend_message.log Bridge firewalling registered device virbr0-nic entered promiscuous mode device virbr0-nic left promiscuous mode virbr0: port 1(virbr0-nic) entered disabled state Ebtables v2.0 registered SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev proc, type proc), uses genfs_contexts SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev proc, type proc), uses genfs_contexts usb 2-1.1: USB disconnect, device number 3 The full log with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING would be helpful. Sarah Sharp -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFT] xhci - correct comp_mode_recovery_timer on return from hibernate
From: Tony Camuso tcam...@redhat.com Commit 71c731a2 (usb: host: xhci: Fix Compliance Mode on SN65LVPE502CP Hardware) was a workaround for systems using the SN65LVPE502CP, controller, but it introduced a bug in resume from hibernate. The fix created a timer, comp_mode_recovery_timer, which is deleted from a timer list when xhci_suspend() is called. However, the hibernate image, including the timer list containing the comp_mode_recovery_timer, had already been saved before the timer was deleted. Upon resume from hibernate, the list containing the comp_mode_recovery_timer is restored from the image saved to disk, and xhci_resume(), assuming that the timer had been deleted by xhci_suspend(), makes a call to compliance_mode_recoery_timer_init(), which creates a new instance of the comp_mode_recovery_timer and attempts to place it into the same list in which it is already active, thus corrupting the list during the list_add() call. At this point, a call trace is emitted indicating the list corruption. Soon afterward, the system locks up, the watchdog times out, and the ensuing NMI crashes the system. The problem did not occur when resuming from suspend. In suspend, the image in RAM remains exactly as it was when xhci_suspend() deleted the comp_mode_recovery_timer, so there is no problem when xhci_resume() creates a new instance of this timer and places it in the still empty list. This patch avoids the problem by deleting the timer in xhci_resume() when resuming from hibernate. Now xhci_resume() can safely make the call to create a new instance of this timer, whether returning from suspend or hibernate. Thanks to Alan Stern for his help with understanding the problem. [Sarah reworked this patch to cover the case where the xHCI restore register operation fails, and (temp STS_SRE) is true (and we re-init the host, including re-init for the compliance mode), but hibernate is false. The original patch would have caused list corruption in this case.] Signed-off-by: Tony Camuso tcam...@redhat.com Acked-by: Don Zickus dzic...@redhat.com Signed-off-by: Sarah Sharp sarah.a.sh...@linux.intel.com --- Tony, will you test this version? Thanks! drivers/usb/host/xhci.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 53b8f89..550a0d7 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -952,6 +952,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) struct usb_hcd *hcd = xhci_to_hcd(xhci); struct usb_hcd *secondary_hcd; int retval = 0; + boolcomp_timer_running = false; /* Wait a bit if either of the roothubs need to settle from the * transition into bus suspend. @@ -989,6 +990,13 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) /* If restore operation fails, re-initialize the HC during resume */ if ((temp STS_SRE) || hibernated) { + + if ((xhci-quirks XHCI_COMP_MODE_QUIRK) + !(xhci_all_ports_seen_u0(xhci))) { + del_timer_sync(xhci-comp_mode_recovery_timer); + xhci_dbg(xhci, Compliance Mode Recovery Timer deleted!\n); + } + /* Let the USB core know _both_ roothubs lost power. */ usb_root_hub_lost_power(xhci-main_hcd-self.root_hub); usb_root_hub_lost_power(xhci-shared_hcd-self.root_hub); @@ -1031,6 +1039,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) retval = xhci_init(hcd-primary_hcd); if (retval) return retval; + comp_timer_running = true; + xhci_dbg(xhci, Start the primary HCD\n); retval = xhci_run(hcd-primary_hcd); if (!retval) { @@ -1072,7 +1082,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) * to suffer the Compliance Mode issue again. It doesn't matter if * ports have entered previously to U0 before system's suspension. */ - if (xhci-quirks XHCI_COMP_MODE_QUIRK) + if ((xhci-quirks XHCI_COMP_MODE_QUIRK) !comp_timer_running) compliance_mode_recovery_timer_init(xhci); /* Re-enable port polling. */ -- 1.7.9 -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html