Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-27 Thread Tony Camuso

On 04/17/2013 08:27 PM, Sarah Sharp wrote:
-- snip --


[Sarah reworked this patch to cover the case where the xHCI restore
register operation fails, and (temp  STS_SRE) is true (and we re-init
the host, including re-init for the compliance mode), but hibernate is
false.  The original patch would have caused list corruption in this
case.]

Signed-off-by: Tony Camuso tcam...@redhat.com
Acked-by: Don Zickus dzic...@redhat.com
Signed-off-by: Sarah Sharp sarah.a.sh...@linux.intel.com
---

Tony, will you test this version?  Thanks!



Sarah,

With ioatdma blacklisted, I was able to test this patch on an hp z820,
which is the system family on which the original problem was detected
by hp.

The amended patch works correctly. State is correctly restored from
both hibernate and suspend.


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-25 Thread Tony Camuso

On 04/23/2013 05:14 PM, Sarah Sharp wrote:

On Tue, Apr 23, 2013 at 04:18:00PM -0400, Tony Camuso wrote:

It hangs in both suspend and resume with vanilla 3.9-rc7 and rc8.
Neither the new patch nor the old patch helps.


It might be unrelated to USB then.  Let's see what the bisect shows.



It does appear to be unrelated to usb. Now we need to determine why
this should cause problems with suspend/resume.

-
# git bisect bad
4dec23d7718e6f1f5e1773698d112025169e7d49 is the first bad commit
commit 4dec23d7718e6f1f5e1773698d112025169e7d49
Author: Dave Jiang dave.ji...@intel.com
Date:   Thu Feb 7 14:38:32 2013 -0700

ioatdma: fix race between updating ioat-head and IOAT_COMPLETION_PENDING

There is a race that can hit during __cleanup() when the ioat-head pointer is

incremented during descriptor submission. The __cleanup() can clear the
PENDING flag when it does not see any active descriptors. This causes new
submitted descriptors to be ignored because the COMPLETION_PENDING flag is
cleared. This was introduced when code was adapted from ioatdma v1 to 
ioatdma
v2. For v2 and v3, IOAT_COMPLETION_PENDING flag will be abandoned and a new
flag IOAT_CHAN_ACTIVE will be utilized. This flag will also be protected 
under
the prep_lock when being modified in order to avoid the race.

Signed-off-by: Dave Jiang dave.ji...@intel.com

Reviewed-by: Dan Williams d...@fb.com
Signed-off-by: Vinod Koul vinod.k...@intel.com

:04 04 6a99537c215da68179b7431d165c4aa88c2f569d 
b339a473063ad04cd985399e43e2e11c9a2a7026 M  drivers
-


Here is a Call Trace when resuming from suspend with the above patch.

-
[   96.324588] ioatdma :40:04.0: ioat2_timer_event: Channel halted (1000)
[   96.324659] [ cut here ]
[   96.324661] kernel BUG at drivers/dma/ioat/dma_v2.c:317!
[   96.324665] invalid opcode:  [#1] SMP
[   96.324709] Modules linked in: ipt_MASQUERADE nf_conntrack_netbios_ns 
nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 
nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 
nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables 
ip6table_filter ip6_tables kvm_intel snd_hda_codec_hdmi coretemp 
snd_hda_codec_realtek kvm snd_hda_intel snd_hda_codec snd_hwdep snd_seq 
snd_seq_device snd_pcm snd_page_alloc snd_timer snd hp_wmi crc32c_intel sb_edac 
ioatdma sparse_keymap iTCO_wdt soundcore edac_core ghash_clmulni_intel rfkill 
e1000e iTCO_vendor_support mei microcode i2c_i801 lpc_ich pcspkr serio_raw dca 
mfd_core nouveau isci video mxm_wmi i2c_algo_bit drm_kms_helper ttm libsas drm 
mpt2sas firewire_ohci firewire_core ata_generic raid_class scsi_transport_sas 
i2c_core pata_acpi crc_itu_t wmi
[   96.324718] CPU 0
[   96.324718] Pid: 0, comm: swapper/0 Not tainted 3.8.0-rc2+ #10 
Hewlett-Packard HP Z820 Workstation/158B
[   96.324731] RIP: 0010:[a0275466]  [a0275466] 
ioat2_timer_event+0x1b6/0x2c0 [ioatdma]
[   96.324733] RSP: 0018:8801a9c03df0  EFLAGS: 00010206
[   96.324734] RAX: 0023 RBX: 8801a6880228 RCX: 00ae
[   96.324735] RDX:  RSI: 0086 RDI: 0246
[   96.324737] RBP: 8801a9c03e20 R08: 81e4b4e0 R09: 81e7faf0
[   96.324738] R10: ba64 R11: 0004 R12: 1000
[   96.324740] R13: 0100 R14: a02752b0 R15: 8801a6880228
[   96.324742] FS:  () GS:8801a9c0() 
knlGS:
[   96.324744] CS:  0010 DS:  ES:  CR0: 80050033
[   96.324745] CR2: 7f8ce419 CR3: 01c0c000 CR4: 000407f0
[   96.324747] DR0:  DR1:  DR2: 
[   96.324748] DR3:  DR6: 0ff0 DR7: 0400
[   96.324751] Process swapper/0 (pid: 0, threadinfo 81c0, task 
81c14440)
[   96.324751] Stack:
[   96.324758]  00019e050740 00019e050740 81e8c500 
8801a6880290
[   96.324763]  0100 a02752b0 8801a9c03e60 
8106cb9a
[   96.324768]  8103d3d4 81e8c500 8801a6880290 
a02752b0
[   96.324769] Call Trace:
[   96.324774]  IRQ
[   96.324780]  [a02752b0] ? reshape_ring+0x330/0x330 [ioatdma]
[   96.324788]  [8106cb9a] call_timer_fn+0x3a/0x120
[   96.324794]  [8103d3d4] ? native_apic_msr_eoi_write+0x14/0x20
[   96.324801]  [a02752b0] ? reshape_ring+0x330/0x330 [ioatdma]
[   96.324807]  [81501ab0] ? cpufreq_p4_target+0x120/0x120
[   96.324811]  [8106e92e] run_timer_softirq+0x1fe/0x2b0
[   96.324815]  [81501ab0] ? cpufreq_p4_target+0x120/0x120
[   

Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-23 Thread Sarah Sharp
On Mon, Apr 22, 2013 at 02:38:57PM -0400, Tony Camuso wrote:
 On 04/18/2013 05:12 PM, Sarah Sharp wrote:
 On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote:
 Sarah,
 
 This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7.
 
 I can attach the S3_Suspend_message.log, if you like, but here are the
 last few lines.
 
 Is it hanging on suspend or on resume?
 
 Does S3 work on 3.9-rc7 with your original patch (without the new
 comp_timer_running variable)?  Does S3 work on the effected system with
 vanilla 3.9-rc7?
 
 
 Sarah,
 
 Just to let you know I haven't forgotten you ...
 
 I have been sidetracked by hardware or firmware problems. I need to
 try a different system.

All right, thanks for letting me know.

Sarah Sharp
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-23 Thread Tony Camuso

On 04/18/2013 05:12 PM, Sarah Sharp wrote:

On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote:


Sarah,

This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7.


= snip =


Is it hanging on suspend or on resume?



3.9.0-rc8 hangs in both suspend and resume on a hp z620 running F18,
with or without the old and new patch.

Meanwhile, 3.8.0 has the problem for which we originally devised this 
patch...


... BUT your revised patch fixes the problem in 3.8.0.

If you like, I can bisect between 3.8.0 and 3.9.0-rc7 (where I first 
detected the new problem).



Does S3 work on 3.9-rc7 with your original patch (without the new
comp_timer_running variable)?  Does S3 work on the effected system with
vanilla 3.9-rc7?



It hangs in both suspend and resume with vanilla 3.9-rc7 and rc8. 
Neither the new patch nor the old patch helps.




The full log with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING
would be helpful.



The logs are large, so I've posted them here.
http://people.redhat.com/tcamuso/xhci-logs/

Logs were created with 3.9.0-rc8

commit 824282ca7d250bd7c301f221c3cd902ce906d731
Merge: f83b293 3b5e50e
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Mon Apr 22 15:00:59 2013 -0700


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-23 Thread Sarah Sharp
On Tue, Apr 23, 2013 at 04:18:00PM -0400, Tony Camuso wrote:
 On 04/18/2013 05:12 PM, Sarah Sharp wrote:
 On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote:
 
 Sarah,
 
 This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7.
 
 = snip =
 
 Is it hanging on suspend or on resume?
 
 
 3.9.0-rc8 hangs in both suspend and resume on a hp z620 running F18,
 with or without the old and new patch.
 
 Meanwhile, 3.8.0 has the problem for which we originally devised
 this patch...
 
 ... BUT your revised patch fixes the problem in 3.8.0.
 
 If you like, I can bisect between 3.8.0 and 3.9.0-rc7 (where I first
 detected the new problem).

That would be great!

 Does S3 work on 3.9-rc7 with your original patch (without the new
 comp_timer_running variable)?  Does S3 work on the effected system with
 vanilla 3.9-rc7?
 
 
 It hangs in both suspend and resume with vanilla 3.9-rc7 and rc8.
 Neither the new patch nor the old patch helps.

It might be unrelated to USB then.  Let's see what the bisect shows.

 The full log with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING
 would be helpful.
 
 
 The logs are large, so I've posted them here.
 http://people.redhat.com/tcamuso/xhci-logs/
 
 Logs were created with 3.9.0-rc8

Ok, I'll take a look at those tomorrow.

Sarah Sharp
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-22 Thread Tony Camuso

On 04/18/2013 05:12 PM, Sarah Sharp wrote:

On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote:

Sarah,

This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7.

I can attach the S3_Suspend_message.log, if you like, but here are the
last few lines.


Is it hanging on suspend or on resume?

Does S3 work on 3.9-rc7 with your original patch (without the new
comp_timer_running variable)?  Does S3 work on the effected system with
vanilla 3.9-rc7?



Sarah,

Just to let you know I haven't forgotten you ...

I have been sidetracked by hardware or firmware problems. I need to try 
a different system.


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-18 Thread Tony Camuso

On 04/17/2013 08:27 PM, Sarah Sharp wrote:


[Sarah reworked this patch to cover the case where the xHCI restore
register operation fails, and (temp  STS_SRE) is true (and we re-init
the host, including re-init for the compliance mode), but hibernate is
false.  The original patch would have caused list corruption in this
case.]



Sarah,

This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7.

I can attach the S3_Suspend_message.log, if you like, but here are the
last few lines.

# tail /var/tmp/S3_Suspend_message.log
Bridge firewalling registered
device virbr0-nic entered promiscuous mode
device virbr0-nic left promiscuous mode
virbr0: port 1(virbr0-nic) entered disabled state
Ebtables v2.0 registered
SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
SELinux: initialized (dev proc, type proc), uses genfs_contexts
SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
SELinux: initialized (dev proc, type proc), uses genfs_contexts
usb 2-1.1: USB disconnect, device number 3


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-18 Thread Sarah Sharp
On Thu, Apr 18, 2013 at 02:45:40PM -0400, Tony Camuso wrote:
 On 04/17/2013 08:27 PM, Sarah Sharp wrote:
 
 [Sarah reworked this patch to cover the case where the xHCI restore
 register operation fails, and (temp  STS_SRE) is true (and we re-init
 the host, including re-init for the compliance mode), but hibernate is
 false.  The original patch would have caused list corruption in this
 case.]
 
 
 Sarah,
 
 This patch works with RHEL6.4+ kernel, but hangs with 3.9-rc7.
 
 I can attach the S3_Suspend_message.log, if you like, but here are the
 last few lines.

Is it hanging on suspend or on resume?

Does S3 work on 3.9-rc7 with your original patch (without the new
comp_timer_running variable)?  Does S3 work on the effected system with
vanilla 3.9-rc7?

 # tail /var/tmp/S3_Suspend_message.log
 Bridge firewalling registered
 device virbr0-nic entered promiscuous mode
 device virbr0-nic left promiscuous mode
 virbr0: port 1(virbr0-nic) entered disabled state
 Ebtables v2.0 registered
 SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
 SELinux: initialized (dev proc, type proc), uses genfs_contexts
 SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
 SELinux: initialized (dev proc, type proc), uses genfs_contexts
 usb 2-1.1: USB disconnect, device number 3

The full log with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING
would be helpful.

Sarah Sharp
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFT] xhci - correct comp_mode_recovery_timer on return from hibernate

2013-04-17 Thread Sarah Sharp
From: Tony Camuso tcam...@redhat.com

Commit 71c731a2 (usb: host: xhci: Fix Compliance Mode on SN65LVPE502CP
Hardware) was a workaround for systems using the SN65LVPE502CP,
controller, but it introduced a bug in resume from hibernate.

The fix created a timer, comp_mode_recovery_timer, which is deleted from
a timer list when xhci_suspend() is called. However, the hibernate image,
including the timer list containing the comp_mode_recovery_timer, had
already been saved before the timer was deleted.

Upon resume from hibernate, the list containing the comp_mode_recovery_timer
is restored from the image saved to disk, and xhci_resume(), assuming that
the timer had been deleted by xhci_suspend(), makes a call to
compliance_mode_recoery_timer_init(), which creates a new instance of the
comp_mode_recovery_timer and attempts to place it into the same list in which
it is already active, thus corrupting the list during the list_add() call.

At this point, a call trace is emitted indicating the list corruption.
Soon afterward, the system locks up, the watchdog times out, and the
ensuing NMI crashes the system.

The problem did not occur when resuming from suspend. In suspend, the
image in RAM remains exactly as it was when xhci_suspend() deleted the
comp_mode_recovery_timer, so there is no problem when xhci_resume()
creates a new instance of this timer and places it in the still empty
list.

This patch avoids the problem by deleting the timer in xhci_resume()
when resuming from hibernate. Now xhci_resume() can safely make the
call to create a new instance of this timer, whether returning from
suspend or hibernate.

Thanks to Alan Stern for his help with understanding the problem.

[Sarah reworked this patch to cover the case where the xHCI restore
register operation fails, and (temp  STS_SRE) is true (and we re-init
the host, including re-init for the compliance mode), but hibernate is
false.  The original patch would have caused list corruption in this
case.]

Signed-off-by: Tony Camuso tcam...@redhat.com
Acked-by: Don Zickus dzic...@redhat.com
Signed-off-by: Sarah Sharp sarah.a.sh...@linux.intel.com
---


Tony, will you test this version?  Thanks!


 drivers/usb/host/xhci.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 53b8f89..550a0d7 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -952,6 +952,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
struct usb_hcd  *hcd = xhci_to_hcd(xhci);
struct usb_hcd  *secondary_hcd;
int retval = 0;
+   boolcomp_timer_running = false;
 
/* Wait a bit if either of the roothubs need to settle from the
 * transition into bus suspend.
@@ -989,6 +990,13 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
 
/* If restore operation fails, re-initialize the HC during resume */
if ((temp  STS_SRE) || hibernated) {
+
+   if ((xhci-quirks  XHCI_COMP_MODE_QUIRK) 
+   !(xhci_all_ports_seen_u0(xhci))) {
+   del_timer_sync(xhci-comp_mode_recovery_timer);
+   xhci_dbg(xhci, Compliance Mode Recovery Timer 
deleted!\n);
+   }
+
/* Let the USB core know _both_ roothubs lost power. */
usb_root_hub_lost_power(xhci-main_hcd-self.root_hub);
usb_root_hub_lost_power(xhci-shared_hcd-self.root_hub);
@@ -1031,6 +1039,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
retval = xhci_init(hcd-primary_hcd);
if (retval)
return retval;
+   comp_timer_running = true;
+
xhci_dbg(xhci, Start the primary HCD\n);
retval = xhci_run(hcd-primary_hcd);
if (!retval) {
@@ -1072,7 +1082,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
 * to suffer the Compliance Mode issue again. It doesn't matter if
 * ports have entered previously to U0 before system's suspension.
 */
-   if (xhci-quirks  XHCI_COMP_MODE_QUIRK)
+   if ((xhci-quirks  XHCI_COMP_MODE_QUIRK)  !comp_timer_running)
compliance_mode_recovery_timer_init(xhci);
 
/* Re-enable port polling. */
-- 
1.7.9

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html