Re: Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
Waiting on the day I go home !

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/818a7978-671c-4db5-a166-d3d3dc23a...@gmail.com
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
On Sun, Jul 15, 2012 at 11:41:33PM +0100, Ben Hutchings wrote:
> Hi,
> I assume you mean this patch:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=65;filename=0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch;att=1;bug=670398
> so I'll apply that.

Exactly, that would be great.

> It won't be accepted into a 2.6.32.y release unless someone can explain how it was fixed upstream (ideally, which commit(s) fixed it).

I think it was mentioned somewhere that it got fixed with some USB-HID rewrite in 2.6.36 or 2.6.37. We could not reproduce it with Linux 3.2 from backports or with internal builds of 2.6.37. But I can see that this isn't a proper explanation or reason for inclusion upstream.

Sven

Archive: http://lists.debian.org/20120716142519.gc23...@sho.bk.hosteurope.de
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
> On Sun, Jul 15, 2012 at 11:41:33PM +0100, Ben Hutchings wrote:
>> Hi,
>> I assume you mean this patch:
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=65;filename=0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch;att=1;bug=670398
>> so I'll apply that.
>
> Exactly, that would be great.
>
>> It won't be accepted into a 2.6.32.y release unless someone can explain how it was fixed upstream (ideally, which commit(s) fixed it).
>
> I think it was mentioned somewhere that it got fixed with some USB-HID rewrite in 2.6.36 or 2.6.37. We could not reproduce it with Linux 3.2 from backports or with internal builds of 2.6.37. But I can see that this isn't a proper explanation or reason for inclusion upstream.
>
> Sven

I believe this was fixed by the changes to the kernel workqueue code in 2.6.36. In older kernels, the kernel workqueue ran one queued item at a time and waited for it to finish before running the next one.

The function hid_reset() runs on keventd (the workqueue). When a transaction issued by hid_reset() times out while talking to the iDRAC, the reset path puts hub_tt_work() on the workqueue to be run by keventd (by calling schedule_work(&tt->clear_work)) and waits for it to finish. But keventd is waiting for hid_reset() to finish before it will run hub_tt_work(). Each side waits for the other, so the result is a deadlock.

As I recall, later kernels don't wait for each item on the workqueue to finish before starting the next.

Stuart

Archive: http://lists.debian.org/959d45574d89af41a9dadf6f446a2e9a2aef6f1...@ausx7mcps310.amer.dell.com
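Stuart's explanation can be reproduced in miniature with POSIX threads. The following is an illustrative userspace model under stated assumptions, not kernel code: a single worker thread runs queued jobs strictly one at a time (the pre-2.6.36 keventd behavior described above), and the hypothetical functions job_a() and job_b() stand in for hid_reset() and hub_tt_work(). Because the kernel would block forever, job_a() waits with a one-second pthread_cond_timedwait() so that the deadlock shows up as a timeout instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef void (*job_fn)(void);

#define QCAP 8
static job_fn queue[QCAP];
static int head, tail;                /* FIFO indices, protected by lock */
static bool stopping, b_done, a_timed_out;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;

/* Like schedule_work(): hand a job to the single worker thread. */
static void enqueue(job_fn fn)
{
    pthread_mutex_lock(&lock);
    queue[tail++ % QCAP] = fn;
    pthread_cond_signal(&work_cv);
    pthread_mutex_unlock(&lock);
}

/* Stands in for hub_tt_work(): records that it has run. */
static void job_b(void)
{
    pthread_mutex_lock(&lock);
    b_done = true;
    pthread_cond_broadcast(&done_cv);
    pthread_mutex_unlock(&lock);
}

/* Stands in for hid_reset(): queues job_b on the same worker, then waits
 * for it. The kernel waits forever; a 1 s timeout makes the hang visible. */
static void job_a(void)
{
    struct timespec deadline;
    int rc = 0;

    enqueue(job_b);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    pthread_mutex_lock(&lock);
    while (!b_done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&done_cv, &lock, &deadline);
    a_timed_out = !b_done;
    pthread_mutex_unlock(&lock);
}

/* The single keventd-like worker (pre-2.6.36 behavior): the next job
 * starts only after the current one has returned. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (head == tail && !stopping)
            pthread_cond_wait(&work_cv, &lock);
        if (head == tail)
            break;                    /* stopping and queue drained */
        job_fn fn = queue[head++ % QCAP];
        pthread_mutex_unlock(&lock);
        fn();
        pthread_mutex_lock(&lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs job_a on the worker. Returns true if job_a timed out waiting for
 * job_b; *b_ran_later tells whether job_b ran after job_a gave up. */
bool nested_wait_times_out(bool *b_ran_later)
{
    pthread_t t;

    head = tail = 0;
    stopping = b_done = a_timed_out = false;
    pthread_create(&t, NULL, worker, NULL);
    enqueue(job_a);
    pthread_mutex_lock(&lock);
    stopping = true;                  /* let the worker exit once idle */
    pthread_cond_signal(&work_cv);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    *b_ran_later = b_done;
    return a_timed_out;
}
```

Called once, nested_wait_times_out() returns true and sets *b_ran_later to true: job_b() only runs after job_a() has given up waiting, which is the ordering that never completes when there is no timeout.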
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
On Tue, 2012-05-29 at 09:42 +0200, Sven Hoexter wrote:
> On Mon, May 21, 2012 at 10:25:09PM +0530, shyam_i...@dell.com wrote:
>> Hi,
>> We have observed that doing a reset on the iDRAC on low-end servers like R|T210 and R|T310 triggers the panic, whereas high-end servers do not deadlock on an iDRAC reset, so we know that this is timing dependent.
>
> Ah, thanks, that matches our observations.
>
>> Ben - I had attached the patch to the earlier thread. Let me know if you need any additional work from me on this.
>
> We've now applied that patch to the latest Debian Squeeze kernel release, and it indeed fixes the 'racreset' issue.
>
> Ben, is there a chance to get that one included in the Debian kernel, or even better in a 2.6.32.x release upstream? Since we see the same issue with Ubuntu 10.04, I have to open a bug report with them as well.

I assume you mean this patch:
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=65;filename=0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch;att=1;bug=670398
so I'll apply that.

It won't be accepted into a 2.6.32.y release unless someone can explain how it was fixed upstream (ideally, which commit(s) fixed it).

Ben.

-- 
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
Hello,

we have the same problem with all our Dell R210 and R210 II servers running Debian Squeeze:

Jul 5 09:44:51 da16 kernel: [10760029.449586] usb 1-1.1: reset full speed USB device using ehci_hcd and address 3
Jul 5 09:45:06 da16 kernel: [10760044.513750] usb 1-1.1: device descriptor read/64, error -110
Jul 5 09:45:21 da16 kernel: [10760059.680488] usb 1-1.1: device descriptor read/64, error -110

After that there is no keyboard, and after an ssh login all processes hang. I reboot with "ssh root@host reboot", and then everything is OK for a few months or more. Nobody is logged in to the DRAC, and we don't run any Dell applications for DRAC control. It occurs randomly.

Kernel: linux-image-2.6.32-5-amd64 2.6.32-45

Can you tell us when it will be fixed upstream?

Regards,
Adam

Archive: http://lists.debian.org/4ff55617.8010...@domeny.pl
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
On Mon, May 21, 2012 at 10:25:09PM +0530, shyam_i...@dell.com wrote:
> Hi,
> We have observed that doing a reset on the iDRAC on low-end servers like R|T210 and R|T310 triggers the panic, whereas high-end servers do not deadlock on an iDRAC reset, so we know that this is timing dependent.

Ah, thanks, that matches our observations.

> Ben - I had attached the patch to the earlier thread. Let me know if you need any additional work from me on this.

We've now applied that patch to the latest Debian Squeeze kernel release, and it indeed fixes the 'racreset' issue.

Ben, is there a chance to get that one included in the Debian kernel, or even better in a 2.6.32.x release upstream? Since we see the same issue with Ubuntu 10.04, I have to open a bug report with them as well.

Sven

Archive: http://lists.debian.org/20120529074240.ga2...@sho.bk.hosteurope.de
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
clone 670398 -1
retitle -1 Deadlock in hid_reset when Dell iDRAC is reset
tags -1 + upstream patch moreinfo
quit

shyam_i...@dell.com wrote:
> It doesn't seem like this is the same bug.

Indeed. Cloning as a reminder to raise this upstream.

Archive: http://lists.debian.org/20120522214317.GA22796@burratino
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
-----Original Message-----
From: Sven Hoexter [mailto:s...@timegate.de]
Sent: Tuesday, May 15, 2012 5:46 AM
To: Iyer, Shyam
Cc: b...@decadent.org.uk; s...@timegate.de; 670...@bugs.debian.org; Hayes, Stuart
Subject: Re: Deadlock in hid_reset when Dell iDRAC is reset

> On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:
>> Hi,
>> Was the usb reset issue found while resetting the iDRAC? Resetting the iDRAC is an out-of-band process and has to be issued via a separate management network to the iDRAC.
>
> I found the time to test this issue in several OS/hardware combinations:
>
> R210     Squeeze       - hangs
> R210     Ubuntu 10.04  - recovers (to my surprise)
> R210 II  Squeeze       - hangs
> R210 II  Ubuntu 10.04  - hangs
> R210 II  CentOS 6.1    - hangs (expected, just tried that to be sure)
> R710     Squeeze       - not affected
>
> Sven

We have observed that doing a reset on the iDRAC on low-end servers like R|T210 and R|T310 triggers the panic, whereas high-end servers do not deadlock on an iDRAC reset, so we know that this is timing dependent.

Ben - I had attached the patch to the earlier thread. Let me know if you need any additional work from me on this.

Archive: http://lists.debian.org/dbfb1b45af80394abd1c807e9f28d15707bc8f1...@blrx7mcdc203.amer.dell.com
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:
> Hi,
> Was the usb reset issue found while resetting the iDRAC? Resetting the iDRAC is an out-of-band process and has to be issued via a separate management network to the iDRAC.

I found the time to test this issue in several OS/hardware combinations:

R210     Squeeze       - hangs
R210     Ubuntu 10.04  - recovers (to my surprise)
R210 II  Squeeze       - hangs
R210 II  Ubuntu 10.04  - hangs
R210 II  CentOS 6.1    - hangs (expected, just tried that to be sure)
R710     Squeeze       - not affected

Sven

Archive: http://lists.debian.org/20120515094541.ga3...@sho.bk.hosteurope.de
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:
> Hi,
> It doesn't seem like this is the same bug.
> Was the usb reset issue found while resetting the iDRAC?

Ok, we just tried a 'racadm racreset hard' on an R210, and yes, we can reproduce that issue. We would highly appreciate getting the fix for that issue included in the Debian/squeeze kernel as well. Maybe it would fix our original issue too, but that needs to be tested. I've just requested some test hardware at my workplace to conduct further testing.

Sven

Archive: http://lists.debian.org/20120502103151.ga18...@sho.bk.hosteurope.de
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:
> Hi,
> It doesn't seem like this is the same bug.
> Was the usb reset issue found while resetting the iDRAC?

No, during normal operation. I think nobody even used the iDRAC of those systems between the last boot and the appearance of this issue. While we had to use 'racreset hard' rather frequently with the old DRAC 4 cards, I can't remember having to use it at all with the current iDRAC cards in R210 and R210-II based systems.

> To me it still looks like this could be a symptomatic log of this bug:
> BZ#772884
> On large SMP systems, the TSC (Time Stamp Counter) clock frequency could be incorrectly calculated. The discrepancy between the correct value and the incorrect value was within 0.5%. When the system rebooted, this small error would result in the system becoming out of synchronization with an external reference clock (typically an NTP server). With this update, the TSC frequency calculation has been improved and the clock correctly maintains synchronization with external reference clocks.

I'm not sure what counts as a 'large SMP system' here. The systems we mostly see this on are R210s with an Intel X3430 CPU. Last week we had the first appearance of this issue on an R210-II system equipped with an E3-1220 CPU. They're all quad-core, single-socket systems. We are using ntpd in the default installation, so it should have been involved on all systems.

Sven

-- 
And I don't know much, but I do know this:
With a golden heart comes a rebel fist.
[ Streetlight Manifesto - Here's To Life ]

Archive: http://lists.debian.org/20120501083747.GA2990@marvin
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
I'm handling a bug report to Debian by Sven Hoexter (cc'd) involving lockups on Dell hardware, which seem to involve USB HID reset. The bug report is at http://bugs.debian.org/670398.

I found that Red Hat recently made a bug fix credited to you (or your namesake - tell me if I have the wrong person!) described as:

BZ#797205
Due to a bug in the hid_reset() function, a deadlock could occur when a Dell iDRAC controller was reset. Consequently, its USB keyboard or mouse device became unresponsive. A patch that fixes the underlying code has been provided to address this bug and the hangs no longer occur in the described scenario.

Do the symptoms that Sven found match your understanding of the bug?

Ben.

(I was also able to extract a patch by comparing the Red Hat packages:

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -469,10 +469,8 @@
  * talking to TTs must queue control transfers (not just bulk and iso), so
  * both can talk to the same hub concurrently.
  */
-static void hub_tt_work(struct work_struct *work)
+void _hub_tt_work(struct usb_hub *hub)
 {
-	struct usb_hub		*hub =
-		container_of(work, struct usb_hub, tt.clear_work);
 	unsigned long		flags;
 	int			limit = 100;
 
@@ -507,6 +505,14 @@
 	spin_unlock_irqrestore (&hub->tt.lock, flags);
 }
 
+void hub_tt_work(struct work_struct *work)
+{
+	struct usb_hub		*hub =
+		container_of(work, struct usb_hub, tt.clear_work);
+
+	_hub_tt_work(hub);
+}
+
 /**
  * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
  * @urb: an URB associated with the failed or incomplete split transaction
@@ -554,7 +560,20 @@
 	/* tell keventd to clear state for this TT */
 	spin_lock_irqsave (&tt->lock, flags);
 	list_add_tail (&clear->clear_list, &tt->clear_list);
-	schedule_work(&tt->clear_work);
+	/* don't schedule on kevent if we're running on keventd (e.g.,
+	 * in hid_reset we can get here on kevent) unless on >=2.6.36
+	 */
+	if (!(current->flags & PF_KTHREAD) || !strstr(current->comm, "events"))
+		/* put it on keventd */
+		schedule_work(&tt->clear_work);
+	else {
+		/* let khubd do it */
+		struct usb_hub *hub =
+			container_of(&tt->clear_work, struct usb_hub,
+					tt.clear_work);
+		kick_khubd(hub);
+	}
+
 	spin_unlock_irqrestore (&tt->lock, flags);
 	return 0;
 }
@@ -3421,6 +3440,10 @@
 		if (hub->quiescing)
 			goto loop_autopm;
 
+		/* _hub_tt_work usually run on keventd */
+		if (!list_empty(&hub->tt.clear_list))
+			_hub_tt_work(hub);
+
 		if (hub->error) {
 			dev_dbg (hub_dev, "resetting for error %d\n",
 				hub->error);
--- END ---

but if you would like to provide a version with your own description and sign-off, I would be grateful for it.)

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.
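The central idea of the extracted patch is never to queue the TT clean-up behind the very worker thread that is waiting for it. A hedged userspace sketch of that idea follows; it is not the patch and not kernel code. The names events, hubd, reset_job(), schedule_clear() and clear_tt() are hypothetical stand-ins for keventd, khubd, hid_reset(), the scheduling step in usb_hub_clear_tt_buffer() and _hub_tt_work(). Where the patch recognizes keventd heuristically, with (current->flags & PF_KTHREAD) plus strstr(current->comm, "events"), the sketch swaps in an exact thread-identity check via pthread_equal().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef void (*job_fn)(void);
struct wq {                               /* one thread runs jobs in FIFO order */
    pthread_mutex_t lock;
    pthread_cond_t cv;
    pthread_t thread;
    job_fn jobs[8];
    int head, tail;                       /* protected by lock */
    bool stopping;
};
#define WQ_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }
static struct wq events = WQ_INIT, hubd = WQ_INIT; /* keventd / khubd stand-ins */

static void wq_push(struct wq *q, job_fn fn)
{
    pthread_mutex_lock(&q->lock);
    q->jobs[q->tail++ % 8] = fn;
    pthread_cond_signal(&q->cv);
    pthread_mutex_unlock(&q->lock);
}

static void *wq_run(void *arg)
{
    struct wq *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == q->tail && !q->stopping)
            pthread_cond_wait(&q->cv, &q->lock);
        if (q->head == q->tail)
            break;                        /* stopping and drained */
        job_fn fn = q->jobs[q->head++ % 8];
        pthread_mutex_unlock(&q->lock);
        fn();                             /* the next job waits for this one */
        pthread_mutex_lock(&q->lock);
    }
    q->stopping = false;                  /* allow the queue to be restarted */
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

static void wq_stop(struct wq *q)         /* drain remaining jobs, then join */
{
    pthread_mutex_lock(&q->lock);
    q->stopping = true;
    pthread_cond_signal(&q->cv);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
}

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static bool clear_done, reset_timed_out;

static void clear_tt(void)                /* stands in for _hub_tt_work() */
{
    pthread_mutex_lock(&done_lock);
    clear_done = true;
    pthread_cond_broadcast(&done_cv);
    pthread_mutex_unlock(&done_lock);
}

/* The key check: never queue work behind the thread we are running on. */
static void schedule_clear(void)
{
    if (pthread_equal(pthread_self(), events.thread))
        wq_push(&hubd, clear_tt);         /* like kick_khubd() */
    else
        wq_push(&events, clear_tt);       /* like schedule_work() */
}

static void reset_job(void)               /* stands in for hid_reset() */
{
    struct timespec deadline;
    int rc = 0;
    schedule_clear();
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;                 /* the kernel would wait forever */
    pthread_mutex_lock(&done_lock);
    while (!clear_done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&done_cv, &done_lock, &deadline);
    reset_timed_out = !clear_done;
    pthread_mutex_unlock(&done_lock);
}

bool reset_completes(void)                /* true if reset_job did not time out */
{
    clear_done = reset_timed_out = false;
    pthread_create(&events.thread, NULL, wq_run, &events);
    pthread_create(&hubd.thread, NULL, wq_run, &hubd);
    wq_push(&events, reset_job);
    wq_stop(&events);                     /* returns once reset_job is done */
    wq_stop(&hubd);
    return !reset_timed_out;
}
```

Here reset_completes() returns true, because clear_tt() runs on the second worker while reset_job() is still blocked waiting for it. Note that the patch's name test would treat any kernel thread whose name contains "events" as keventd; the exact identity check in the sketch is a simplification.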
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
I'm handling a bug report to Debian by Sven Hoexter (cc'd) involving lockups on Dell hardware, which seem to involve USB HID reset. The bug report is at http://bugs.debian.org/670398.

I found that Red Hat recently made a bug fix credited to you, described as:

BZ#797205
Due to a bug in the hid_reset() function, a deadlock could occur when a Dell iDRAC controller was reset. Consequently, its USB keyboard or mouse device became unresponsive. A patch that fixes the underlying code has been provided to address this bug and the hangs no longer occur in the described scenario.

Do the symptoms that Sven found match your understanding of the bug?

Ben.

(I was also able to extract a patch by comparing the Red Hat packages:

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -469,10 +469,8 @@
  * talking to TTs must queue control transfers (not just bulk and iso), so
  * both can talk to the same hub concurrently.
  */
-static void hub_tt_work(struct work_struct *work)
+void _hub_tt_work(struct usb_hub *hub)
 {
-	struct usb_hub		*hub =
-		container_of(work, struct usb_hub, tt.clear_work);
 	unsigned long		flags;
 	int			limit = 100;
 
@@ -507,6 +505,14 @@
 	spin_unlock_irqrestore (&hub->tt.lock, flags);
 }
 
+void hub_tt_work(struct work_struct *work)
+{
+	struct usb_hub		*hub =
+		container_of(work, struct usb_hub, tt.clear_work);
+
+	_hub_tt_work(hub);
+}
+
 /**
  * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
  * @urb: an URB associated with the failed or incomplete split transaction
@@ -554,7 +560,20 @@
 	/* tell keventd to clear state for this TT */
 	spin_lock_irqsave (&tt->lock, flags);
 	list_add_tail (&clear->clear_list, &tt->clear_list);
-	schedule_work(&tt->clear_work);
+	/* don't schedule on kevent if we're running on keventd (e.g.,
+	 * in hid_reset we can get here on kevent) unless on >=2.6.36
+	 */
+	if (!(current->flags & PF_KTHREAD) || !strstr(current->comm, "events"))
+		/* put it on keventd */
+		schedule_work(&tt->clear_work);
+	else {
+		/* let khubd do it */
+		struct usb_hub *hub =
+			container_of(&tt->clear_work, struct usb_hub,
+					tt.clear_work);
+		kick_khubd(hub);
+	}
+
 	spin_unlock_irqrestore (&tt->lock, flags);
 	return 0;
 }
@@ -3421,6 +3440,10 @@
 		if (hub->quiescing)
 			goto loop_autopm;
 
+		/* _hub_tt_work usually run on keventd */
+		if (!list_empty(&hub->tt.clear_list))
+			_hub_tt_work(hub);
+
 		if (hub->error) {
 			dev_dbg (hub_dev, "resetting for error %d\n",
 				hub->error);
--- END ---

but if you can provide a version with your own description and sign-off, I would be grateful for it.)

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.
Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset
-----Original Message-----
From: Ben Hutchings [mailto:b...@decadent.org.uk]
Sent: Monday, April 30, 2012 11:15 PM
To: Iyer, Shyam
Cc: Sven Hoexter; 670...@bugs.debian.org
Subject: Deadlock in hid_reset when Dell iDRAC is reset

> I'm handling a bug report to Debian by Sven Hoexter (cc'd) involving lockups on Dell hardware, which seem to involve USB HID reset. The bug report is at http://bugs.debian.org/670398.

It doesn't seem like this is the same bug. Was the usb reset issue found while resetting the iDRAC? Resetting the iDRAC is an out-of-band process and has to be issued via a separate management network to the iDRAC. Besides, this is not an upstream bug, and hence the fix was tailor-made for RHEL6.

To me it still looks like this could be a symptomatic log of this bug:

BZ#772884
On large SMP systems, the TSC (Time Stamp Counter) clock frequency could be incorrectly calculated. The discrepancy between the correct value and the incorrect value was within 0.5%. When the system rebooted, this small error would result in the system becoming out of synchronization with an external reference clock (typically an NTP server). With this update, the TSC frequency calculation has been improved and the clock correctly maintains synchronization with external reference clocks.

> I found that Red Hat recently made a bug fix credited to you, described as:
>
> BZ#797205
> Due to a bug in the hid_reset() function, a deadlock could occur when a Dell iDRAC controller was reset. Consequently, its USB keyboard or mouse device became unresponsive. A patch that fixes the underlying code has been provided to address this bug and the hangs no longer occur in the described scenario.
>
> Do the symptoms that Sven found match your understanding of the bug?
>
> Ben.
>
> (I was also able to extract a patch by comparing the Red Hat packages:
>
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -469,10 +469,8 @@
>   * talking to TTs must queue control transfers (not just bulk and iso), so
>   * both can talk to the same hub concurrently.
>   */
> -static void hub_tt_work(struct work_struct *work)
> +void _hub_tt_work(struct usb_hub *hub)
>  {
> -	struct usb_hub		*hub =
> -		container_of(work, struct usb_hub, tt.clear_work);
>  	unsigned long		flags;
>  	int			limit = 100;
>  
> @@ -507,6 +505,14 @@
>  	spin_unlock_irqrestore (&hub->tt.lock, flags);
>  }
>  
> +void hub_tt_work(struct work_struct *work)
> +{
> +	struct usb_hub		*hub =
> +		container_of(work, struct usb_hub, tt.clear_work);
> +
> +	_hub_tt_work(hub);
> +}
> +
>  /**
>   * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
>   * @urb: an URB associated with the failed or incomplete split transaction
> @@ -554,7 +560,20 @@
>  	/* tell keventd to clear state for this TT */
>  	spin_lock_irqsave (&tt->lock, flags);
>  	list_add_tail (&clear->clear_list, &tt->clear_list);
> -	schedule_work(&tt->clear_work);
> +	/* don't schedule on kevent if we're running on keventd (e.g.,
> +	 * in hid_reset we can get here on kevent) unless on >=2.6.36
> +	 */
> +	if (!(current->flags & PF_KTHREAD) || !strstr(current->comm, "events"))
> +		/* put it on keventd */
> +		schedule_work(&tt->clear_work);
> +	else {
> +		/* let khubd do it */
> +		struct usb_hub *hub =
> +			container_of(&tt->clear_work, struct usb_hub,
> +					tt.clear_work);
> +		kick_khubd(hub);
> +	}
> +
>  	spin_unlock_irqrestore (&tt->lock, flags);
>  	return 0;
>  }
> @@ -3421,6 +3440,10 @@
>  		if (hub->quiescing)
>  			goto loop_autopm;
>  
> +		/* _hub_tt_work usually run on keventd */
> +		if (!list_empty(&hub->tt.clear_list))
> +			_hub_tt_work(hub);
> +
>  		if (hub->error) {
>  			dev_dbg (hub_dev, "resetting for error %d\n",
>  				hub->error);
> --- END ---
>
> but if you can provide a version with your own description and sign-off, I would be grateful for it.)

Sure (see attached). But as I said, you might need the fix for the BZ I referenced above instead.

Thanks,
Shyam

> -- 
> Ben Hutchings
> Design a system any fool can use, and only a fool will want to use it.

Attachment: 0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch