Re: Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2014-08-20 Thread Andrea Henderson


Waiting on the day I go home !


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/818a7978-671c-4db5-a166-d3d3dc23a...@gmail.com



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-07-16 Thread Sven Hoexter
On Sun, Jul 15, 2012 at 11:41:33PM +0100, Ben Hutchings wrote:

Hi,

 I assume you mean this patch:
 http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=65;filename=0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch;att=1;bug=670398
 so I'll apply that.

Exactly, that would be great.


 It won't be accepted into a 2.6.32.y release unless someone can explain
 how it was fixed upstream (ideally, which commit(s) fixed it).

I think it was mentioned somewhere that it got fixed with some USB-HID rewrite
in 2.6.36 or 2.6.37. We could not reproduce it with Linux 3.2 from backports
or with internal builds of 2.6.37. But I can see that this isn't a proper
explanation or a reason for inclusion upstream.

Sven


Archive: http://lists.debian.org/20120716142519.gc23...@sho.bk.hosteurope.de



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-07-16 Thread Stuart_Hayes
 
 On Sun, Jul 15, 2012 at 11:41:33PM +0100, Ben Hutchings wrote:
 
 Hi,
 
  I assume you mean this patch:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=65;filename=0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch;att=1;bug=670398
  so I'll apply that.
 
 Exactly, that would be great.
 
 
  It won't be accepted into a 2.6.32.y release unless someone can explain
  how it was fixed upstream (ideally, which commit(s) fixed it).
 
 I think it was somewhere mentioned that it got fixed with some USB-HID rewrite
 in 2.6.36 or 2.6.37. We could not reproduce it with Linux 3.2 from backports
 and internal builds of 2.6.37. But I can see that this isn't a proper
 explanation or reason for an inclusion upstream.
 
 Sven
 

I believe this was fixed by changes to the kernel workqueue code (in 2.6.36,
I think).

In older kernels, the kernel workqueue ran one queued item at a time and
waited for it to finish before running the next one.  The function hid_reset()
runs on keventd (the workqueue thread).  When a transaction issued by
hid_reset() times out trying to talk to the iDRAC, it puts hub_tt_work() on
the workqueue to be run by keventd (by calling schedule_work(&tt->clear_work))
and waits for it to finish.  But keventd is waiting for hid_reset() to finish
before it will run hub_tt_work().  Hence the deadlock.

Later kernels don't wait for each item on the workqueue to finish before
starting the next, as I recall.
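
The scenario described above can be sketched as a toy userspace model in C.
This is only an illustration under stated assumptions, not kernel code: the
names toy_wq, toy_work, toy_hid_reset and toy_reset_deadlocks are
hypothetical, the "worker" is a single flag rather than a real thread, and the
non-serial case simply assumes that some other worker is free to pick up the
queued item.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a queued work item; not the kernel's work_struct. */
struct toy_work {
    bool queued;
    bool done;
};

/* Toy model of a work queue with a single worker. */
struct toy_wq {
    bool serial;      /* true: one item at a time, as described for older kernels */
    bool worker_busy; /* true while the worker is inside a work function */
};

/* Try to run `w`. On a serial queue the only worker cannot start `w`
 * while it is still inside another work function. */
static bool toy_try_run(struct toy_wq *wq, struct toy_work *w)
{
    if (!w->queued)
        return false;
    if (wq->serial && wq->worker_busy)
        return false; /* nobody is free to run it */
    w->queued = false;
    w->done = true;
    return true;
}

/* Stand-in for hid_reset(): runs on the worker, queues the TT-clear work
 * (the role hub_tt_work() plays in the thread) and waits for it.
 * Returns true if the wait can complete, false if it would block forever. */
static bool toy_hid_reset(struct toy_wq *wq)
{
    struct toy_work tt_clear = { .queued = false, .done = false };

    wq->worker_busy = true;           /* we are executing on the worker */
    tt_clear.queued = true;           /* schedule_work() equivalent */
    (void)toy_try_run(wq, &tt_clear); /* can anyone pick it up right now? */
    wq->worker_busy = false;
    return tt_clear.done;
}

/* Returns true when the modelled reset path would deadlock. */
bool toy_reset_deadlocks(bool serial)
{
    struct toy_wq wq = { .serial = serial, .worker_busy = false };
    return !toy_hid_reset(&wq);
}

/* Self-check of the two cases discussed in the thread. */
void toy_selfcheck(void)
{
    assert(toy_reset_deadlocks(true));   /* serial queue: wait never completes */
    assert(!toy_reset_deadlocks(false)); /* concurrent queue: wait completes */
}
```

Calling toy_selfcheck() from a small main() exercises both cases. The model
deliberately ignores the timing aspect (the deadlock only arises when the
transaction actually times out), so it shows the ordering problem only.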

Stuart


Archive: 
http://lists.debian.org/959d45574d89af41a9dadf6f446a2e9a2aef6f1...@ausx7mcps310.amer.dell.com



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-07-15 Thread Ben Hutchings
On Tue, 2012-05-29 at 09:42 +0200, Sven Hoexter wrote:
 On Mon, May 21, 2012 at 10:25:09PM +0530, shyam_i...@dell.com wrote:
 
 Hi,
 
  We have observed that doing a reset on idrac on low-end server like R|T210
  R|T310 triggers the panic whereas the high end servers do not deadlock on an
  iDRAC reset so we know that this timing dependent.
 
 Ah thanks that matches our observations.
 
 
  Ben - I had attached the patch to the earlier thread. Let me know if you
  need any additional work from me on this.
 
 We've now applied that patch to the latest Debian Squeeze Kernel release and
 indeed fixes the 'racreset' issue.

 Ben, is there a chance to get that one included in the Debian Kernel or even
 better in a 2.6.32.x release upstream? Since we see the same issue with Ubuntu
 10.04 I've to open a bugreport with them aswell.

I assume you mean this patch:
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=65;filename=0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch;att=1;bug=670398
so I'll apply that.

It won't be accepted into a 2.6.32.y release unless someone can explain
how it was fixed upstream (ideally, which commit(s) fixed it).

Ben.

-- 
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein


signature.asc
Description: This is a digitally signed message part


Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-07-05 Thread [Adam Wilkosz]

Hello,

We have the same problem on all our Dell R210 and R210 II servers running
Debian Squeeze.


Jul  5 09:44:51 da16 kernel: [10760029.449586] usb 1-1.1: reset full speed USB device using ehci_hcd and address 3
Jul  5 09:45:06 da16 kernel: [10760044.513750] usb 1-1.1: device descriptor read/64, error -110
Jul  5 09:45:21 da16 kernel: [10760059.680488] usb 1-1.1: device descriptor read/64, error -110


After that, there is no keyboard input after ssh login and all processes hang.
I reboot with 'ssh root@host reboot', and then everything is OK for a few
months or more.


Nobody is logged in to the DRAC and we don't have any Dell applications for
DRAC control running. It occurs randomly.


Kernel: linux-image-2.6.32-5-amd64 2.6.32-45

Can you tell us when it will be fixed upstream?

Regards,
Adam



Archive: http://lists.debian.org/4ff55617.8010...@domeny.pl



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-05-29 Thread Sven Hoexter
On Mon, May 21, 2012 at 10:25:09PM +0530, shyam_i...@dell.com wrote:

Hi,

 We have observed that doing a reset on idrac on low-end server like R|T210
 R|T310 triggers the panic whereas the high end servers do not deadlock on an
 iDRAC reset so we know that this timing dependent.

Ah, thanks, that matches our observations.


 Ben - I had attached the patch to the earlier thread. Let me know if you need
 any additional work from me on this.

We've now applied that patch to the latest Debian Squeeze kernel release, and
it does indeed fix the 'racreset' issue.

Ben, is there a chance to get that one included in the Debian kernel or, even
better, in a 2.6.32.x release upstream? Since we see the same issue with
Ubuntu 10.04, I have to open a bug report with them as well.

Sven 



Archive: http://lists.debian.org/20120529074240.ga2...@sho.bk.hosteurope.de



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-05-22 Thread Jonathan Nieder
clone 670398 -1
retitle -1 Deadlock in hid_reset when Dell iDRAC is reset
tags -1 + upstream patch moreinfo
quit

shyam_i...@dell.com wrote:

 It doesn't seem like this is the same bug.

Indeed.  Cloning as a reminder to raise this upstream.



Archive: http://lists.debian.org/20120522214317.GA22796@burratino



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-05-21 Thread Shyam_Iyer


 -Original Message-
 From: Sven Hoexter [mailto:s...@timegate.de]
 Sent: Tuesday, May 15, 2012 5:46 AM
 To: Iyer, Shyam
 Cc: b...@decadent.org.uk; s...@timegate.de; 670...@bugs.debian.org;
 Hayes, Stuart
 Subject: Re: Deadlock in hid_reset when Dell iDRAC is reset
 
 On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:
 
 Hi,
 
  Was the usb reset issue found while resetting the iDRAC ?
 
  Resetting the iDRAC is an out of band process and has to be issued
  via a separate management network to the iDRAC.
 
 I found the time to test this issue in several OS-Hardware
 combinations:
 
 R210 Squeeze - hangs
 R210 Ubuntu 10.04 - recovers (to my surprise)
 
 R210 II Squeeze - hangs
 R210 II Ubuntu 10.04 - hangs
 R210 II CentOS 6.1 - hangs (expected, just tried that to be sure)
 
 R710 Squeeze - not affected
 
 Sven
 

We have observed that resetting the iDRAC on low-end servers like the R|T210
and R|T310 triggers the panic, whereas the high-end servers do not deadlock on
an iDRAC reset, so we know that this is timing dependent.

Ben - I attached the patch to the earlier thread. Let me know if you need
any additional work from me on this.




Archive: 
http://lists.debian.org/dbfb1b45af80394abd1c807e9f28d15707bc8f1...@blrx7mcdc203.amer.dell.com



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-05-15 Thread Sven Hoexter
On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:

Hi,

 Was the usb reset issue found while resetting the iDRAC ?
 
 Resetting the iDRAC is an out of band process and has to be issued via a 
 separate management network to the iDRAC.

I found the time to test this issue in several OS-Hardware combinations:

R210 Squeeze - hangs
R210 Ubuntu 10.04 - recovers (to my surprise)

R210 II Squeeze - hangs
R210 II Ubuntu 10.04 - hangs
R210 II CentOS 6.1 - hangs (expected, just tried that to be sure)

R710 Squeeze - not affected

Sven





Archive: http://lists.debian.org/20120515094541.ga3...@sho.bk.hosteurope.de



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-05-02 Thread Sven Hoexter
On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:

Hi,

 It doesn't seem like this is the same bug.
 
 Was the usb reset issue found while resetting the iDRAC ?

OK, we just tried a 'racadm racreset hard' on an R210, and yes,
we can reproduce that issue.

We would highly appreciate getting the fix for that
issue included in the Debian/squeeze kernel as well.
Maybe it would fix our original issue as well, but that needs
to be tested.

I've just requested some test hardware at my workplace to
conduct further testing.

Sven






Archive: http://lists.debian.org/20120502103151.ga18...@sho.bk.hosteurope.de



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-05-01 Thread Sven Hoexter
On Tue, May 01, 2012 at 10:15:37AM +0530, shyam_i...@dell.com wrote:

Hi,

 It doesn't seem like this is the same bug.
 
 Was the usb reset issue found while resetting the iDRAC ?

No, during normal operation. I think nobody even used the
iDRAC of those systems between the last boot and the appearance
of this issue.
While we had to use 'racreset hard' rather frequently with the old
DRAC 4 cards, I can't remember that we ever had to use it with the
current iDRAC cards in R210 and R210-II based systems.



 To me it still looks like this could be a symptomatic log of this bug
 
 BZ#772884
 On large SMP systems, the TSC (Time Stamp Counter) clock frequency could be
 incorrectly calculated. The discrepancy between the correct value and the
 incorrect value was within 0.5%. When the system rebooted, this small error
 would result in the system becoming out of synchronization with an external
 reference clock (typically a NTP server). With this update, the TSC frequency
 calculation has been improved and the clock correctly maintains
 synchronization with external reference clocks.


I'm not sure what counts as a 'large SMP system' here. The systems we
see this on most are R210s with an Intel X3430 CPU. Last week we
had the first appearance of this issue on an R210-II system equipped
with an E3-1220 CPU. They're all quad-core single-socket systems.

We are using ntpd in the default installation, so it should have been
involved on all systems.

Sven
-- 
And I don't know much, but I do know this:
With a golden heart comes a rebel fist.
 [ Streetlight Manifesto - Here's To Life ]



Archive: http://lists.debian.org/20120501083747.GA2990@marvin



Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-04-30 Thread Ben Hutchings
I'm handling a bug report to Debian by Sven Hoexter (cc'd) involving
lockups on Dell hardware, which seem to involve USB HID reset.  The bug
report is at http://bugs.debian.org/670398.

I found that Red Hat recently made a bug fix credited to you (or your
namesake - tell me if I have the wrong person!) described as:

BZ#797205
Due to a bug in the hid_reset() function, a deadlock could occur
when a Dell iDRAC controller was reset. Consequently, its USB
keyboard or mouse device became unresponsive. A patch that fixes
the underlying code has been provided to address this bug and
the hangs no longer occur in the described scenario.

Do the symptoms that Sven found match your understanding of the bug?

Ben.

(I was also able to extract a patch by comparing the Red Hat packages:

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -469,10 +469,8 @@
  * talking to TTs must queue control transfers (not just bulk and iso), so
  * both can talk to the same hub concurrently.
  */
-static void hub_tt_work(struct work_struct *work)
+void _hub_tt_work(struct usb_hub *hub)
 {
-	struct usb_hub		*hub =
-		container_of(work, struct usb_hub, tt.clear_work);
 	unsigned long		flags;
 	int			limit = 100;
 
@@ -507,6 +505,14 @@
 	spin_unlock_irqrestore (&hub->tt.lock, flags);
 }
 
+void hub_tt_work(struct work_struct *work)
+{
+	struct usb_hub		*hub =
+		container_of(work, struct usb_hub, tt.clear_work);
+
+	_hub_tt_work(hub);
+}
+
 /**
  * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
  * @urb: an URB associated with the failed or incomplete split transaction
@@ -554,7 +560,20 @@
 	/* tell keventd to clear state for this TT */
 	spin_lock_irqsave (&tt->lock, flags);
 	list_add_tail (&clear->clear_list, &tt->clear_list);
-	schedule_work(&tt->clear_work);
+	/* don't schedule on kevent if we're running on keventd (e.g.,
+	 * in hid_reset we can get here on kevent) unless on >=2.6.36
+	 */
+	if (!(current->flags & PF_KTHREAD) || !strstr(current->comm, "events"))
+		/* put it on keventd */
+		schedule_work(&tt->clear_work);
+	else {
+		/* let khubd do it */
+		struct usb_hub	*hub =
+			container_of(&tt->clear_work, struct usb_hub,
+					tt.clear_work);
+		kick_khubd(hub);
+	}
+
 	spin_unlock_irqrestore (&tt->lock, flags);
 	return 0;
 }
@@ -3421,6 +3440,10 @@
 		if (hub->quiescing)
 			goto loop_autopm;
 
+		/* _hub_tt_work usually run on keventd */
+		if (!list_empty(&hub->tt.clear_list))
+			_hub_tt_work(hub);
+
 		if (hub->error) {
 			dev_dbg (hub_dev, "resetting for error %d\n",
 				hub->error);
-- END ---

but if you would like to provide a version with your own description and
sign-off, I would be grateful for it.)
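
The dispatch decision in the third hunk can be tried out in userspace with a
small sketch like the one below. It is an illustration only: struct toy_task,
toy_may_use_keventd and toy_dispatch_selfcheck are hypothetical names,
TOY_PF_KTHREAD is a local constant assumed to mirror the kernel's PF_KTHREAD
flag, and the thread names are examples (keventd threads are assumed to be
named "events/N", which is what the strstr() test in the patch relies on).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed to mirror the kernel's PF_KTHREAD task flag; defined locally so
 * the sketch builds in userspace. */
#define TOY_PF_KTHREAD 0x00200000u

/* Hypothetical stand-in for the two task_struct fields the patch inspects. */
struct toy_task {
    unsigned int flags; /* task flags, e.g. TOY_PF_KTHREAD */
    const char *comm;   /* thread name, e.g. "events/0" */
};

/* True when the patched code would call schedule_work(); false when it
 * would hand the TT clearing to khubd via kick_khubd() instead. */
bool toy_may_use_keventd(const struct toy_task *t)
{
    return !(t->flags & TOY_PF_KTHREAD) || !strstr(t->comm, "events");
}

/* Self-check with three example callers. */
void toy_dispatch_selfcheck(void)
{
    struct toy_task user   = { 0, "racadm" };
    struct toy_task khubd  = { TOY_PF_KTHREAD, "khubd" };
    struct toy_task kevent = { TOY_PF_KTHREAD, "events/0" };

    assert(toy_may_use_keventd(&user));    /* not a kernel thread */
    assert(toy_may_use_keventd(&khubd));   /* kernel thread, but not keventd */
    assert(!toy_may_use_keventd(&kevent)); /* keventd itself: defer to khubd */
}
```

Note that the name match is a heuristic: any kernel thread whose name contains
"events" would take the khubd path as well.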

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.




Bug#670398: Deadlock in hid_reset when Dell iDRAC is reset

2012-04-30 Thread Shyam_Iyer
 -Original Message-
 From: Ben Hutchings [mailto:b...@decadent.org.uk]
 Sent: Monday, April 30, 2012 11:15 PM
 To: Iyer, Shyam
 Cc: Sven Hoexter; 670...@bugs.debian.org
 Subject: Deadlock in hid_reset when Dell iDRAC is reset
 
 I'm handling a bug report to Debian by Sven Hoexter (cc'd) involving
 lockups on Dell hardware, which seem to involve USB HID reset.  The bug
 report is at http://bugs.debian.org/670398.

It doesn't seem like this is the same bug.

Was the USB reset issue found while resetting the iDRAC?

Resetting the iDRAC is an out-of-band process and has to be issued via a
separate management network to the iDRAC.

Besides, this is not an upstream bug, so the fix was tailor-made for RHEL6.

To me it still looks like this could be a symptomatic log of this bug:

BZ#772884
On large SMP systems, the TSC (Time Stamp Counter) clock frequency could be 
incorrectly calculated. The discrepancy between the correct value and the 
incorrect value was within 0.5%. When the system rebooted, this small error 
would result in the system becoming out of synchronization with an external 
reference clock (typically a NTP server). With this update, the TSC frequency 
calculation has been improved and the clock correctly maintains synchronization 
with external reference clocks.


 
 I found that Red Hat recently made a bug fix credited to you, described
 as:
 
 BZ#797205
 Due to a bug in the hid_reset() function, a deadlock could occur
 when a Dell iDRAC controller was reset. Consequently, its USB
 keyboard or mouse device became unresponsive. A patch that fixes
 the underlying code has been provided to address this bug and
 the hangs no longer occur in the described scenario.
 
 Do the symptoms that Sven found match your understanding of the bug?
 
 Ben.
 
 (I was also able to extract a patch by comparing the Red Hat packages:
 
 --- a/drivers/usb/core/hub.c
 +++ b/drivers/usb/core/hub.c
 @@ -469,10 +469,8 @@
   * talking to TTs must queue control transfers (not just bulk and iso), so
   * both can talk to the same hub concurrently.
   */
 -static void hub_tt_work(struct work_struct *work)
 +void _hub_tt_work(struct usb_hub *hub)
  {
 -	struct usb_hub		*hub =
 -		container_of(work, struct usb_hub, tt.clear_work);
  	unsigned long		flags;
  	int			limit = 100;
  
 @@ -507,6 +505,14 @@
  	spin_unlock_irqrestore (&hub->tt.lock, flags);
  }
  
 +void hub_tt_work(struct work_struct *work)
 +{
 +	struct usb_hub		*hub =
 +		container_of(work, struct usb_hub, tt.clear_work);
 +
 +	_hub_tt_work(hub);
 +}
 +
  /**
   * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
   * @urb: an URB associated with the failed or incomplete split transaction
 @@ -554,7 +560,20 @@
  	/* tell keventd to clear state for this TT */
  	spin_lock_irqsave (&tt->lock, flags);
  	list_add_tail (&clear->clear_list, &tt->clear_list);
 -	schedule_work(&tt->clear_work);
 +	/* don't schedule on kevent if we're running on keventd (e.g.,
 +	 * in hid_reset we can get here on kevent) unless on >=2.6.36
 +	 */
 +	if (!(current->flags & PF_KTHREAD) || !strstr(current->comm, "events"))
 +		/* put it on keventd */
 +		schedule_work(&tt->clear_work);
 +	else {
 +		/* let khubd do it */
 +		struct usb_hub	*hub =
 +			container_of(&tt->clear_work, struct usb_hub,
 +					tt.clear_work);
 +		kick_khubd(hub);
 +	}
 +
  	spin_unlock_irqrestore (&tt->lock, flags);
  	return 0;
  }
 @@ -3421,6 +3440,10 @@
  		if (hub->quiescing)
  			goto loop_autopm;
  
 +		/* _hub_tt_work usually run on keventd */
 +		if (!list_empty(&hub->tt.clear_list))
 +			_hub_tt_work(hub);
 +
  		if (hub->error) {
  			dev_dbg (hub_dev, "resetting for error %d\n",
  				hub->error);
 -- END ---
 
 but if you can provide a version with your own description and sign-off,
 I would be grateful for it.)

Sure (see attached).

But like I said, you might instead need the fix for the BZ I referenced above.

Thanks,
Shyam


 
 --
 Ben Hutchings
 Design a system any fool can use, and only a fool will want to use it.



0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch
Description: 0001-usb-Fix-deadlock-in-hid_reset-when-Dell-iDRAC.patch