Re: [linux-usb-devel] More USB stack hangs after sporadic disconnect
David Brownell wrote: > On Thursday 02 February 2006 7:05 am, Alan Stern wrote: >> [Added David Brownell to the CC: since his patch evidently doesn't fix >> the ehci_endpoint_disable hang.] > > No, there still seems to be some issue where the IAA watchdog timer is > not having the intended effect. It would help if someone who's seeing > this hang would put the time into debugging that. > I would have no problem with trying patches and reporting back results / kernel dumps. But I unfortunately have no free brain cycles to spend on this, i.e. defining my own experiments and understanding the code. Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Alan Stern wrote: > On Tue, 15 Nov 2005, Jan Kiszka wrote: > > >>Trace attached. One ehci_endpoint_disable call actually does not return. >> >>I furthermore had my box running without ehci for about 8 hours today, >>and no disconnect occured. >> >>Next step? > > > Unfortunately the stack trace doesn't mention khubd. Undoubtedly the > trace overran the kernel's log buffer. You could try reconfiguring the > kernel to increase the log-buffer size. > You mean something like this: khubd D 003D446F 0 1749 6 5752 692 (L-TLB) ccfade98 bec52600 003d446f ccefd688 ccefd560 bec52600 003d446f c0316a60 ccfadea8 00f94b77 cdb57800 cdb578e0 c0261e67 ccfadea8 00f94b77 c03383d0 c03383d0 00f94b77 4b87ad6e c011e16d ccefd560 Call Trace: [schedule_timeout+130/163] schedule_timeout+0x82/0xa3 [process_timeout+0/9] process_timeout+0x0/0x9 [pg0+241756895/1070240768] ehci_endpoint_disable+0xa6/0x140 [ehci_hcd] [pg0+242750144/1070240768] usb_disable_device+0x63/0x157 [usbcore] [pg0+242730389/1070240768] usb_disconnect+0xc2/0x161 [usbcore] [pg0+242734186/1070240768] hub_port_connect_change+0xc3/0x3f1 [usbcore] [pg0+242735944/1070240768] hub_events+0x3b0/0x4bf [usbcore] [pg0+242736215/1070240768] hub_thread+0x0/0xdb [usbcore] [pg0+242736224/1070240768] hub_thread+0x9/0xdb [usbcore] [autoremove_wake_function+0/58] autoremove_wake_function+0x0/0x3a [kthread+105/150] kthread+0x69/0x96 [kthread+0/150] kthread+0x0/0x96 [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Alan Stern wrote: > On Tue, 15 Nov 2005, Jan Kiszka wrote: > > >>>And get that stack trace. It's the best way to find the exact location of >>>the problem. >>> >> >>Will the trace I request typically hours after the disconnect (I can't >>sit aside my box all the time...) still contain valuable information? Or >>should I better place some trace-triggering command at a prominent place? > > > If things do hang up the way you described, then the trace will still be > useful even hours later. > Trace attached. One ehci_endpoint_disable call actually does not return. I furthermore had my box running without ehci for about 8 hours today, and no disconnect occured. Next step? Jan Nov 15 19:27:29 nibbler -- MARK -- Nov 15 19:47:30 nibbler -- MARK -- Nov 15 20:07:30 nibbler -- MARK -- Nov 15 20:07:39 nibbler kernel: hub 4-0:1.0: state 5 ports 6 chg evt 0008 Nov 15 20:07:39 nibbler kernel: ehci_hcd :00:10.3: GetStatus port 3 status 00180b POWER sig=j PEC CSC CONNECT Nov 15 20:07:39 nibbler kernel: hub 4-0:1.0: port 3, status 0501, change 0003, 480 Mb/s Nov 15 20:07:39 nibbler kernel: usb 4-3: USB disconnect, address 2 Nov 15 20:07:39 nibbler kernel: usb 4-3: usb_disable_device nuking all URBs Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:enter Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:exit Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:enter Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:exit Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:enter Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:exit Nov 15 20:07:39 nibbler kernel: ehci_hcd :00:10.3: shutdown urb c6b55820 pipe c0008280 ep1in-bulk Nov 15 20:07:39 nibbler kernel: ehci_endpoint_disable:enter Nov 15 20:27:30 nibbler -- MARK -- Nov 15 20:47:30 nibbler -- MARK -- Nov 15 21:07:31 nibbler -- MARK -- Nov 15 21:18:46 nibbler kernel: c7291f5c 00bfa086 c7293f5c c728ff5c 00bfa086 4b87ad6e c011e16d ccd10560 Nov 15 21:18:46 nibbler kernel: Call Trace: Nov 15 21:18:46 nibbler kernel: [schedule_timeout+130/163] schedule_timeout+0x82/0xa3 Nov 15 21:18:46 nibbler kernel: [process_timeout+0/9] process_timeout+0x0/0x9 Nov 15 21:18:46 nibbler kernel: [pg0+257422591/1070355456] svc_recv+0x277/0x411 [sunrpc] Nov 15 21:18:46 nibbler kernel: [default_wake_function+0/18] default_wake_function+0x0/0x12 Nov 15 21:18:46 nibbler kernel: [pg0+257737325/1070355456] nfsd+0xdc/0x2dd [nfsd] Nov 15 21:18:46 nibbler kernel: [pg0+257737105/1070355456] nfsd+0x0/0x2dd [nfsd] Nov 15 21:18:46 nibbler kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb Nov 15 21:18:46 nibbler kernel: nfsd S 003D3369 0 5754 1 5755 5753 (L-TLB) Nov 15 21:18:46 nibbler kernel: c7293f4c cbda1600 003d3369 ccd10158 ccd10030 cbda1600 003d3369 Nov 15 21:18:46 nibbler kernel: c0316ed8 c7293f5c 00bfa086 cd290560 c7292000 c0261e67 Nov 15 21:18:46 nibbler kernel:c7293f5c 00bfa086 c72b5f5c c7291f5c 00bfa086 4b87ad6e c011e16d ccd10030 Nov 15 21:18:46 nibbler kernel: Call Trace: Nov 15 21:18:46 nibbler kernel: [schedule_timeout+130/163] schedule_timeout+0x82/0xa3 Nov 15 21:18:46 nibbler kernel: [process_timeout+0/9] process_timeout+0x0/0x9 Nov 15 21:18:46 nibbler kernel: [pg0+257422591/1070355456] svc_recv+0x277/0x411 [sunrpc] Nov 15 21:18:46 nibbler kernel: [default_wake_function+0/18] default_wake_function+0x0/0x12 Nov 15 21:18:46 nibbler kernel: [pg0+257737325/1070355456] nfsd+0xdc/0x2dd [nfsd] Nov 15 21:18:46 nibbler kernel: [pg0+257737105/1070355456] nfsd+0x0/0x2dd [nfsd] Nov 15 21:18:46 nibbler kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb Nov 15 21:18:46 nibbler kernel: nfsd S 003D3369 0 5755 1 5756 5754 (L-TLB) Nov 15 21:18:46 nibbler kernel: c72b5f4c cbda1600 003d3369 c7239bd8 c7239ab0 cbda1600 003d3369 Nov 15 21:18:46 nibbler kernel: c0316ed8 c72b5f5c 00bfa086 c7bcdb60 c72b4000 c0261e67 Nov 15 21:18:46 nibbler kernel:c72b5f5c 00bfa086 c72b7f5c c7293f5c 00bfa086 4b87ad6e c011e16d c7239ab0 Nov 15 21:18:46 nibbler kernel: Call Trace: Nov 15 21:18:46 nibbler kernel: [schedule_timeout+130/163] schedule_timeout+0x82/0xa3 Nov 15 21:18:46 nibbler kernel: [process_timeout+0/9] process_timeout+0x0/0x9 Nov 15 21:18:46 nibbler kernel: [pg0+257422591/1070355456] svc_recv+0x277/0x411 [sunrpc] Nov 15 21:18:46 nibbler kernel: [default_wake_function+0/18] default_wake_function+0x0/0x12 Nov 15 21:18:46 nibbler kernel: [pg0+257737325/1070355456] nfsd+0xdc/0x2dd [nfsd] Nov 15 21:18:46 nibbler kernel: [pg0+257737105/1070355456] nfsd+0x0/0x2dd [nfsd] Nov 15 21:18:46 nibbler kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb Nov 15 21:18:46 nibbler kernel: nfsd S 003D3369 0 5756 1 5757 5755 (L-TLB) Nov 15 21:
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Alan Stern wrote: > On Tue, 15 Nov 2005, Jan Kiszka wrote: > >> I placed some printks in the rt2570's disconnect routine. While I can >> find them in the logs when I physically disconnect the stick, I did not >> get them last night during another sporadic disconnect. So the >> high-level driver seems to be out of the game regarding the stack >> lock-up, doesn't it? >> >> I'm now trying if the situation changes without the ehci being loaded. > > It could be that the problem is in the ehci_endpoint_disable routine in > the EHCI driver; that would prevent the disconnect routine from getting > called. Maybe even the hang is caused by whatever caused the sporadic > disconnect. Try putting printk statements at the entry and exit points of > that routine and see what you get. Ok. > > And get that stack trace. It's the best way to find the exact location of > the problem. > Will the trace I request typically hours after the disconnect (I can't sit aside my box all the time...) still contain valuable information? Or should I better place some trace-triggering command at a prominent place? Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Alan Stern wrote: > ... > The hang is almost certainly caused by the device's driver. Apparently > its disconnect method is not returning. Since that routine is called by > the main hub driver and with a lock held, this means that most other major > activities of the USB stack will be blocked as well. > I placed some printks in the rt2570's disconnect routine. While I can find them in the logs when I physically disconnect the stick, I did not get them last night during another sporadic disconnect. So the high-level driver seems to be out of the game regarding the stack lock-up, doesn't it? I'm now trying if the situation changes without the ehci being loaded. Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
David Brownell wrote: > On Monday 14 November 2005 12:58 pm, Jan Kiszka wrote: > >>David Brownell wrote: >> >>>... >>>EBUSY status code happens in a few cases, and in this path I'd suspect >>>it means the URB is already being returned. Could you try the patch >>>I've attached, to see if it prints anything? >>> >> >>Additional question regarding this check: should this warning never show >>up or just not during that unwanted disconnect? I already have two of >>them in my log without any disconnects. > > > Sounds like you're getting more of them than you should. It's > something that ought to be almost vanishingly rare, which may > suggest something else is going wrong. > > - Dave > These are two of now four dumps of my current 2.6.14.2 kernel: kernel: Badness in qh_link_async at drivers/usb/host/ehci-q.c:787 kernel: [pg0+241856579/1070355456] qh_link_async+0x53/0xb1 [ehci_hcd] kernel: [pg0+241857089/1070355456] submit_async+0x50/0x7f [ehci_hcd] kernel: [pg0+241871022/1070355456] ehci_urb_enqueue+0x49/0x99 [ehci_hcd] kernel: [pg0+242856890/1070355456] hcd_submit_urb+0x15b/0x1ae [usbcore] kernel: [pg0+242860976/1070355456] usb_start_wait_urb+0x53/0x166 [usbcore] kernel: [pg0+242861241/1070355456] usb_start_wait_urb+0x15c/0x166 [usbcore] kernel: [pg0+242860888/1070355456] timeout_kill+0x0/0x5 [usbcore] kernel: [pg0+242861369/1070355456] usb_internal_control_msg+0x76/0x87 [usbcore] kernel: [pg0+242861505/1070355456] usb_control_msg+0x77/0x8c [usbcore] kernel: [pg0+247333016/1070355456] RTUSB_VendorRequest+0xae/0x150 [rt2570] kernel: [pg0+247332221/1070355456] RTUSBMultiReadMAC+0x2a/0x2e [rt2570] kernel: [pg0+247244505/1070355456] PeriodicExec+0x44/0x784 [rt2570] kernel: [deactivate_task+22/36] deactivate_task+0x16/0x24 kernel: [_spin_unlock_irq+6/9] _spin_unlock_irq+0x6/0x9 kernel: [schedule+1150/1234] schedule+0x47e/0x4d2 kernel: [pg0+247235979/1070355456] CMDHandler+0x332/0xe14 [rt2570] kernel: [__wake_up_locked+16/21] __wake_up_locked+0x10/0x15 kernel: [__down+159/213] __down+0x9f/0xd5 kernel: [default_wake_function+0/18] default_wake_function+0x0/0x12 kernel: [pg0+247239968/1070355456] RTUSBCmdThread+0x74/0x8a [rt2570] kernel: [pg0+247239852/1070355456] RTUSBCmdThread+0x0/0x8a [rt2570] kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb kernel: Badness in qh_link_async at drivers/usb/host/ehci-q.c:787 kernel: [pg0+241856579/1070355456] qh_link_async+0x53/0xb1 [ehci_hcd] kernel: [pg0+241857089/1070355456] submit_async+0x50/0x7f [ehci_hcd] kernel: [pg0+241871022/1070355456] ehci_urb_enqueue+0x49/0x99 [ehci_hcd] kernel: [pg0+242856890/1070355456] hcd_submit_urb+0x15b/0x1ae [usbcore] kernel: [pg0+247259468/1070355456] RTUSBKickBulkOut+0x80/0xdd [rt2570] kernel: [pg0+247271181/1070355456] PeerProbeReqAction+0x3d3/0x3de [rt2570] kernel: [deactivate_task+22/36] deactivate_task+0x16/0x24 kernel: [pg0+247241789/1070355456] MlmeHandler+0x104/0x25a [rt2570] kernel: [pg0+247239830/1070355456] MlmeThread+0x71/0x87 [rt2570] kernel: [pg0+247239717/1070355456] MlmeThread+0x0/0x87 [rt2570] kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb Something rather for the rt25xx developers or actually a core issue? Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
David Brownell wrote: > ... > EBUSY status code happens in a few cases, and in this path I'd suspect > it means the URB is already being returned. Could you try the patch > I've attached, to see if it prints anything? > Additional question regarding this check: should this warning never show up or just not during that unwanted disconnect? I already have two of them in my log without any disconnects. Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
David Brownell wrote: > On Monday 14 November 2005 9:35 am, Jan Kiszka wrote: > >>ehci_hcd :00:10.3: GetStatus port 3 status 00180b POWER sig=j PEC CSC >>CONNECT > > > Translation: the root hub spontaneously disconnected. There are such > reports, which are as far as I can tell always with VIA hardware. I have > no idea what causes them. > So I'm not alone... :-/ > > The stack hanging is a different issue: > > >>usb 4-3: hcd_unlink_urb c633cc20 fail -16 > > > EBUSY status code happens in a few cases, and in this path I'd suspect > it means the URB is already being returned. Could you try the patch > I've attached, to see if it prints anything? Thanks, patch applied, will report what the next disconnect brings. Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Alan Stern wrote: > ... > The hang is almost certainly caused by the device's driver. Apparently > its disconnect method is not returning. Since that routine is called by > the main hub driver and with a lock held, this means that most other major > activities of the USB stack will be blocked as well. Interesting, will have a look at the driver's disconnect routine. Maybe some printks or the driver's debugging stuff can give more hints. > > To get more information when you see these messages in the log and things > are hung up, get a stack trace with Alt-SysRq-T. The entry for khubd is > of particular interest; others may be important also depending on the > driver. > Thanks, noted for the next event. Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Greg KH wrote: > On Mon, Nov 14, 2005 at 06:54:17PM +0100, Jan Kiszka wrote: >> Greg KH wrote: >>> On Mon, Nov 14, 2005 at 06:35:22PM +0100, Jan Kiszka wrote: >>>> Hi, >>>> >>>> on a VIA Eden box, I'm getting sporadic USB disconnects after longer >>>> operation times. Typically, there is no USB load when it disconnects. >>>> The kernel output is like this: >>>> >>>> kernel: hub 4-0:1.0: state 5 ports 6 chg evt 0008 >>>> kernel: ehci_hcd :00:10.3: GetStatus port 3 status 00180b POWER >>>> sig=j PEC CSC CONNECT >>>> kernel: hub 4-0:1.0: port 3, status 0501, change 0003, 480 Mb/s >>>> kernel: usb 4-3: USB disconnect, address 2 >>>> kernel: usb 4-3: usb_disable_device nuking all URBs >>>> kernel: ehci_hcd :00:10.3: shutdown urb c633cc20 pipe 8280 ep0in >>>> kernel: usb 4-3: hcd_unlink_urb c633cc20 fail -16 >>>> >>>> Attached are either a rt2570-based USB WLAN stick (via native driver) or >>>> a LinkSys WUSB54G WLAN adapter (via ndiswrapper). Both fail, and in both >>>> cases the USB subsystem locks up in some nasty way (deadlock, rest of >>>> system still alive) so that no resetting/unloading of the involved >>>> drivers is possible. Only a hard reboot helps. I'm watching this effect >>>> for quite a while now, so far with 2.6.8 kernels, now also with a recent >>>> 2.6.14.2. >>> If you are using ndiswrapper, there's nothing we can do to help you, >>> sorry. >> We do not need to raise a discussion about this topic here. > > That's fine, just letting you know. > >> I only mentioned ndiswrapper and the WUSB54G to express that this issues >> does not seem to be high-level driver related. I always thought it would >> be a ndiswrapper or LinkSys Windows driver issue until I bought this >> second device. >> >>> What driver controls the rt2570-based USB WLAN device, usbnet? >> http://sourceforge.net/projects/rt2400, the stable rt2570 driver > > I suggest you ask the authors of this driver, not much we can do here > either :( > > Now if you have a problem with an in-kernel driver, then we can help. > Well, I must say that this degree of cooperation from an Open Source project is rather new to me. Did I violate your policy by posting to devel directly? Than sorry, just tell me. All I am asking for are some hints what may go wrong in my case, what the messages e.g. mean. If you are not /able/ to debug this problem in theory (likely I will have to do this on my hardware in real anyway), then please explain why you can exclude a bug in the USB stack. Otherwise, with my currently limited detail knowledge of the involved components, I may easily end up being bumped between both projects. Jan signature.asc Description: OpenPGP digital signature
Re: [linux-usb-devel] USB stack hangs after sporadic disconnect
Greg KH wrote: > On Mon, Nov 14, 2005 at 06:35:22PM +0100, Jan Kiszka wrote: >> Hi, >> >> on a VIA Eden box, I'm getting sporadic USB disconnects after longer >> operation times. Typically, there is no USB load when it disconnects. >> The kernel output is like this: >> >> kernel: hub 4-0:1.0: state 5 ports 6 chg evt 0008 >> kernel: ehci_hcd :00:10.3: GetStatus port 3 status 00180b POWER >> sig=j PEC CSC CONNECT >> kernel: hub 4-0:1.0: port 3, status 0501, change 0003, 480 Mb/s >> kernel: usb 4-3: USB disconnect, address 2 >> kernel: usb 4-3: usb_disable_device nuking all URBs >> kernel: ehci_hcd :00:10.3: shutdown urb c633cc20 pipe 8280 ep0in >> kernel: usb 4-3: hcd_unlink_urb c633cc20 fail -16 >> >> Attached are either a rt2570-based USB WLAN stick (via native driver) or >> a LinkSys WUSB54G WLAN adapter (via ndiswrapper). Both fail, and in both >> cases the USB subsystem locks up in some nasty way (deadlock, rest of >> system still alive) so that no resetting/unloading of the involved >> drivers is possible. Only a hard reboot helps. I'm watching this effect >> for quite a while now, so far with 2.6.8 kernels, now also with a recent >> 2.6.14.2. > > If you are using ndiswrapper, there's nothing we can do to help you, > sorry. We do not need to raise a discussion about this topic here. I only mentioned ndiswrapper and the WUSB54G to express that this issues does not seem to be high-level driver related. I always thought it would be a ndiswrapper or LinkSys Windows driver issue until I bought this second device. > > What driver controls the rt2570-based USB WLAN device, usbnet? http://sourceforge.net/projects/rt2400, the stable rt2570 driver Thanks for the quick reply, Jan signature.asc Description: OpenPGP digital signature
[linux-usb-devel] USB stack hangs after sporadic disconnect
Hi, on a VIA Eden box, I'm getting sporadic USB disconnects after longer operation times. Typically, there is no USB load when it disconnects. The kernel output is like this: kernel: hub 4-0:1.0: state 5 ports 6 chg evt 0008 kernel: ehci_hcd :00:10.3: GetStatus port 3 status 00180b POWER sig=j PEC CSC CONNECT kernel: hub 4-0:1.0: port 3, status 0501, change 0003, 480 Mb/s kernel: usb 4-3: USB disconnect, address 2 kernel: usb 4-3: usb_disable_device nuking all URBs kernel: ehci_hcd :00:10.3: shutdown urb c633cc20 pipe 8280 ep0in kernel: usb 4-3: hcd_unlink_urb c633cc20 fail -16 Attached are either a rt2570-based USB WLAN stick (via native driver) or a LinkSys WUSB54G WLAN adapter (via ndiswrapper). Both fail, and in both cases the USB subsystem locks up in some nasty way (deadlock, rest of system still alive) so that no resetting/unloading of the involved drivers is possible. Only a hard reboot helps. I'm watching this effect for quite a while now, so far with 2.6.8 kernels, now also with a recent 2.6.14.2. I could live with the fact that my hardware may have some sporadic problems and I have to reset/reload parts of my system from time to time; it's easy to setup a watchdog script for this. But the hanging USB stack prevents this. Any ideas? Any additional information required? I could instrument the code and reproduce this if it helps to track the reason down. Thanks in advance, Jan PS: Please CC, I'm not subscribed. signature.asc Description: OpenPGP digital signature