On Fri, 27 Feb 2004, David Brownell wrote:

> Stephen Hemminger wrote:
> > Well moved my Stir driver that gave UHCI fits, over to OHCI for a test.
> > It caused the following when trying to unload my driver. Probably because
> > it was trying to unlink and urb that wasn't submitted.
> 
> Probably not ... an urb that wasn't submitted gives a nice clean
> errno return.  I'm guessing line 340 is this WARN_ON:
> 
>          switch (ed->state) {
>          case ED_UNLINK:         /* wait for hw to finish? */
>                  /* major IRQ delivery trouble loses INTR_SF too... */
>                  WARN_ON (limit-- == 0);
>                  ...
> 
> And yes, every time that's been reported, it's been missing IRQs;
> I don't recall the non-delivery actually being caused by OHCI.

Well, as I'm playing with the same driver, I'm seeing this too. However, 
for me the "Badness in ohci_enpoint_disable" warning is not the beginning 
but rather the final symptom of the trouble I'm seeing with ohci.

See below for a commented catch from the serial console. For me it looks 
like the trigger is what appears to be done-list corruption. Either due to 
some hardware issue or some race corrupting ohci_hcd's td-hashbin. At 
least the first indication I see is always "bad entry" reported from 
dl_reverse_done_list - followed sooner or later by either the HC dying or 
missing some completion leading to a deadlock.

Hm, I've no particular idea where this might really coming from. Stephen, 
could this be caused by the poll-timer handler resubmitting the urb 
probably racing with completion of this urb and doing mod_timer there?

Martin

------------------------

* kernel:

Linux version 2.6.3-bk9 ([EMAIL PROTECTED])
        (gcc version 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk))
        #16 SMP Sat Feb 28 10:59:21 CET 2004

* driver:

ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd: block sizes: ed 64 td 64

* hardware is a companion OHCI of a NEC USB-2.0 HC (ehci_hcd not loaded)

PCI: Found IRQ 11 for device 0000:00:0c.1
ohci_hcd 0000:00:0c.1: OHCI Host Controller
ohci_hcd 0000:00:0c.1: irq 11, pci mem cd059000
ohci_hcd 0000:00:0c.1: new USB bus registered, assigned bus number 3

* with an USB-2.0 hub attached, running FS of course

hub 3-2:1.0: USB hub found
hub 3-2:1.0: 4 ports detected

* plug the (buspowered) stir4200-dongle

usb 3-2.2: new full speed USB device using address 8

* loading irda and driver modules

irda_init()
NET: Registered protocol family 23
drivers/usb/core/usb.c: registered new driver stir4200
SigmaTel support registered
SigmaTel STIr4200 IRDA/USB found at address 8, Vendor: 66f, Product: 4200
IrDA: Registered SigmaTel device irda0

* ifup the interface, start discovery and handle some connection

$ ifconnfig irda0 up
$ echo 1 > /proc/sys/net/irda/discovery

irlap_change_speed(), setting speed to 9600
irlap_change_speed(), setting speed to 4000000
irlap_change_speed(), setting speed to 9600

* !!! trouble starts: bad entry reported from dl_reverse_done_list
  seems poisoned memory in this case (in many cases it's not 5a5a5a)

ohci_hcd 0000:00:0c.1: bad entry 5a5a5a50

* going on from this point I've seen two different failure modes

--- either the HC dies:

ohci_hcd 0000:00:0c.1: OHCI Unrecoverable Error, disabled
ohci_hcd 0000:00:0c.1: HC died; cleaning up
usb 3-1: USB disconnect, address 2
usb 3-2: USB disconnect, address 3
usb 3-2.1: USB disconnect, address 4
usb 3-2.2: USB disconnect, address 8
usb 3-2.3: USB disconnect, address 5
usb 3-2.4: USB disconnect, address 6
ohci_hcd 0000:00:0c.1: leak ed cb1ac080 (#2) state 0 (has tds)

$ rmmod ohci_hcd

ohci_hcd 0000:00:0c.1: dma_pool_destroy ohci_ed, cb1ac000 busy

--- or things go south like this:

* !!! at this point the driver-thread is blocking for some urb-completion
  while trying to unlink (sync) the rx-urb:

irda0         D 00000000  7108  1299      1          1346  1170 (L-TLB)
c5651ec0 00000046 00000040 00000000 00000000 00000296 c57167d0 cbfb2be0 
       cbfb2c00 c11e3bc0 00002331 8109459f 00000095 c11e3bf0 c11e3bc0 c57167d0 
       c57169a0 c5651f20 c5651f24 c5651efc c5651f14 c011e6bc 00000000 c57167d0 
Call Trace:
 [<c011e6bc>] wait_for_completion+0xdc/0x160
 [<c011e320>] default_wake_function+0x0/0x20
 [<c011e320>] default_wake_function+0x0/0x20
 [<cd079a38>] hcd_unlink_urb+0x2f8/0x370 [usbcore]
 [<cd099f80>] stir_usb_receive+0x0/0xb0 [stir4200]
 [<cd07a374>] usb_unlink_urb+0x24/0x50 [usbcore]
 [<cd099b46>] receive_stop+0x26/0x40 [stir4200]
 [<cd099b9a>] stir_send+0x3a/0xf0 [stir4200]
 [<cd099dc4>] stir_transmit_thread+0x174/0x330 [stir4200]
 [<c011e320>] default_wake_function+0x0/0x20
 [<c011e320>] default_wake_function+0x0/0x20
 [<cd099c50>] stir_transmit_thread+0x0/0x330 [stir4200]
 [<c0107161>] kernel_thread_helper+0x5/0x14

* apparently hcd_unlink_urb is waiting for the urb splice-completion which
  will never happen (or was already missed somehow)

* now, trying to ifdown the interface deadlocks because the net_close
  waits for a completion to be signal when the (already deadlocked)
  device thread would exit.

$ ifconfig irda0 down

ifconfig      D 00000800  6488  1303   1257                     (NOTLB)
c5697de4 00000082 00000000 00000800 c564c620 c01483f6 c5717940 cbfe8c74 
       00000096 c11e3bc0 00f3f367 91038f8a 000000a0 00000000 c11e3bc0 c5717940 
       c5717b10 c81e9de0 c81e9de4 c5697e20 c5697e38 c011e6bc 00000000 c5717940 
Call Trace:
 [<c01483f6>] kmem_cache_free+0x216/0x280
 [<c011e6bc>] wait_for_completion+0xdc/0x160
 [<c011e320>] default_wake_function+0x0/0x20
 [<c02306a5>] __kfree_skb+0xb5/0xc0
 [<c011e320>] default_wake_function+0x0/0x20
 [<cd09a32f>] stir_net_close+0x2f/0xa0 [stir4200]
 [<c0234a7f>] dev_close+0x1f/0x90
 [<c0234ac8>] dev_close+0x68/0x90
 [<c0236001>] dev_change_flags+0x51/0x100
 [<c027262d>] devinet_ioctl+0x33d/0x6a0
 [<c0236692>] dev_ioctl+0x1d2/0x500
 [<c0274b54>] inet_ioctl+0xc4/0x110
 [<c012bfea>] update_process_times+0x2a/0x30
 [<c022d0ee>] sock_ioctl+0x2de/0x350
 [<c01726bf>] sys_ioctl+0x28f/0x2f9
 [<c012766c>] do_softirq+0x6c/0xd0
 [<c01094b7>] syscall_call+0x7/0xb

* now unpluging the stir4200 device from the hub:

usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_hub_status failed (err = -110)
hub 3-2:1.0: get_hub_status failed
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)

* seems to continue forever - but if I unplug the hub, not much later I get:

usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_port_status failed (err = -110)
usb 3-2: control timeout on ep0in
hub 3-2:1.0: hub_hub_status failed (err = -110)
hub 3-2:1.0: get_hub_status failed
Badness in ohci_endpoint_disable at drivers/usb/host/ohci-hcd.c:340
Call Trace:
 [<cd061348>] ohci_endpoint_disable+0xb8/0x1b0 [ohci_hcd]
 [<cd079cf0>] hcd_endpoint_disable+0x240/0x250 [usbcore]
 [<cd07b049>] usb_disable_endpoint+0x29/0x70 [usbcore]
 [<cd07b115>] usb_disable_device+0x25/0xb0 [usbcore]
 [<cd075bc9>] usb_disconnect+0xa9/0xf0 [usbcore]
 [<cd075bb7>] usb_disconnect+0x97/0xf0 [usbcore]
 [<cd0778d1>] hub_port_connect_change+0x61/0x260 [usbcore]
 [<cd077c59>] hub_events+0x189/0x380 [usbcore]
 [<cd077e85>] hub_thread+0x35/0xf0 [usbcore]
 [<c011e320>] default_wake_function+0x0/0x20
 [<cd077e50>] hub_thread+0x0/0xf0 [usbcore]
 [<c0107161>] kernel_thread_helper+0x5/0x14

* needless to say, trying to rmmod ohci_hcd deadlocks as well:

rmmod         D C5559DCC  6200  1407   1352                     (NOTLB)
c5559dc8 00000082 00000010 c5559dcc 00000000 c014abe7 c7f08cc0 c8421ce0 
       00000000 c11e3bc0 0000f01c 3fa0b2f6 00000137 00000000 c11e3bc0 c7f08cc0 
       c7f08e90 c844e9a8 c844e9a0 00000282 c7f08cc0 c0107ee5 00000001 c7f08cc0 
Call Trace:
 [<c014abe7>] truncate_inode_pages+0x297/0x2c0
 [<c0107ee5>] __down+0xc5/0x180
 [<c011e320>] default_wake_function+0x0/0x20
 [<c0108208>] __down_failed+0x8/0xc
 [<cd07844e>] .text.lock.hub+0x4c/0x11e [usbcore]
 [<cd0750ae>] usb_unbind_interface+0x2e/0x70 [usbcore]
 [<c01f2770>] device_release_driver+0x40/0x50
 [<c01f28ad>] bus_remove_device+0x4d/0x90
 [<c01f1a07>] device_del+0x67/0x90
 [<cd07b171>] usb_disable_device+0x81/0xb0 [usbcore]
 [<cd075bc9>] usb_disconnect+0xa9/0xf0 [usbcore]
 [<cd07d54f>] usb_hcd_pci_remove+0x6f/0x160 [usbcore]
 [<c01c892a>] pci_device_remove+0x1a/0x40
 [<c01f2770>] device_release_driver+0x40/0x50
 [<c01f27a9>] driver_detach+0x29/0x40
 [<c01f29c8>] bus_remove_driver+0x38/0x80
 [<c01f2d9f>] driver_unregister+0xf/0x33
 [<c01c8ade>] pci_unregister_driver+0xe/0x20
 [<cd0626ea>] ohci_hcd_pci_cleanup+0xa/0xc [ohci_hcd]
 [<c013b4fd>] sys_delete_module+0x15d/0x180
 [<c0152800>] do_munmap+0x70/0x180
 [<c0152954>] sys_munmap+0x44/0x70
 [<c01094b7>] syscall_call+0x7/0xb



-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to