On Sun, 29 Feb 2004, David Brownell wrote: > > * !!! trouble starts: bad entry reported from dl_reverse_done_list > > seems poisoned memory in this case (in many cases it's not 5a5a5a) > > > > ohci_hcd 0000:00:0c.1: bad entry 5a5a5a50 > > That's really odd. That poison byte is used by mm/slab.c, but > this "bad entry" came from a TD or ED, and hence from a dmapool. > And drivers/base/dmapool.c uses different poison bytes. (Partly > to highlight strangeness like this ... )
As I said most of the bad entry addresses aren't 5a5a5a-like, I just selected this one because I considered the slab poison-byte remarkable too. In most cases the look more like being "normal". > THEORY: somebody's abusing some general purpose kmem_cache_t, > maybe the one dmapool used for a "struct dma_page" holding TDs > ("size-32"?). That holds a dma_addr_t which may get handed > to the OHCI controller. That dma_addr_t could be poisoned, and > change to 0x5a5a5a5a ... it'd come back from OHCI as 0x5a5a5a50 > if it were used as a non-isochronous TD. (16 byte alignment is > required, low bits are ignored and written back as zero.) The > "bad entry" was supposed to be a TD. Well, switching the same setup to ehci using the HS-hub's TT there are now problems even remotely comparable to this one. To me it looks like either the stir4200 driver has an issue which manifests with ohci only for some reason or it exposes some issue with ohci_hcd. But I'm only speculating since I don't see what it might be... > That should give you some ideas for a few patches to try... > > Also, for some more info you could try the attached patch, which > dump OHCI's TD hashtable at that point. The phys_to_virt() is > not appropriate (oopsable) on many non-PC platforms though. see below. > > * going on from this point I've seen two different failure modes > > > > --- either the HC dies: > > This path is relatively polite. Whatever did what, > the HC probably got bad address, then PCI DMA error, > then polite hardware shutdown. Ok, that was my understanding as well. > > --- or things go south like this: > > The rude approach. Whatever corruption happened was > not blatant enough to make the OHCI silicon die, so > things just mysteriously wedge. We hope it didn't use > DMA to clobber pagetables or somesuch. ;-) But AFAICS the box continous to work rock solid - it's just usb became unuseable. To me it looks like the corruption remains confined in pci_pool. Ok, I've repeated the issue. This time with DEBUG enabled in ohci_hcd and your diagnostics-patch applied. I've unplugged all usb-device except the stir4200 and the hub it was connected to. Just after a fresh reboot of 2.6.3-bk9 the sequence below was catched. This time I managed to shut things down before getting into more serious trouble. But the done-list corruptions is there. And it's not a 5a5a5a poisoned thing this time. FWIW, since the suspicous entry looks like it might be a SETUP packet from a control transfer, it might be worth mentioning the stir4200 driver is doing _a lot of_ control transfers because it needs to poll some fifo-register in the device. These are all synch. usb_control_msg issued by the dedicated driver kernel thread. Is there anything we might learn from the td-hash dump? Thanks, Martin ---------------------- * driver ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ohci_hcd: block sizes: ed 64 td 64 * this NEC OHCI-companion PCI: Found IRQ 11 for device 0000:00:0c.1 ohci_hcd 0000:00:0c.1: OHCI Host Controller ohci_hcd 0000:00:0c.1: reset, control = 0x0 ohci_hcd 0000:00:0c.1: irq 11, pci mem cd059000 ohci_hcd 0000:00:0c.1: new USB bus registered, assigned bus number 3 hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ohci_hcd 0000:00:0c.1: created debug files ohci_hcd 0000:00:0c.1: OHCI controller state ohci_hcd 0000:00:0c.1: OHCI 1.0, with legacy support registers ohci_hcd 0000:00:0c.1: control 0x083 HCFS=operational CBSR=3 ohci_hcd 0000:00:0c.1: cmdstatus 0x00000 SOC=0 ohci_hcd 0000:00:0c.1: intrstatus 0x00000004 SF ohci_hcd 0000:00:0c.1: intrenable 0x80000012 MIE UE WDH ohci_hcd 0000:00:0c.1: hcca frame #0403 ohci_hcd 0000:00:0c.1: roothub.a ff000202 POTPGT=255 NPS NDP=2 ohci_hcd 0000:00:0c.1: roothub.b 00000000 PPCM=0000 DR=0000 ohci_hcd 0000:00:0c.1: roothub.status 00000000 ohci_hcd 0000:00:0c.1: roothub.portstatus [0] 0x00000100 PPS ohci_hcd 0000:00:0c.1: roothub.portstatus [1] 0x00000100 PPS * connect the hub ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00010101 CSC PPS CCS ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00100103 PRSC PPS PES CCS usb 3-2: new full speed USB device using address 2 hub 3-2:1.0: USB hub found hub 3-2:1.0: 4 ports detected * connect the stir4200 usb 3-2.2: new full speed USB device using address 3 * modprobe stir4200 driver irda_init() NET: Registered protocol family 23 SigmaTel STIr4200 IRDA/USB found at address 3, Vendor: 66f, Product: 4200 IrDA: Registered SigmaTel device irda0 drivers/usb/core/usb.c: registered new driver stir4200 SigmaTel support registered * using the device (ifup, start discovery) irlap_change_speed(), setting speed to 9600 * only a few seconds later ohci_hcd 0000:00:0c.1: bad entry afbc0c0 ohci_hcd 0000:00:0c.1: td_hash[0] ohci_hcd 0000:00:0c.1: entry td cafbc000; urb cb23a2e0 index 0; hw next td 0afbc040 ohci_hcd 0000:00:0c.1: info 00140000 CC=0 (CARRY) DI=0 IN R ohci_hcd 0000:00:0c.1: cbp 0afbe020 be 0afbe020 (len 1) ohci_hcd 0000:00:0c.1: td_hash[4] ohci_hcd 0000:00:0c.1: entry td cafbc100; urb c8298ae4 index 0; hw next td 00000000 ohci_hcd 0000:00:0c.1: info 03140000 CC=0 DATA1 DI=0 IN R ohci_hcd 0000:00:0c.1: cbp 08264000 be 08264fff (len 4096) ohci_hcd 0000:00:0c.1: bad? td cafbc0c0; urb 00000000 index 0; hw next td 0afbc100 ohci_hcd 0000:00:0c.1: info 5e000000 CC=5 DATA0 DI=0 SETUP ohci_hcd 0000:00:0c.1: cbp 00000000 be 00000000 (len 0) * ifconfig irda0 down * rmmod stir4200 drivers/usb/core/usb.c: deregistering driver stir4200 * disconnect stir4200 usb 3-2.2: USB disconnect, address 3 * disconnect the hub ohci_hcd 0000:00:0c.1: urb cb23a2e0 path 2 ep1in 5e160000 cc 5 --> status -110 ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00030100 PESC CSC PPS usb 3-2: USB disconnect, address 2 ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00020100 PESC PPS * rmmod ohci_hcd ohci_hcd 0000:00:0c.1: remove, state 1 usb usb3: USB disconnect, address 1 ohci_hcd 0000:00:0c.1: stop operational controller (state 0x85) ohci_hcd 0000:00:0c.1: OHCI controller state ohci_hcd 0000:00:0c.1: OHCI 1.0, with legacy support registers ohci_hcd 0000:00:0c.1: control 0x083 HCFS=operational CBSR=3 ohci_hcd 0000:00:0c.1: cmdstatus 0x00000 SOC=0 ohci_hcd 0000:00:0c.1: intrstatus 0x00000064 RHSC FNO SF ohci_hcd 0000:00:0c.1: intrenable 0x80000012 MIE UE WDH ohci_hcd 0000:00:0c.1: hcca frame #9d84 ohci_hcd 0000:00:0c.1: roothub.a ff000202 POTPGT=255 NPS NDP=2 ohci_hcd 0000:00:0c.1: roothub.b 00000000 PPCM=0000 DR=0000 ohci_hcd 0000:00:0c.1: roothub.status 00000000 ohci_hcd 0000:00:0c.1: roothub.portstatus [0] 0x00000100 PPS ohci_hcd 0000:00:0c.1: roothub.portstatus [1] 0x00000100 PPS ohci_hcd 0000:00:0c.1: reset, control = 0x83 ohci_hcd 0000:00:0c.1: USB bus 3 deregistered ------------------------------------------------------- SF.Net is sponsored by: Speed Start Your Linux Apps Now. Build and deploy apps & Web services for Linux with a free DVD software kit from IBM. Click Now! http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click _______________________________________________ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel