On Sun, 29 Feb 2004, David Brownell wrote:

> > * !!! trouble starts: bad entry reported from dl_reverse_done_list
> >   seems poisoned memory in this case (in many cases it's not 5a5a5a)
> > 
> > ohci_hcd 0000:00:0c.1: bad entry 5a5a5a50
> 
> That's really odd.  That poison byte is used by mm/slab.c, but
> this "bad entry" came from a TD or ED, and hence from a dmapool.
> And drivers/base/dmapool.c uses different poison bytes.  (Partly
> to highlight strangeness like this ... )

As I said most of the bad entry addresses aren't 5a5a5a-like, I just 
selected this one because I considered the slab poison-byte remarkable 
too. In most cases the look more like being "normal".

> THEORY:  somebody's abusing some general purpose kmem_cache_t,
> maybe the one dmapool used for a "struct dma_page" holding TDs
> ("size-32"?).  That holds a dma_addr_t which may get handed
> to the OHCI controller.  That dma_addr_t could be poisoned, and
> change to 0x5a5a5a5a ... it'd come back from OHCI as 0x5a5a5a50
> if it were used as a non-isochronous TD.  (16 byte alignment is
> required, low bits are ignored and written back as zero.)  The
> "bad entry" was supposed to be a TD.

Well, switching the same setup to ehci using the HS-hub's TT there are now 
problems even remotely comparable to this one. To me it looks like either 
the stir4200 driver has an issue which manifests with ohci only for some 
reason or it exposes some issue with ohci_hcd. But I'm only speculating 
since I don't see what it might be...

> That should give you some ideas for a few patches to try...
> 
> Also, for some more info you could try the attached patch, which
> dump OHCI's TD hashtable at that point.  The phys_to_virt() is
> not appropriate (oopsable) on many non-PC platforms though.

see below.

> > * going on from this point I've seen two different failure modes
> > 
> > --- either the HC dies:
> 
> This path is relatively polite.   Whatever did what,
> the HC probably got bad address, then PCI DMA error,
> then polite hardware shutdown.

Ok, that was my understanding as well.

> > --- or things go south like this:
> 
> The rude approach.  Whatever corruption happened was
> not blatant enough to make the OHCI silicon die, so
> things just mysteriously wedge.  We hope it didn't use
> DMA to clobber pagetables or somesuch.

;-)

But AFAICS the box continous to work rock solid - it's just usb became 
unuseable. To me it looks like the corruption remains confined in 
pci_pool.

Ok, I've repeated the issue. This time with DEBUG enabled in ohci_hcd and 
your diagnostics-patch applied. I've unplugged all usb-device except the 
stir4200 and the hub it was connected to. Just after a fresh reboot of 
2.6.3-bk9 the sequence below was catched. This time I managed to shut 
things down before getting into more serious trouble. But the done-list 
corruptions is there. And it's not a 5a5a5a poisoned thing this time.

FWIW, since the suspicous entry looks like it might be a SETUP packet from 
a control transfer, it might be worth mentioning the stir4200 driver is 
doing _a lot of_ control transfers because it needs to poll some 
fifo-register in the device. These are all synch. usb_control_msg issued 
by the dedicated driver kernel thread.

Is there anything we might learn from the td-hash dump?

Thanks,
Martin

----------------------

* driver

ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd: block sizes: ed 64 td 64

* this NEC OHCI-companion

PCI: Found IRQ 11 for device 0000:00:0c.1
ohci_hcd 0000:00:0c.1: OHCI Host Controller
ohci_hcd 0000:00:0c.1: reset, control = 0x0
ohci_hcd 0000:00:0c.1: irq 11, pci mem cd059000
ohci_hcd 0000:00:0c.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ohci_hcd 0000:00:0c.1: created debug files
ohci_hcd 0000:00:0c.1: OHCI controller state
ohci_hcd 0000:00:0c.1: OHCI 1.0, with legacy support registers
ohci_hcd 0000:00:0c.1: control 0x083 HCFS=operational CBSR=3
ohci_hcd 0000:00:0c.1: cmdstatus 0x00000 SOC=0
ohci_hcd 0000:00:0c.1: intrstatus 0x00000004 SF
ohci_hcd 0000:00:0c.1: intrenable 0x80000012 MIE UE WDH
ohci_hcd 0000:00:0c.1: hcca frame #0403
ohci_hcd 0000:00:0c.1: roothub.a ff000202 POTPGT=255 NPS NDP=2
ohci_hcd 0000:00:0c.1: roothub.b 00000000 PPCM=0000 DR=0000
ohci_hcd 0000:00:0c.1: roothub.status 00000000
ohci_hcd 0000:00:0c.1: roothub.portstatus [0] 0x00000100 PPS
ohci_hcd 0000:00:0c.1: roothub.portstatus [1] 0x00000100 PPS

* connect the hub

ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00010101 CSC PPS CCS
ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00100103 PRSC PPS PES CCS
usb 3-2: new full speed USB device using address 2
hub 3-2:1.0: USB hub found
hub 3-2:1.0: 4 ports detected

* connect the stir4200

usb 3-2.2: new full speed USB device using address 3

* modprobe stir4200 driver

irda_init()
NET: Registered protocol family 23
SigmaTel STIr4200 IRDA/USB found at address 3, Vendor: 66f, Product: 4200
IrDA: Registered SigmaTel device irda0
drivers/usb/core/usb.c: registered new driver stir4200
SigmaTel support registered

* using the device (ifup, start discovery)

irlap_change_speed(), setting speed to 9600

* only a few seconds later

ohci_hcd 0000:00:0c.1: bad entry  afbc0c0
ohci_hcd 0000:00:0c.1: td_hash[0]
ohci_hcd 0000:00:0c.1:  entry td cafbc000; urb cb23a2e0 index 0; hw next td 0afbc040
ohci_hcd 0000:00:0c.1:      info 00140000 CC=0 (CARRY) DI=0 IN R
ohci_hcd 0000:00:0c.1:      cbp 0afbe020 be 0afbe020 (len 1)
ohci_hcd 0000:00:0c.1: td_hash[4]
ohci_hcd 0000:00:0c.1:  entry td cafbc100; urb c8298ae4 index 0; hw next td 00000000
ohci_hcd 0000:00:0c.1:      info 03140000 CC=0 DATA1 DI=0 IN R
ohci_hcd 0000:00:0c.1:      cbp 08264000 be 08264fff (len 4096)
ohci_hcd 0000:00:0c.1: bad? td cafbc0c0; urb 00000000 index 0; hw next td 0afbc100
ohci_hcd 0000:00:0c.1:      info 5e000000 CC=5 DATA0 DI=0 SETUP 
ohci_hcd 0000:00:0c.1:      cbp 00000000 be 00000000 (len 0)

* ifconfig irda0 down
* rmmod stir4200

drivers/usb/core/usb.c: deregistering driver stir4200

* disconnect stir4200

usb 3-2.2: USB disconnect, address 3

* disconnect the hub

ohci_hcd 0000:00:0c.1: urb cb23a2e0 path 2 ep1in 5e160000 cc 5 --> status -110
ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00030100 PESC CSC PPS
usb 3-2: USB disconnect, address 2
ohci_hcd 0000:00:0c.1: GetStatus roothub.portstatus [2] = 0x00020100 PESC PPS

* rmmod ohci_hcd

ohci_hcd 0000:00:0c.1: remove, state 1
usb usb3: USB disconnect, address 1
ohci_hcd 0000:00:0c.1: stop operational controller (state 0x85)
ohci_hcd 0000:00:0c.1: OHCI controller state
ohci_hcd 0000:00:0c.1: OHCI 1.0, with legacy support registers
ohci_hcd 0000:00:0c.1: control 0x083 HCFS=operational CBSR=3
ohci_hcd 0000:00:0c.1: cmdstatus 0x00000 SOC=0
ohci_hcd 0000:00:0c.1: intrstatus 0x00000064 RHSC FNO SF
ohci_hcd 0000:00:0c.1: intrenable 0x80000012 MIE UE WDH
ohci_hcd 0000:00:0c.1: hcca frame #9d84
ohci_hcd 0000:00:0c.1: roothub.a ff000202 POTPGT=255 NPS NDP=2
ohci_hcd 0000:00:0c.1: roothub.b 00000000 PPCM=0000 DR=0000
ohci_hcd 0000:00:0c.1: roothub.status 00000000
ohci_hcd 0000:00:0c.1: roothub.portstatus [0] 0x00000100 PPS
ohci_hcd 0000:00:0c.1: roothub.portstatus [1] 0x00000100 PPS
ohci_hcd 0000:00:0c.1: reset, control = 0x83
ohci_hcd 0000:00:0c.1: USB bus 3 deregistered





-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to