On Fri, 2015-09-18 at 02:04 +0100, David Woodhouse wrote:
> On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> > The TxDmaOkLowDesc register may tell if the Tx dma part is still
> > making any progress. I have added a TxPoll request. See below.
>
> I've just added that into the original TX timeout handler, since that
> doesn't seem to be crashing the box for me as long as I avoid the IRQ
> storm.
>
> Not sure what we learn from it ('desc 6550' printed as hex)... I've
> also made it dump the TX descriptor ring (skb, addr, opts1, opts2):

The TxDmaOkLowDesc values look sane to me; they always match the low
bits of the last descriptor that *should* have been sent, according to
the ring dumps.

I made it store and dump the original contents of the TX descriptor
ring too (before it gets overwritten by the hardware). So we can see
what *was* being transmitted. It isn't any more enlightening.

I also tried just prodding the hardware by writing to the TxPoll
register. That wasn't sufficient to restart it.

[26589.024750] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status c 2b head 52 tail 45 desc 96c0 0 80ff
[26589.034632] TX ring 00: (null) 1de16e7c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 01: (null) 1de165ac 30003130 0 (b00005ea 0)
[26589.034632] TX ring 02: (null) 1de1540c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 03: (null) 1de15cdc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 04: (null) 1de14b3c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 05: (null) 1de1399c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 06: (null) 1de130cc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 07: (null) 1de1426c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 08: (null) 1de127fc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 09: (null) 1de11f2c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 10: (null) 1de10d8c 30003130 0 (b0000056 0)
[26589.034632] TX ring 11: (null) 1de1165c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 12: (null) 1df4774c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 13: (null) 1de104bc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 14: (null) 1df46e7c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 15: (null) 1df465ac 30003130 0 (b00005ea 0)
[26589.034632] TX ring 16: (null) 1df45cdc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 17: (null) 1df44b3c 30003130 0 (b0000056 0)
[26589.034632] TX ring 18: (null) 1df4426c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 19: (null) 1df4540c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 20: (null) 1df4399c 30004954 0 (b00005ea 0)
[26589.034632] TX ring 21: (null) 1df430cc 30004954 0 (b00005ea 0)
[26589.034632] TX ring 22: (null) 1df427fc 30004954 0 (b00005ea 0)
[26589.034632] TX ring 23: (null) 1df4165c 30004954 0 (b00005ea 0)
[26589.034632] TX ring 24: (null) 1df41f2c 30004954 0 (b00005ea 0)
[26589.034632] TX ring 25: (null) 1df40d8c 30004954 0 (b00005ea 0)
[26589.034632] TX ring 26: (null) 1dcc9602 3000796b 0 (b000006a 0)
[26589.034632] TX ring 27: (null) 1de1774c 30000000 0 (b0000097 0)
[26589.034632] TX ring 28: (null) 1de16e7c 30000000 0 (b0000467 0)
[26589.034632] TX ring 29: (null) 1df404bc 30000000 0 (b0000557 0)
[26589.034632] TX ring 30: (null) 1de165ac 30000000 0 (b00000e7 0)
[26589.034632] TX ring 31: (null) 1de1540c 30000000 0 (b00000d7 0)
[26589.034632] TX ring 32: (null) 1dcc9602 300087b2 0 (b0000062 0)
[26589.034632] TX ring 33: (null) 1de15cdc 300087b2 0 (b0000046 0)
[26589.034632] TX ring 34: (null) 1de14b3c 30000540 0 (b000008b 0)
[26589.034632] TX ring 35: (null) 1de1426c 3000709e 0 (b0000097 0)
[26589.034632] TX ring 36: (null) 1de1399c 3000709e 0 (b0000097 0)
[26589.034632] TX ring 37: (null) 1de130cc 3000c169 0 (b0000084 0)
[26589.034632] TX ring 38: (null) 1de127fc 3000a5eb 0 (b00005ea 0)
[26589.034632] TX ring 39: (null) 1de11f2c 3000d4a3 0 (b0000557 0)
[26589.034632] TX ring 40: (null) 1de10d8c 3000b57e 0 (b0000046 0)
[26589.034632] TX ring 41: (null) 1de104bc 30000000 0 (b00000a7 0)
[26589.034632] TX ring 42: (null) 1dce774c 30004000 0 (b0000046 0)
[26589.034632] TX ring 43: (null) 1dce6e7c 300076f6 0 (b0000046 0)
[26589.034632] TX ring 44: (null) 1dce5cdc 30002034 0 (b0000096 0)
[26589.034632] TX ring 45: dde4c3c0 1dce540c b0000557 0 (b0000557 0)
[26589.034632] TX ring 46: dddc1900 1dce426c b0000097 0 (b0000097 0)
[26589.034632] TX ring 47: defb3600 1dce4b3c b0000097 0 (b0000097 0)
[26589.034632] TX ring 48: dde5b240 1dcc9602 b000006a 0 (b000006a 0)
[26589.034632] TX ring 49: dde5b540 1dce30cc b0000097 0 (b0000097 0)
[26589.034632] TX ring 50: dde5b600 1dce399c b0000097 0 (b0000097 0)
[26589.034632] TX ring 51: dde5b180 1dce1f2c b00005ea 0 (b00005ea 0)
[26589.034632] TX ring 52: (null) 1df4540c 30003130 0 (b0000056 0)
[26589.034632] TX ring 53: (null) 1df45cdc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 54: (null) 1df4426c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 55: (null) 1df44b3c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 56: (null) 1df4399c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 57: (null) 1df430cc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 58: (null) 1df427fc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 59: (null) 1df41f2c 30003130 0 (b0000056 0)
[26589.034632] TX ring 60: (null) 1df4165c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 61: (null) 1df404bc 30003130 0 (b00005ea 0)
[26589.034632] TX ring 62: (null) 1df40d8c 30003130 0 (b00005ea 0)
[26589.034632] TX ring 63: (null) 1de1774c 70003130 0 (f00005ea 0)

Those values we see in the low 16 bits of the opts1 field after TX
(0x3130 in the last 12 or so immediately above) look bogus. Those are
supposed to be the frame length that actually got transmitted. But they
bear no relation to the length of the packet it was actually asked to
transmit. And sometimes it's zero (but not really correlated with the
TX hang).

We use this value in cp_tx() for the dma_unmap call, as the length to
unmap. I've augmented the debugging in cp_tx() and we really are seeing
the same bogus values there; I'm surprised we've never seen DMA API
debugging trip up on this.

At this point I have no idea what the device is putting in the "TX
status" after a transmit, but it doesn't seem to match what's in the
datasheet.

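To make the mismatch concrete, here is a minimal sketch that splits an opts1 word into its top flag bits and its low 16 bits, then compares the value queued for transmit with the value read back after TX. The flag positions (ownership in bit 31, ring end in bit 30, first/last fragment in bits 29 and 28) are an assumption based on how the 8139cp driver appears to define them, and the function names are hypothetical; check both against the driver source and the datasheet before relying on this.

```python
# Assumed descriptor flag layout (not confirmed by this thread; verify
# against the 8139cp driver source and the datasheet).
DESC_OWN = 1 << 31    # descriptor still owned by the NIC
RING_END = 1 << 30    # last descriptor in the ring
FIRST_FRAG = 1 << 29  # first segment of a packet
LAST_FRAG = 1 << 28   # final segment of a packet
LOW16_MASK = 0xFFFF   # the low 16 bits discussed in the text


def decode_opts1(opts1):
    """Split an opts1 word into flag bits and its low 16 bits (hypothetical helper)."""
    return {
        "own": bool(opts1 & DESC_OWN),
        "ring_end": bool(opts1 & RING_END),
        "first": bool(opts1 & FIRST_FRAG),
        "last": bool(opts1 & LAST_FRAG),
        "low16": opts1 & LOW16_MASK,
    }


def length_mismatch(queued, after_tx):
    """True if the low 16 bits read back after TX differ from the queued length."""
    return (queued & LOW16_MASK) != (after_tx & LOW16_MASK)


if __name__ == "__main__":
    # (original opts1, opts1 after TX) pairs copied from ring slots 00, 27 and 63.
    samples = [
        (0xB00005EA, 0x30003130),
        (0xB0000097, 0x30000000),
        (0xF00005EA, 0x70003130),
    ]
    for queued, after_tx in samples:
        q, a = decode_opts1(queued), decode_opts1(after_tx)
        print(
            f"queued len {q['low16']:5d} own={q['own']}  ->  "
            f"after TX low16=0x{a['low16']:04x} own={a['own']} "
            f"mismatch={length_mismatch(queued, after_tx)}"
        )
```

Under that assumed layout, the completed slots show the ownership bit cleared while the low 16 bits no longer equal the queued length, which is the discrepancy described above. The log that follows appears to be the augmented cp_tx() debugging output.
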
[26906.679823] 8139cp 0000:00:0b.0 eth1: tx done, slot 35, status 0x3000f804
[26906.686910] 8139cp 0000:00:0b.0 eth1: tx done, slot 36, status 0x3000f804
[26906.693894] 8139cp 0000:00:0b.0 eth1: tx done, slot 37, status 0x30001e69
[26906.700912] 8139cp 0000:00:0b.0 eth1: tx done, slot 38, status 0x3000ca24
[26906.707800] 8139cp 0000:00:0b.0 eth1: tx done, slot 39, status 0x3000ca24
[26906.714902] 8139cp 0000:00:0b.0 eth1: tx done, slot 40, status 0x3000ca24
[26906.721787] 8139cp 0000:00:0b.0 eth1: tx done, slot 41, status 0x30004006
[26906.728942] 8139cp 0000:00:0b.0 eth1: tx done, slot 42, status 0x30004006
[26906.735921] 8139cp 0000:00:0b.0 eth1: tx done, slot 43, status 0x30004006
[26906.742948] 8139cp 0000:00:0b.0 eth1: tx done, slot 44, status 0x30004006

--
dwmw2