Harald Schmalzbauer wrote:
> Alexander Motin wrote on 23.02.2010 16:10 (localtime):
>> Harald Schmalzbauer wrote:
>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>> ahcich2: Timeout on slot 0
>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000
>>> ahcich2: Timeout on slot 8
>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
>>> ahcich2: Timeout on slot 8
>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000
>>> ...
>>
>> Looking that is (Interrupt status) is zero and `rs == cs | ss` (running
>> command bitmasks in driver and hardware), controller doesn't report
>> command completion. Looking on TFD status 0xc0 with BUSY bit set, I
>> would suppose that either disk stuck in command processing for some
>> reason, or controller missed command completion status.
>>
>> Have you noticed 30 second (default ATA timeout) pause before timeout
>> message printed? Just want to be sure that driver waited enough before
>> give up.
>>
>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>>> up almost immediately, but from it still happens.
>>> When I don't use ahci but ataahci (the old driver if I understand things
>>> correct) I also see the ZFS burst write congestion, but this doesn't
>>> lead to controller timeouts, thus blocking the machine.
>>>
>>> Sometimes the machine recovers from the disk lock, but most often I have
>>> to reboot.
>>
>> How it looks when it doesn't? Can you send me full log messages?
>
> Hello, this morning I had a stall, but the machine recovered after about
> one Minute. Here's what I got from the kernel:
> ahcich2: Timeout on slot 29
> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr 00000000
> em1: watchdog timeout -- resetting
> em1: watchdog timeout -- resetting
> ahcich2: Timeout on slot 10
> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr 00000000
> ahcich2: Timeout on slot 18
> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr 00000000
> ahcich2: Timeout on slot 2
> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr 00000000
> ahcich2: Timeout on slot 2
> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr 00000000
>
> Does this tell you something useful?
It doesn't. Looking at the logged register contents, commands are indeed
still running and no interrupts were requested. It is interesting to see
the em1 watchdog timeouts there. Could they be related somehow?

-- 
Alexander Motin
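
For anyone who wants to apply the same checks to their own logs, here is a minimal Python sketch of the reasoning above: it parses one register line, tests whether the interrupt status is zero, whether `rs == cs | ss` holds, and whether the BUSY bit is set in the TFD status byte. The function name `decode_timeout` and the result keys are made up for illustration. The bit values 0x80 (BSY) and 0x40 (DRDY) are the standard ATA status bits. The reading of `cs`, `ss` and `rs` as hardware command-issue, hardware SATA-active and driver running-slot masks is my assumption from the discussion, not something the log itself states.

```python
import re

# Standard ATA status register bits, found in the low byte of TFD.
ATA_BSY = 0x80   # device is busy processing a command
ATA_DRDY = 0x40  # device is ready

# Matches the register dump that follows a "Timeout on slot N" message.
LINE_RE = re.compile(
    r"\bis\s+(?P<intr>[0-9a-f]+)\s+cs\s+(?P<cs>[0-9a-f]+)\s+"
    r"ss\s+(?P<ss>[0-9a-f]+)\s+rs\s+(?P<rs>[0-9a-f]+)\s+"
    r"tfd\s+(?P<tfd>[0-9a-f]+)\s+serr\s+(?P<serr>[0-9a-f]+)"
)


def decode_timeout(line):
    """Decode one ahcich register line into a few boolean observations.

    Hypothetical helper for illustration; not part of any FreeBSD tool.
    """
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not an ahcich register line: %r" % line)
    regs = {name: int(value, 16) for name, value in match.groupdict().items()}
    return {
        # No interrupt status bits set: controller reported no completion.
        "no_interrupt_pending": regs["intr"] == 0,
        # Driver's running mask agrees with what the hardware still holds.
        "masks_consistent": regs["rs"] == (regs["cs"] | regs["ss"]),
        # BUSY bit in the task file status byte.
        "busy": bool(regs["tfd"] & ATA_BSY),
        "ready": bool(regs["tfd"] & ATA_DRDY),
        # No SATA link errors recorded.
        "no_link_error": regs["serr"] == 0,
        # Slot numbers the driver still considers in flight.
        "running_slots": [bit for bit in range(32) if regs["rs"] & (1 << bit)],
    }


if __name__ == "__main__":
    sample = ("ahcich2: is 00000000 cs 00000003 ss e0000003 "
              "rs e0000003 tfd c0 serr 00000000")
    for key, value in decode_timeout(sample).items():
        print(key, value)
```

Feeding it each register line from the log should show quickly whether every timeout has the same signature (no interrupt, consistent masks, BUSY set) or whether some, like the last line with tfd 40, differ.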