2.2.18 eepro100 hangs (cmd_wait for(0xffffff80) timedout with(0xffffff80)!)
Hi, 600MHz Celeron, vanilla 2.2.18, Intel 815EE motherboard, onboard eepro100 and a PCI 3Com 905B. The hang happens when traffic to the eepro100 is high. FWIW, the 3c905 (3c59x module) also reports 'too much work in interrupt', but it did not hang like the eepro100, and passing max_interrupt_work=40 to modprobe made those warnings go away. The Intel card, however, actually hangs after it shows its warnings, so we think it is a real problem.

messages:
Jan 4 09:01:44 axiom1 kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Jan 4 09:01:44 axiom1 kernel: eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 Modified by Andrey V. Savochkin <[EMAIL PROTECTED]> and others
Jan 4 09:01:44 axiom1 kernel: eepro100.c: VA Linux custom, Dragan Stancevic <[EMAIL PROTECTED]> 2000/11/15
Jan 4 09:01:44 axiom1 kernel: eth0: Intel PCI EtherExpress Pro100 82562EM, 00:03:47:0A:C0:97, IRQ 11.
Jan 4 09:01:44 axiom1 kernel: Receiver lock-up bug exists -- enabling work-around.
Jan 4 09:01:44 axiom1 kernel: Board assembly 00-000, Physical connectors present: RJ45
Jan 4 09:01:44 axiom1 kernel: Primary interface chip i82555 PHY #1.
Jan 4 09:01:44 axiom1 kernel: General self-test: passed.
Jan 4 09:01:44 axiom1 kernel: Serial sub-system self-test: passed.
Jan 4 09:01:44 axiom1 kernel: Internal registers self-test: passed.
Jan 4 09:01:44 axiom1 kernel: ROM checksum self-test: passed (0x04f4518b).
Jan 4 09:01:44 axiom1 kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Jan 4 09:01:44 axiom1 kernel: eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 Modified by Andrey V. Savochkin <[EMAIL PROTECTED]> and others
Jan 4 09:01:44 axiom1 kernel: eepro100.c: VA Linux custom, Dragan Stancevic <[EMAIL PROTECTED]> 2000/11/15
...
Jan 4 17:14:03 axiom1 kernel: eth1: Too much work in interrupt, status e401.
Jan 4 17:14:42 axiom1 kernel: eth1: Too much work in interrupt, status e481.
Jan 4 17:16:36 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout with(0xff80)!
Jan 4 17:17:09 axiom1 last message repeated 18 times
Jan 4 17:18:49 axiom1 last message repeated 8 times
Jan 4 17:20:49 axiom1 last message repeated 3 times
Jan 4 17:22:49 axiom1 last message repeated 3 times
Jan 4 17:24:49 axiom1 last message repeated 3 times
Jan 4 17:26:49 axiom1 last message repeated 3 times
Jan 4 17:28:49 axiom1 last message repeated 3 times
Jan 4 17:30:49 axiom1 last message repeated 3 times
Jan 4 17:32:49 axiom1 last message repeated 3 times
Jan 4 17:59:28 axiom1 last message repeated 3 times
Jan 4 17:59:42 axiom1 last message repeated 7 times
Jan 4 17:59:50 axiom1 kernel: eth0: Transmit timed out: status 0050 0c80 at 13421/13481 command 000c.
Jan 4 17:59:50 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout with(0xff80)!

lspci -v:
00:00.0 Host bridge: Intel Corporation: Unknown device 1130 (rev 02)
	Flags: bus master, fast devsel, latency 0
	Capabilities: [88] #09 [f104]

00:02.0 VGA compatible controller: Intel Corporation: Unknown device 1132 (rev 02) (prog-if 00 [VGA])
	Subsystem: Intel Corporation: Unknown device 4541
	Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 11
	Memory at f800 (32-bit, prefetchable)
	Memory at ffa8 (32-bit, non-prefetchable)
	Capabilities: [dc] Power Management version 2

00:1e.0 PCI bridge: Intel Corporation: Unknown device 244e (rev 01) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
	I/O behind bridge: d000-dfff
	Memory behind bridge: ff80-ff8f
	Prefetchable memory behind bridge: f6a0-f6af

00:1f.0 ISA bridge: Intel Corporation: Unknown device 2440 (rev 01)
	Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corporation: Unknown device 244b (rev 01) (prog-if 80 [Master])
	Subsystem: Intel Corporation: Unknown device 4541
	Flags: bus master, medium devsel, latency 0
	I/O ports at ffa0

00:1f.2 USB Controller: Intel Corporation: Unknown device 2442 (rev 01) (prog-if 00 [UHCI])
	Subsystem: Intel Corporation: Unknown device 4541
	Flags: bus master, medium devsel, latency 0, IRQ 10
	I/O ports at ef40

00:1f.3 SMBus: Intel Corporation: Unknown device 2443 (rev 01)
	Subsystem: Intel Corporation: Unknown device 4541
	Flags: medium devsel, IRQ 6
	I/O ports at efa0

00:1f.4 USB Controller: Intel Corporation: Unknown device 2444 (rev 01) (prog-if 00 [UHCI])
	Subsystem: Intel Corporation: Unknown device 4541
	Flags: bus master, medium devsel, latency 0, IRQ 9
	I/O ports at ef80

00:1f.5 Multimedia audio controller: Intel Corporation: Unknown device 2445 (rev 01)
	Subsystem: Intel Corporation: Unknown device 4541
	Flags: bus master, medium
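The cmd_wait message appears to come from the driver giving up while waiting for the chip to accept a command. A minimal Python model of that kind of bounded polling loop follows; wait_for_cmd_done, read_cmd_register and max_polls are hypothetical names, the poll budget and the 0x80 "busy" value are arbitrary assumptions, and this is not the eepro100 source. It only illustrates why a chip that never clears its command register yields a timeout message (and the driver carrying on) rather than the CPU spinning forever inside the wait.

```python
from typing import Callable, Optional


def wait_for_cmd_done(read_cmd_register: Callable[[], int],
                      max_polls: int = 1000) -> Optional[int]:
    """Poll a (simulated) command register until it reads 0.

    Returns the number of reads it took to see 0, or None if the
    register never cleared within max_polls reads (the timeout case).
    """
    for polls in range(1, max_polls + 1):
        if read_cmd_register() == 0:
            return polls  # chip accepted the command
        # A real driver would busy-wait briefly here between reads.
    return None  # caller would log a timeout and carry on


if __name__ == "__main__":
    # Healthy chip (simulated): the command byte clears after a few reads.
    healthy = iter([0x80, 0x80, 0x80, 0x00])
    print(wait_for_cmd_done(lambda: next(healthy)))

    # Wedged chip (simulated): any nonzero value means "still busy".
    print(wait_for_cmd_done(lambda: 0x80))
```

Bounding the loop trades a possible lost command for keeping the machine responsive; in this model the timeout path is only a symptom of the chip being stuck, not a cause.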
Re: [eepro100] Ok, I'm fed up now
I am having the same problems. I have duplicated the hard lockups / Ethernet hangs on two Intel 815EE boards. It happens when send traffic through the onboard eepro100 is high, and sometimes running something like vmstat 1 in the background triggers the lockup faster. When it locks up there is nothing in the log, no oops or anything. Sometimes it just hangs eth0 with the (cmd timeout) messages, and an ifconfig down/up fixes it temporarily. I am using 2.2.18; here's the boot info:

Jan 9 08:29:32 axiom1 kernel: Linux version 2.2.18 (root@flankerbuild) (gcc version egcs-2.91.66 19990
Jan 9 08:29:32 axiom1 kernel: Detected 598065 kHz processor.
Jan 9 08:29:32 axiom1 kernel: Console: colour VGA+ 80x25
Jan 9 08:29:32 axiom1 kernel: Calibrating delay loop... 1192.75 BogoMIPS
Jan 9 08:29:32 axiom1 kernel: Memory: 126976k/129792k available (884k kernel code, 412k reserved, 1480
Jan 9 08:29:32 axiom1 kernel: Dentry hash table entries: 16384 (order 5, 128k)
Jan 9 08:29:32 axiom1 kernel: Buffer cache hash table entries: 131072 (order 7, 512k)
Jan 9 08:29:32 axiom1 kernel: Page cache hash table entries: 32768 (order 5, 128k)
Jan 9 08:29:32 axiom1 kernel: Intel machine check architecture supported.
Jan 9 08:29:32 axiom1 kernel: Intel machine check reporting enabled on CPU#0.
Jan 9 08:29:32 axiom1 kernel: 128K L2 cache (4 way)
Jan 9 08:29:32 axiom1 kernel: CPU: L2 Cache: 128K
Jan 9 08:29:32 axiom1 kernel: CPU: Intel Pentium III (Coppermine) stepping 06
Jan 9 08:29:32 axiom1 kernel: Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Jan 9 08:29:32 axiom1 kernel: Checking 'hlt' instruction... OK.
Jan 9 08:29:32 axiom1 kernel: POSIX conformance testing by UNIFIX
Jan 9 08:29:32 axiom1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfda95
Jan 9 08:29:32 axiom1 kernel: PCI: Using configuration type 1
Jan 9 08:29:32 axiom1 kernel: PCI: Probing PCI hardware
Jan 9 08:29:32 axiom1 kernel: Linux NET4.0 for Linux 2.2

I will test the latest 2.2.19pre with it.
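Because syslog folds identical messages into "last message repeated N times", the number of cmd timeouts is easy to underestimate when reading a log like the one in the first report. A minimal Python sketch for counting them follows, assuming the message wording matches the lines quoted in this thread; count_cmd_timeouts is a hypothetical helper, and the sample lines are copied from the first report.

```python
import re
from typing import Iterable

# Patterns assume the exact wording quoted in this thread.
TIMEOUT_RE = re.compile(r"cmd_wait for\(0x[0-9a-f]+\) timedout", re.IGNORECASE)
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_cmd_timeouts(lines: Iterable[str]) -> int:
    """Count eepro100 cmd_wait timeouts, expanding syslog repeat lines.

    A "last message repeated N times" line adds N only when the message it
    refers to was a cmd_wait timeout; consecutive repeat lines keep
    referring to that same message.
    """
    total = 0
    last_was_timeout = False
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last_was_timeout:
                total += int(repeat.group(1))
            continue  # a repeat line does not change "the last message"
        last_was_timeout = TIMEOUT_RE.search(line) is not None
        if last_was_timeout:
            total += 1
    return total


if __name__ == "__main__":
    sample = [
        "Jan 4 17:16:36 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout with(0xff80)!",
        "Jan 4 17:17:09 axiom1 last message repeated 18 times",
        "Jan 4 17:18:49 axiom1 last message repeated 8 times",
        "Jan 4 17:59:50 axiom1 kernel: eth0: Transmit timed out: status 0050 0c80 at 13421/13481 command 000c.",
        "Jan 4 17:59:50 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout with(0xff80)!",
    ]
    print(count_cmd_timeouts(sample))  # 28
```

To scan a real log, pass an open file object instead of the list; which file the kernel messages land in depends on the local syslog configuration.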
Do you need my kernel config? It's all generic: I have eepro100 and 3c59x built as modules, and everything else is default (SMP is off). I can trigger this lockup repeatedly with a shell script on the 815 that rsyncs a large file to another machine over and over. After about 5 minutes of high load it either hangs hard or hangs the Ethernet card. I have a 3Com (3c59x) in there as well; it and the onboard eepro100 are both using IRQ 11. Is that normal? The 3Com is also in promiscuous mode, sniffing a lot of traffic during the tests.

-- original text --
Ok, I just built another server, also using an Intel D815EAL. Same exact setup as the other one: Celeron 600 CPU w/ 128MB RAM (PC133 btw). Both systems lock up doing FTP transfers. I didn't know for sure about the first one, so I just opened an ncftp session and downloaded something large from the internet. Netscape 6 was what I found (not planning on installing it btw); it didn't get very far, then locked up tight. Twice in a row. No errors, nothing, nothing to report either. So, I built another. I have parts for 2 of them (almost ready for the garbage), so I put together the second. I can hardly download anything. I can't even get new RPMs or source code for the new kernels. I'm on my last nerve here and about to throw the motherboards out the window. I don't understand how you guys make these motherboards work. What am I missing? I'm so fed up. 3 weeks lost and no servers to show for it. My boss is so pissed. Do you guys have any input? Where else can I go to find out about these boards?
Thanks,
Rob

Get your FREE download of MSN Explorer at http://explorer.msn.com
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
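On the IRQ question: PCI interrupt lines are level-triggered and designed to be shared, so eth0 and eth1 both on IRQ 11 is allowed in itself (the lspci output in the first report shows the onboard VGA controller on IRQ 11 as well). Whether the sharing contributes to this hang is not settled in the thread. A minimal Python sketch for listing which devices share an IRQ follows, assuming the usual /proc/interrupts layout of IRQ number, per-CPU counts, a controller type such as XT-PIC, and a comma-separated device list; shared_irqs is a hypothetical helper and the counts in the sample are made up.

```python
from typing import Dict, List


def shared_irqs(proc_interrupts: str) -> Dict[int, List[str]]:
    """Return {irq: [device, ...]} for IRQ lines used by more than one device."""
    shared: Dict[int, List[str]] = {}
    for line in proc_interrupts.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # CPU header line, NMI/ERR counters, etc.
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU interrupt counts
        if i + 1 >= len(tokens):
            continue  # no controller type or device list on this line
        # tokens[i] is the controller type; the rest is the device list.
        names = " ".join(tokens[i + 1:]).split(",")
        devices = [name.strip() for name in names if name.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    sample = (
        "           CPU0\n"
        "  0:     123456          XT-PIC  timer\n"
        " 11:      98765          XT-PIC  eth0, eth1\n"
        " 14:       4321          XT-PIC  ide0\n"
        "NMI:          0\n"
    )
    print(shared_irqs(sample))  # {11: ['eth0', 'eth1']}
```

On the affected box the input would be the contents of /proc/interrupts read as a string.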
Re: [eepro100] Ok, I'm fed up now
Whoops, I didn't set the cc address properly; the full thread is on the [EMAIL PROTECTED] list.
Karl Pickett
Re: [eepro100] ...
> You could try the Intel driver (e100.c), which is downloadable from their
> website. It apparently has some silicon bug workarounds that Donald's
> driver hasn't.

We've been back and forth with that driver, yeah. It has its own set of problems: sometimes it doesn't even autonegotiate properly and falls back to 10 Mbit half duplex, etc. It also seems to have gotten quite large code-wise in the latest version (1.3.x.something) :) We don't need any of their ANS/teaming/proc stuff, but it is stable on this particular problem. Their driver doesn't even compile out of the box on 2.2.18, btw (Intel e100.c 1.3.2, the latest I could find); the build fails with _badudelay. I had to change one call to mdelay and comment out their dma_addr_t type because of the conflicting declaration. Doesn't give me much confidence :(

> Also please note that such a subject line is not a good motivation to help
> you for free.
> -Andi

Sorry. It wasn't my subject, though; I was replying to someone else and just put the 'Re:' in. But I will be more careful...