2.2.18 eepro100 hangs (cmd_wait for(0xffffff80) timedout with(0xffffff80)!)

2001-01-04 Thread Kambo Lohan

Hi,

600 MHz Celeron, vanilla 2.2.18, Intel 815EE motherboard, onboard eepro100 
and a PCI 3Com 905B.  The hang happens when traffic to the eepro100 is high.

FWIW, the 3c905 (3c59x module) also reports 'too much work in interrupt', but 
it did not hang like the eepro100, and adding max_interrupt_work=40 on the 
modprobe command line made those warnings go away.  The Intel card, however, 
actually hangs after the warnings appear, so we think it is a real problem.

messages:
Jan  4 09:01:44 axiom1 kernel: eepro100.c:v1.09j-t 9/29/99 Donald 
Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Jan  4 09:01:44 axiom1 kernel: eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 
Modified by Andrey V. Savochkin <[EMAIL PROTECTED]> and others
Jan  4 09:01:44 axiom1 kernel: eepro100.c: VA Linux custom, Dragan Stancevic 
<[EMAIL PROTECTED]> 2000/11/15
Jan  4 09:01:44 axiom1 kernel: eth0: Intel PCI EtherExpress Pro100 82562EM, 
00:03:47:0A:C0:97, IRQ 11.
Jan  4 09:01:44 axiom1 kernel:   Receiver lock-up bug exists -- enabling 
work-around.
Jan  4 09:01:44 axiom1 kernel:   Board assembly 00-000, Physical 
connectors present: RJ45
Jan  4 09:01:44 axiom1 kernel:   Primary interface chip i82555 PHY #1.
Jan  4 09:01:44 axiom1 kernel:   General self-test: passed.
Jan  4 09:01:44 axiom1 kernel:   Serial sub-system self-test: passed.
Jan  4 09:01:44 axiom1 kernel:   Internal registers self-test: passed.
Jan  4 09:01:44 axiom1 kernel:   ROM checksum self-test: passed 
(0x04f4518b).
Jan  4 09:01:44 axiom1 kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker 
http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Jan  4 09:01:44 axiom1 kernel: eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 
Modified by Andrey V. Savochkin <[EMAIL PROTECTED]> and others
Jan  4 09:01:44 axiom1 kernel: eepro100.c: VA Linux custom, Dragan Stancevic 
<[EMAIL PROTECTED]> 2000/11/15
...
Jan  4 17:14:03 axiom1 kernel: eth1: Too much work in interrupt, status 
e401.
Jan  4 17:14:42 axiom1 kernel: eth1: Too much work in interrupt, status 
e481.
Jan  4 17:16:36 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout 
with(0xff80)!
Jan  4 17:17:09 axiom1 last message repeated 18 times
Jan  4 17:18:49 axiom1 last message repeated 8 times
Jan  4 17:20:49 axiom1 last message repeated 3 times
Jan  4 17:22:49 axiom1 last message repeated 3 times
Jan  4 17:24:49 axiom1 last message repeated 3 times
Jan  4 17:26:49 axiom1 last message repeated 3 times
Jan  4 17:28:49 axiom1 last message repeated 3 times
Jan  4 17:30:49 axiom1 last message repeated 3 times
Jan  4 17:32:49 axiom1 last message repeated 3 times
Jan  4 17:59:28 axiom1 last message repeated 3 times
Jan  4 17:59:42 axiom1 last message repeated 7 times
Jan  4 17:59:50 axiom1 kernel: eth0: Transmit timed out: status 0050  0c80 
at 13421/13481 command 000c.
Jan  4 17:59:50 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout 
with(0xff80)!
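
Since syslog folds identical lines into "last message repeated N times", it is easy to underestimate how often the cmd_wait timeout fired. A minimal sketch for expanding those markers follows. It assumes each log entry has first been joined back onto a single line (the mail client wrapped them above), and the function name count_occurrences and the sample list are made up for illustration.

```python
import re

# Matches syslog's compression marker, e.g. "last message repeated 18 times".
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_occurrences(lines, needle):
    """Count how often `needle` was logged, expanding syslog repeat markers."""
    total = 0
    last_matched = False  # did the most recent real message contain `needle`?
    for line in lines:
        m = REPEAT_RE.search(line)
        if m:
            # A repeat marker refers to the last real message before it.
            if last_matched:
                total += int(m.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            total += 1
    return total


# A few entries from the log above, unwrapped by hand (illustrative subset only).
sample = [
    "Jan  4 17:14:42 axiom1 kernel: eth1: Too much work in interrupt, status e481.",
    "Jan  4 17:16:36 axiom1 kernel: eepro100: cmd_wait for(0xff80) timedout with(0xff80)!",
    "Jan  4 17:17:09 axiom1 last message repeated 18 times",
    "Jan  4 17:18:49 axiom1 last message repeated 8 times",
    "Jan  4 17:59:50 axiom1 kernel: eth0: Transmit timed out: status 0050  0c80 at 13421/13481 command 000c.",
]
timeouts = count_occurrences(sample, "cmd_wait")
print(timeouts)
```

Run against the whole messages file, this would give the number of timeouts per incident rather than the handful of lines that are visible at first glance.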


lspci -v:
00:00.0 Host bridge: Intel Corporation: Unknown device 1130 (rev 
02)
Flags: bus master, fast devsel, latency 0
Capabilities: [88] #09 [f104]

00:02.0 VGA compatible controller: Intel Corporation: Unknown device 1132 
(rev 02) (prog-if 00 [VGA])
Subsystem: Intel Corporation: Unknown device 4541
Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 11
Memory at f800 (32-bit, prefetchable)
Memory at ffa8 (32-bit, non-prefetchable)
Capabilities: [dc] Power Management version 2

00:1e.0 PCI bridge: Intel Corporation: Unknown device 244e (rev 01) (prog-if 
00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: d000-dfff
Memory behind bridge: ff80-ff8f
Prefetchable memory behind bridge: f6a0-f6af

00:1f.0 ISA bridge: Intel Corporation: Unknown device 2440 (rev 01)
Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corporation: Unknown device 244b (rev 01) 
(prog-if 80 [Master])
Subsystem: Intel Corporation: Unknown device 4541
Flags: bus master, medium devsel, latency 0
I/O ports at ffa0

00:1f.2 USB Controller: Intel Corporation: Unknown device 2442 (rev 01) 
(prog-if 00 [UHCI])
Subsystem: Intel Corporation: Unknown device 4541
Flags: bus master, medium devsel, latency 0, IRQ 10
I/O ports at ef40

00:1f.3 SMBus: Intel Corporation: Unknown device 2443 (rev 01)
Subsystem: Intel Corporation: Unknown device 4541
Flags: medium devsel, IRQ 6
I/O ports at efa0

00:1f.4 USB Controller: Intel Corporation: Unknown device 2444 (rev 01) 
(prog-if 00 [UHCI])
Subsystem: Intel Corporation: Unknown device 4541
Flags: bus master, medium devsel, latency 0, IRQ 9
I/O ports at ef80

00:1f.5 Multimedia audio controller: Intel Corporation: Unknown device 2445 
(rev 01)
Subsystem: Intel Corporation: Unknown device 4541
Flags: bus master, medium 

Re: [eepro100] Ok, I'm fed up now

2001-01-09 Thread Kambo Lohan

I am having the same problems.  I have reproduced the hard lockups / ethernet 
hangs on two Intel 815EE boards.  It happens when transmit traffic through 
the onboard eepro100 is high, and running something like vmstat 1 in the 
background sometimes triggers the lockup faster.  When it locks up there is 
nothing in the log, no oops or anything.  Sometimes it just hangs eth0 with 
the (cmd timeout) messages, and an ifconfig down/up fixes it temporarily.

I am using 2.2.18; here is the compile info:
Jan  9 08:29:32 axiom1 kernel: Linux version 2.2.18 (root@flankerbuild) (gcc 
version egcs-2.91.66 19990
Jan  9 08:29:32 axiom1 kernel: Detected 598065 kHz processor.
Jan  9 08:29:32 axiom1 kernel: Console: colour VGA+ 80x25
Jan  9 08:29:32 axiom1 kernel: Calibrating delay loop... 1192.75 BogoMIPS
Jan  9 08:29:32 axiom1 kernel: Memory: 126976k/129792k available (884k 
kernel code, 412k reserved, 1480
Jan  9 08:29:32 axiom1 kernel: Dentry hash table entries: 16384 (order 5, 
128k)
Jan  9 08:29:32 axiom1 kernel: Buffer cache hash table entries: 131072 
(order 7, 512k)
Jan  9 08:29:32 axiom1 kernel: Page cache hash table entries: 32768 (order 
5, 128k)
Jan  9 08:29:32 axiom1 kernel: Intel machine check architecture supported.
Jan  9 08:29:32 axiom1 kernel: Intel machine check reporting enabled on 
CPU#0.
Jan  9 08:29:32 axiom1 kernel: 128K L2 cache (4 way)
Jan  9 08:29:32 axiom1 kernel: CPU: L2 Cache: 128K
Jan  9 08:29:32 axiom1 kernel: CPU: Intel Pentium III (Coppermine) stepping 
06
Jan  9 08:29:32 axiom1 kernel: Checking 386/387 coupling... OK, FPU using 
exception 16 error reporting.
Jan  9 08:29:32 axiom1 kernel: Checking 'hlt' instruction... OK.
Jan  9 08:29:32 axiom1 kernel: POSIX conformance testing by UNIFIX
Jan  9 08:29:32 axiom1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfda95
Jan  9 08:29:32 axiom1 kernel: PCI: Using configuration type 1
Jan  9 08:29:32 axiom1 kernel: PCI: Probing PCI hardware
Jan  9 08:29:32 axiom1 kernel: Linux NET4.0 for Linux 2.2

I will test the latest 2.2.19pre with it.  Do you need my kernel config?  
It's all generic: I have eepro100 and 3c59x built as modules, and everything 
else is default (SMP is off).

I can trigger this lockup repeatedly by running a shell script on the 815 
that rsyncs a large file to another machine over and over.  After about 5 
minutes of high load it either hangs hard or hangs the eth card.
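
For anyone who wants to try reproducing this, here is a rough sketch of such a copy loop. It is not the script used here: the file name, destination and round count are placeholders, and the rsync options -I (--ignore-times) and -W (--whole-file) are an assumption, added so that every round really resends the whole file instead of skipping an unchanged one.

```python
import subprocess


def build_rsync_cmd(src, dest):
    # -I: do not skip files whose size and mtime already match.
    # -W: send the whole file rather than only the differences.
    # Both options are assumptions, not taken from the original script.
    return ["rsync", "-I", "-W", src, dest]


def hammer(src, dest, rounds, runner=subprocess.run):
    """Copy `src` to `dest` up to `rounds` times.

    Returns the number of copies that completed before the first failure.
    `runner` is injectable so the loop can be exercised without a network.
    """
    done = 0
    for _ in range(rounds):
        result = runner(build_rsync_cmd(src, dest))
        if result.returncode != 0:
            break  # the copy failed, e.g. because the interface hung
        done += 1
    return done
```

A call would look like hammer("bigfile.bin", "otherhost:/tmp/", 1000), with both names being hypothetical. If the machine locks hard the loop never returns, so the count is only useful for the case where just the interface hangs.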

I have a 3Com card (3c59x driver) in there as well; it and the onboard 
eepro100 are both using IRQ 11 - is that normal?  The 3Com is also in 
promiscuous mode, sniffing a lot of traffic during the tests.
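
To see at a glance which devices share an interrupt line, something like the following could be run over the contents of /proc/interrupts. This is only a sketch: it assumes a single-CPU layout (one count column, which fits a non-SMP kernel) and a comma-separated device list on shared lines, and the sample text and its counts are invented.

```python
def shared_irqs(text):
    """Map IRQ number -> device names for lines that list more than one device."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header and non-numeric rows such as NMI
        fields = rest.split()
        # Expected fields: one count (single CPU), the controller, then devices.
        if len(fields) < 3:
            continue
        devices = [d.strip() for d in " ".join(fields[2:]).split(",")]
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared


# Hypothetical /proc/interrupts contents; the counts are made up.
sample = """\
           CPU0
  0:     123456          XT-PIC  timer
 10:          0          XT-PIC  usb-uhci
 11:     987654          XT-PIC  eth0, eth1
NMI:          0
"""
print(shared_irqs(sample))
```

On a live system the input would come from open("/proc/interrupts").read() instead of the sample string.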

-- original text --
Ok, I just build another server, also using an Intel D815EAL. Same exact
setup as the other one. Celeron 600cpu w/ 128mb ram (PC133 btw). both
systems lockup doing ftp transfers. I didn't know for sure about the first
one, so I just opened up an ncftp session and downloaded something large
from the internet. Netscape 6 was what I found (not planning on installing
it btw), didn't get very far then locked up tight. twice in a row. no
errors, nothing, nothing to report either.

So, I built another. i have parts for 2 of them (almost ready for the
garbage) so i put together the second. i can hardly download anything.
can't even get down new rpm or source code for the new kernels. i'm on my
last nerve here and about to throw the motherboards out the window. I don't
understand how you guys make these motherboards work. what am i missing?
i'm so fed up. 3 weeks lost and no servers to show for it. my boss is so
pissed.

do you guys have any input? where else can i go to find out about these
boards?

Thanks,
Rob

_
Get your FREE download of MSN Explorer at http://explorer.msn.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [eepro100] Ok, I'm fed up now

2001-01-09 Thread Kambo Lohan

Whoops,
I didn't set the cc address properly; the full thread is on the 
[EMAIL PROTECTED] list.

Karl Pickett




Re: [eepro100] ...

2001-01-09 Thread Kambo Lohan

>You could try the Intel driver (e100.c), which is downloadable from their 
>website. It apparently has some silicon bug workarounds that Donald's 
>driver hasn't.

We've been back and forth with that driver, yeah.  It has its own set of 
problems: sometimes it doesn't even autonegotiate properly, falls back to 10 
half duplex, etc.  It also seems to have gotten quite large code-wise in the 
latest version (1.3.x.something) :)  We don't need any of their 
ans/teaming/proc stuff, but it is stable on this particular problem.

Their driver doesn't even compile out of the box on 2.2.18, btw (Intel e100.c 
1.3.2, the latest I could find): _badudelay.  I had to change one call to 
mdelay and comment out their dma_addr_t type because of the conflicting 
declaration.  Doesn't give me much confidence :(

>Also please note that such a subject line is not a good motivation to help 
>you for free.
>-Andi

Sorry.  It wasn't my subject though; I was replying to someone else and just 
put the 'Re:' in.  But I will be more careful...
