Re: NETDEV WATCHDOG, tulip, 2.6.18

2007-04-19 Thread Lou Poppler

On Thu, 19 Apr 2007, Tomasz Chmielewski wrote:


>> I also have recurrent problems with
>> NETDEV WATCHDOG: eth0: transmit timed out
>
> If you search the list, you'll find several similar reports about the tulip
> driver (NETDEV WATCHDOG: eth0: transmit timed out).
>
> Adding noapic/nolapic/noacpi options to the kernel command line helped in my
> case.


Yes, I've looked at many similar complaints on this list and on Debian's
BTS, going back several years.  It seems that it happens with more drivers
than just tulip, all with basically the same symptoms.  The pattern seems to
be: (1) a period of problem-free operation, lasting minutes to days;
(2) something bad happens, probably APIC or ACPI related, probably
interfering with seeing an interrupt; (3) the kernel doesn't ever
completely recover from the bad event, and the affected interface
remains crippled until reboot.
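
As an aside for anyone collecting such reports: below is a minimal Python
sketch, not part of the original exchange, for tallying watchdog timeouts
per interface from a saved kernel log.  It assumes the message format quoted
in this thread ("NETDEV WATCHDOG: eth0: transmit timed out"); the function
name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the watchdog line as quoted in this thread, e.g.
# "NETDEV WATCHDOG: eth0: transmit timed out".
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: ([^\s:]+): transmit timed out")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name -> number of timeouts seen."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice, feed it saved dmesg output.
    sample = [
        "NETDEV WATCHDOG: eth0: transmit timed out",
        "eth0: some unrelated line",
        "NETDEV WATCHDOG: eth0: transmit timed out",
    ]
    for iface, n in count_watchdog_timeouts(sample).items():
        print(f"{iface}: {n} transmit timeouts")
```

Comparing such counts across boots could help show whether a given setting
merely delays the first timeout or actually avoids it.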

I will continue to experiment with noapic, nolapic, acpi=off, and
pci=routeirq, plus variations of BIOS setup options.  This is slow
experimentation, since it can take days to see the problem -- but luckily
this machine is not critical to anything now, and I don't mind taking some
time to try to figure this out.  When I find some settings that avoid the
problem, I think this will mean we have avoided the "something bad"
in step (2) above, possibly by disabling some functionality of the
hardware.  The possibility remains that some part of the kernel could
be coping better with whatever this is, and recovering the normal
operation of the interface after the "something bad".  I think I
remember other people's NETDEV WATCHDOG trouble reports where they
say their setup used to work OK under some earlier 2.4 (or 2.2 ?)
kernels, or even under other OSes.
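
To keep track of which combination a given boot actually used, something
like the following Python sketch may help.  It is an illustration only: the
option list is the one named above, the helper name is invented, and it does
plain whole-token matching on a kernel command line string.  The sample
string is the command line from the dmesg output posted in this thread.

```python
# Boot options under test, as listed in the message above.
CANDIDATE_OPTIONS = ("noapic", "nolapic", "acpi=off", "pci=routeirq")


def active_options(cmdline, candidates=CANDIDATE_OPTIONS):
    """Report which candidate options appear as whole tokens in cmdline."""
    tokens = set(cmdline.split())
    return {opt: opt in tokens for opt in candidates}


if __name__ == "__main__":
    # Command line taken from the posted boot log; on a running Linux
    # system the same string can be read from /proc/cmdline.
    current = "root=/dev/hda2 ro"
    for opt, present in active_options(current).items():
        print(f"{opt:14s} {'set' if present else 'not set'}")
```

Whole-token matching is a simplification: it would miss an option given as
part of a comma-separated list, such as pci=routeirq combined with another
pci= setting.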

Any other suggestions for boot options I should try are welcome,
as are any requests for other specific info about this system
while it is still in failure mode.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NETDEV WATCHDOG, tulip, 2.6.18

2007-04-19 Thread Tomasz Chmielewski

> I also have recurrent problems with
> NETDEV WATCHDOG: eth0: transmit timed out

I remember having it with some older kernels on Fujitsu-Siemens Scenic 
machines.


If you search the list, you'll find several similar reports about the 
tulip driver (NETDEV WATCHDOG: eth0: transmit timed out).



Adding noapic/nolapic/noacpi options to the kernel command line helped 
in my case.



--
Tomasz Chmielewski
http://wpkg.org


NETDEV WATCHDOG, tulip, 2.6.18

2007-04-18 Thread Lou Poppler

Package: linux-kernel
Version: 2.6.18-4-686 (Debian 2.6.18.dfsg.1-12)

(Submitted to linux-kernel@vger.kernel.org && [EMAIL PROTECTED])

I also have recurrent problems with
NETDEV WATCHDOG: eth0: transmit timed out

I am running on a Pentium 3 with a Linksys LNE100TX V5.1 PCI ethernet
card, which also identifies itself as ADMtek Comet rev 17, and for which
the kernel uses the tulip driver module:
Linux Tulip driver version 1.1.13-NAPI (May 11, 2002)

This works fine after booting and for a day or two afterwards, with no
problems under heavy or light net traffic.  Eventually something happens
to it, though, and then it is not right again until reboot.  The behavior
then is an occasional freeze, where nothing moves for 10 seconds or so,
then full-speed network I/O for a few seconds, then another freeze, etc.

I only got this machine recently.  I first installed Debian Sarge on it,
and had the same problem with Sarge's 2.6.8 kernel.  I read many messages
about the NETDEV WATCHDOG situation, and some writers suggested it might
be fixed in later kernels, so I upgraded to Etch with the 2.6.18 kernel.
For me at least, the problem is still the same.

I am holding the machine in the broken condition (rather than rebooting)
in case anyone wants me to test something else.

Here is some info to document the problem:

dmesg at boot:
Linux version 2.6.18-4-686 (Debian 2.6.18.dfsg.1-12) ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Mon Mar 26 17:17:36 UTC 2007
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000e7000 - 0010 (reserved)
 BIOS-e820: 0010 - 040fd800 (usable)
 BIOS-e820: 040fd800 - 040ff800 (ACPI data)
 BIOS-e820: 040ff800 - 040ffc00 (ACPI NVS)
 BIOS-e820: 040ffc00 - 1800 (usable)
 BIOS-e820: fffe7000 - 0001 (reserved)
0MB HIGHMEM available.
384MB LOWMEM available.
On node 0 totalpages: 98304
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 94208 pages, LIFO batch:31
DMI 2.1 present.
ACPI: RSDP (v000 PTLTD ) @ 0x000f6ac0
ACPI: RSDT (v001 PTLTDRSDT   0x PTL  0x0100) @ 0x040fda87
ACPI: FADT (v001 GATEWA TABOR II 0x19990928 PTL  0x000f4240) @ 0x040ff78c
ACPI: DSDT (v001 GATEWA TABOR II 0x MSFT 0x0100) @ 0x
ACPI: PM-Timer IO Port: 0x8008
Allocating PCI resources starting at 2000 (gap: 1800:e7fe7000)
Detected 596.938 MHz processor.
Built 1 zonelists.  Total pages: 98304
Kernel command line: root=/dev/hda2 ro 
Local APIC disabled by BIOS -- you can enable it with "lapic"

mapped APIC to d000 (0130a000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 382128k/393216k available (1544k kernel code, 10556k reserved, 577k data, 196k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1194.90 BogoMIPS (lpj=2389801)
Security Framework v1.0.0 initialized
SELinux:  Disabled at boot.
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383f9ff     
 
CPU: After vendor identify, caps: 0383f9ff     
 
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0383f9ff   0040  
 
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to e000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 16k freed
ACPI: Core revision 20060707
ACPI: setting ELCR to 0200 (from 1a00)
CPU0: Intel Pentium III (Katmai) stepping 03
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Brought up 1 CPUs
migration_cost=0
checking if image is initramfs... it is
Freeing initrd memory: 4397k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd983, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources
PCI quirk: region 8000-803f claimed by PIIX4 ACPI
PCI quirk: region 7000-700f claimed by PIIX4 SMB
Boot 
