Re: Intel 82559 NIC corrupted EEPROM

2007-02-07 John

Jesse Brandeburg wrote:


John wrote:


Jesse Brandeburg wrote:


can you try adding mdelay(100); in e100_eeprom_load before the for loop,
and then change the multiple udelay(4) to mdelay(1) in e100_eeprom_read


I applied the attached patch.

Loading the driver now takes around one minute :-)


ouch, but yep, that's what happens when you use super extra delay


I ran 'source load_unload' 25 times in a loop.

The first 12 times were successful. The last 13 times failed.
(cf. attached archive)

I noticed something very strange.

The number of words obviously in error (0xffff) returned by the EEPROM
on 00:09.0 is not constant.


That is very strange. I would think that maybe you have something else
on the bus with the e100 that may be hogging bus cycles, or that you have
failing hardware (maybe a bad EEPROM, or possibly a bad MAC chip).


$ grep -c 0x insmod*
insmod_300.txt:0
insmod_301.txt:0
insmod_302.txt:0
insmod_303.txt:0
insmod_304.txt:0
insmod_305.txt:0
insmod_306.txt:0
insmod_307.txt:0
insmod_308.txt:0
insmod_309.txt:0
insmod_310.txt:0
insmod_311.txt:0
insmod_312.txt:1
insmod_313.txt:5
insmod_314.txt:24
insmod_315.txt:45
insmod_316.txt:243
insmod_317.txt:256
insmod_318.txt:256
insmod_319.txt:256
insmod_320.txt:256
insmod_321.txt:256
insmod_322.txt:256
insmod_323.txt:253
insmod_324.txt:240


this is even stranger; does it cycle back down (sine wave) to zero
again?  The delays did seem to work, at least sometimes.  This
indicates that something needs that extra delay to successfully read
the eeprom.  I might try changing all the udelay(4) to udelay(40) (x10
increase) and see if that gives you a happy medium where the driver
loads without error most of the time.
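To put the proposed delays in perspective: udelay(40) is a 10x increase over udelay(4) per wait, while mdelay(1) (1000 microseconds) is a 250x increase, which helps explain why the mdelay(1) experiment made loading so slow. A minimal sketch of the arithmetic in C; the word count, bits per word, and number of waits per bit are hypothetical parameters, not values taken from e100_eeprom_read().

```c
#include <assert.h>

/* Total busy-wait time, in microseconds, spent reading `words` EEPROM
 * words when each clocked bit costs `delays_per_bit` waits of `delay_us`.
 * All parameters are hypothetical; they are not taken from e100.c. */
static unsigned long eeprom_read_delay_us(unsigned long words,
                                          unsigned long bits_per_word,
                                          unsigned long delays_per_bit,
                                          unsigned long delay_us)
{
    assert(words > 0 && bits_per_word > 0);
    return words * bits_per_word * delays_per_bit * delay_us;
}
```

For example, with a hypothetical 64 words, 32 bits per word and 2 waits per bit, udelay(4) adds up to 16384 microseconds, udelay(40) to 163840, and mdelay(1) to 4096000 (about 4 s) per EEPROM. The ratios, not the absolute numbers, are the point.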

John, this problem seems to be very specific to your hardware.  I know
that you have put in a lot of time debugging this, but I'm not sure
what we can do from here.  If this were a generic code problem more
people would be reporting the issue.

What would you like to do?  At this stage I would like e100 to work
better than it is, but I'm not sure what to do next.


Hello everyone,

I'm resurrecting this thread because it appears we'll need to support 
these motherboards for several months to come, yet Adrian Bunk has 
scheduled the removal of eepro100 for January 2007.


To recap, we have to support ~30 EBC-2000T motherboards.
http://www.adlinktech.com/PD/web/PD_detail.php?pid=213
These motherboards come with three on-board Intel 82559 NICs.

Last time I checked, i.e. two months ago, e100 did not correctly 
initialize all three NICs on these motherboards. Therefore, we've been 
using eepro100.


I will be testing the latest 2.6.20 kernel to see if the situation has 
changed, but I wanted to let you all know that there are still some 
eepro100 users out there, out of necessity.


Regards,

John

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Intel 82559 NIC corrupted EEPROM

2006-12-04 Jesse Brandeburg

On 12/1/06, John [EMAIL PROTECTED] wrote:

 can you try adding mdelay(100); in e100_eeprom_load before the for loop,
 and then change the multiple udelay(4) to mdelay(1) in e100_eeprom_read

I applied the attached patch.

Loading the driver now takes around one minute :-)


ouch, but yep, that's what happens when you use super extra delay


I ran 'source load_unload' 25 times in a loop.

The first 12 times were successful. The last 13 times failed.
(cf. attached archive)

I noticed something very strange.

The number of words obviously in error (0xffff) returned by the EEPROM
on 00:09.0 is not constant.


That is very strange. I would think that maybe you have something else
on the bus with the e100 that may be hogging bus cycles, or that you have
failing hardware (maybe a bad EEPROM, or possibly a bad MAC chip).


$ grep -c 0x insmod*
insmod_300.txt:0
insmod_301.txt:0
insmod_302.txt:0
insmod_303.txt:0
insmod_304.txt:0
insmod_305.txt:0
insmod_306.txt:0
insmod_307.txt:0
insmod_308.txt:0
insmod_309.txt:0
insmod_310.txt:0
insmod_311.txt:0
insmod_312.txt:1
insmod_313.txt:5
insmod_314.txt:24
insmod_315.txt:45
insmod_316.txt:243
insmod_317.txt:256
insmod_318.txt:256
insmod_319.txt:256
insmod_320.txt:256
insmod_321.txt:256
insmod_322.txt:256
insmod_323.txt:253
insmod_324.txt:240


this is even stranger; does it cycle back down (sine wave) to zero
again?  The delays did seem to work, at least sometimes.  This
indicates that something needs that extra delay to successfully read
the eeprom.  I might try changing all the udelay(4) to udelay(40) (x10
increase) and see if that gives you a happy medium where the driver
loads without error most of the time.

John, this problem seems to be very specific to your hardware.  I know
that you have put in a lot of time debugging this, but I'm not sure
what we can do from here.  If this were a generic code problem more
people would be reporting the issue.

What would you like to do?  At this stage I would like e100 to work
better than it is, but I'm not sure what to do next.

Thanks for your patience on this issue,
 Jesse


Re: Intel 82559 NIC corrupted EEPROM

2006-11-29 John

Jesse Brandeburg wrote:


John wrote:


Here is some context for those who have been added to the CC list:
http://groups.google.com/group/linux.kernel/browse_frm/thread/bdc8fd08fb601c26

As far as I understand, some consider the eepro100 driver to be
obsolete, and it has been considered for removal.

What is the current status?

Unfortunately, e100 does not work out-of-the-box on this system.

Is there something I can do to improve the situation?


Let's go ahead and print the output from e100_eeprom_load;
debug patch attached.


Loading (then unloading) e100.ko fails the first few times (i.e. the 
driver claims one of the EEPROMs is corrupted). Thereafter, sometimes it 
fails, other times it works. Sounds like a race, no?


$ cat load_unload
: > /var/log/kern.log
insmod e100.ko debug=16
sleep 1
cp /var/log/kern.log insmod_$I.txt
ip link > ip_link_$I.txt
sleep 2
rmmod e100
let I=I+1

(cf. attached compressed archive)

FAILURE:
insmod_100.txt
insmod_101.txt
insmod_102.txt
insmod_105.txt
insmod_107.txt
insmod_108.txt
insmod_110.txt
insmod_111.txt
insmod_114.txt

SUCCESS:
insmod_103.txt
insmod_104.txt
insmod_106.txt
insmod_109.txt
insmod_112.txt
insmod_113.txt
insmod_115.txt
insmod_116.txt

On an unrelated note, insmod_100.txt is truncated at the beginning, and 
insmod_110.txt is truncated in the middle (!!) cf. line 14. What would 
cause klogd to behave like that?


Regards.


TEST-e100.tar.bz2


Re: Intel 82559 NIC corrupted EEPROM

2006-11-10 John

Jesse Brandeburg wrote:


Can you send output of cat /proc/iomem


00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000f0000-000fffff : System ROM
00100000-0ffeffff : System RAM
  00100000-00296a1a : Kernel code
  00296a1b-0031bbe7 : Kernel data
0fff0000-0fff2fff : ACPI Non-volatile Storage
0fff3000-0fffffff : ACPI Tables
20000000-200fffff : 0000:00:08.0
20100000-201fffff : 0000:00:09.0
20200000-202fffff : 0000:00:0a.0
e0000000-e3ffffff : 0000:00:00.0
e5000000-e50fffff : 0000:00:08.0
e5100000-e51fffff : 0000:00:09.0
e5200000-e52fffff : 0000:00:0a.0
e5300000-e5300fff : 0000:00:08.0
e5301000-e5301fff : 0000:00:0a.0
e5302000-e5302fff : 0000:00:09.0
ffff0000-ffffffff : reserved
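Jesse asked for this listing because he suspected a device collision at e5302000. One way to check it is to look for overlapping ranges among top-level entries (indented children such as "Kernel code" are expected to lie inside their parent). A minimal sketch in C; parse_iomem_line() and ranges_overlap() are hypothetical helpers, and the parser only handles the simple "start-end : name" form.

```c
#include <assert.h>
#include <stdio.h>

/* One entry of /proc/iomem: "start-end : name", bounds inclusive. */
struct iomem_range {
    unsigned long start;
    unsigned long end;
    char name[64];
};

/* Parse one line such as "e5302000-e5302fff : 0000:00:09.0".
 * Leading indentation is skipped. Returns 1 on success, 0 otherwise. */
static int parse_iomem_line(const char *line, struct iomem_range *r)
{
    assert(line != NULL && r != NULL);
    return sscanf(line, " %lx-%lx : %63[^\n]",
                  &r->start, &r->end, r->name) == 3;
}

/* Two inclusive ranges overlap unless one ends before the other starts. */
static int ranges_overlap(const struct iomem_range *a,
                          const struct iomem_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}
```

Applied to the listing above, the e5302000-e5302fff entry for 00:09.0 does not appear to overlap any other top-level range; it sits directly after the e5301000-e5301fff entry for 00:0a.0.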

I've also attached:

o config-2.6.18.1-adlink used to compile this kernel
o dmesg output after the machine boots


try something like the attached patch


Loading e100-debug.ko reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation

***e100 debug: unable to set power state (error 0)
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level, low) -> IRQ 12
***e100 debug: read 0100/ from the same register
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4

***e100 debug: unable to set power state (error 0)
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
***e100 debug: read 0100/ from the same register
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11

***e100 debug: unable to set power state (error 0)
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
***e100 debug: read 0100/ from the same register
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6


In other words, the behavior is the same for all three NICs.

pci_set_power_state(pdev, PCI_D0) returns 0
pci_iomap returns something != NULL

Can I provide more information to help locate the problem?
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.18.1-hrt
# Tue Nov  7 17:52:26 2006
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set

#
# Block layer
#
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED=cfq

#
# Processor type and features
#
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
CONFIG_MPENTIUMIII=y
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not 

Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 John

Auke Kok wrote:

This is what I was afraid of: even though the code allows you to bypass 
the EEPROM checksum, the probe fails on a further check to see if the 
MAC address is valid.


Since something with this NIC specifically made the EEPROM return all 
0xff's, the MAC address is automatically invalid, and thus probe fails.


I don't understand why you think there is something wrong with a
specific NIC?

In 2.6.14.7, e100.ko fails to read the EEPROM on 0000:00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on 0000:00:09.0 (eth1)
In both kernels, eepro100.ko successfully reads all the EEPROMs.

It seems that the driver has more problems with this NIC than just the 
eeprom checksum being bad. Needless to say this might need fixing.


Can you load the eepro driver and send me the full eeprom dump?
Perhaps I can duplicate things over here.


00:08.0 EEPROM contents, size 64x16

  3000 0464 e4e6 0e03  0201 4701 
  7213 8310 40a2 0001 8086   
         
         
         
         
  0128       
         92f7

00:09.0 EEPROM contents, size 64x16

  3000 0464 e5e6 0e03  0201 4701 
  7213 8310 40a2 0001 8086   
         
         
         
         
  0128       
         91f7

00:0a.0 EEPROM contents, size 64x16

  3000 0464 e6e6 0e03  0201 4701 
  7213 8310 40a2 0001 8086   
         
         
         
         
  0128       
         90f7
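The dumps can be cross-checked by hand. As I understand the e100 driver, words 0-2 hold the MAC address (low byte first), and the last word is chosen so that the 16-bit sum of all 64 words equals 0xBABA. That would explain why the three checksums (92f7, 91f7, 90f7) step down by 0x100 as the MAC's last octet steps up from E4 to E6. A minimal C sketch; the helper names are hypothetical, and the blank positions in the dumps above are not reconstructed.

```c
#include <assert.h>
#include <stdint.h>

#define EEPROM_WORDS 64      /* "size 64x16" in the dumps above */
#define EEPROM_SUM   0xBABAu /* target 16-bit sum of all words */

/* Words 0-2 hold the MAC address, low byte first. */
static void eeprom_mac(const uint16_t *w, uint8_t mac[6])
{
    for (int i = 0; i < 3; i++) {
        mac[2 * i]     = (uint8_t)(w[i] & 0xff);
        mac[2 * i + 1] = (uint8_t)(w[i] >> 8);
    }
}

/* 16-bit wrap-around sum of the first n words. */
static uint16_t eeprom_sum(const uint16_t *w, int n)
{
    uint16_t sum = 0;
    for (int i = 0; i < n; i++)
        sum = (uint16_t)(sum + w[i]);
    return sum;
}

/* Value the last word must hold so that all n words sum to EEPROM_SUM. */
static uint16_t eeprom_expected_csum(const uint16_t *w, int n)
{
    assert(n > 0);
    return (uint16_t)(EEPROM_SUM - eeprom_sum(w, n - 1));
}
```

With words 0-2 set to 3000 0464 e4e6, eeprom_mac() yields 00:30:64:04:E6:E4, the address e100 reports for eth0.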


Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 John

Jesse Brandeburg wrote:


I suspect that one reason Becker's code works is that it uses IO
based access (slower, and different method) to the adapter rather
than memory mapped access.


I've noticed this difference.


The second thought is that the adapter is in D3, and something about
your kernel or the driver doesn't successfully wake it up to D0.


On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2.
Thus DDPD (bit 6) is set to 0.

DDPD is the Disable Deep Power Down while PME is disabled bit.
0 - Deep Power Down is enabled in D3 state while PME-disabled.
1 - Deep Power Down disabled in D3 state while PME-disabled.
This bit should be set to 1b if a TCO controller is being used via the 
SMB because it requires receive functionality at all power states.
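Checking the bit arithmetic: 0x40a2 has bit 6 (mask 0x40) clear, so DDPD is 0, and setting it would give 0x40e2. A minimal C sketch; the macro and function names are hypothetical, and the bit position is the one quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 6 of EEPROM word 0Ah: Disable Deep Power Down while PME is disabled. */
#define EEPROM_ID_DDPD (1u << 6)

/* Nonzero if deep power down is disabled (DDPD = 1). */
static int ddpd_is_set(uint16_t id_word)
{
    return (id_word & EEPROM_ID_DDPD) != 0;
}

/* Return the ID word with DDPD forced to 1; other bits are unchanged. */
static uint16_t ddpd_set_bit(uint16_t id_word)
{
    return (uint16_t)(id_word | EEPROM_ID_DDPD);
}
```

Because the last EEPROM word is a checksum over all the others, rewriting word 0Ah this way would also require updating that word; the sketch only covers the bit manipulation.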


Are you suggesting I try and set DDPD to 1?
Or is this completely unrelated?


An indication of this would be looking at lspci -vv before/after
loading the driver.


$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt 2006-11-09 14:51:30.0 +0100
+++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.0 +0100
@@ -74,21 +74,20 @@
     Expansion ROM at 20000000 [disabled] [size=1M]
     Capabilities: [dc] Power Management version 2
         Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+        Status: D0 PME-Enable- DSel=0 DScale=2 PME-
 
 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
     Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
-    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
     Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
-    Latency: 32 (2000ns min, 14000ns max), cache line size 08
     Interrupt: pin A routed to IRQ 10
-    Region 0: Memory at e5302000 (32-bit, non-prefetchable) [size=4K]
-    Region 1: I/O ports at dc00 [size=64]
-    Region 2: Memory at e5100000 (32-bit, non-prefetchable) [size=1M]
+    Region 0: Memory at e5302000 (32-bit, non-prefetchable) [disabled] [size=4K]
+    Region 1: I/O ports at dc00 [disabled] [size=64]
+    Region 2: Memory at e5100000 (32-bit, non-prefetchable) [disabled] [size=1M]
     Expansion ROM at 20100000 [disabled] [size=1M]
     Capabilities: [dc] Power Management version 2
         Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+        Status: D0 PME-Enable- DSel=0 DScale=2 PME-
 
 00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
     Subsystem: Intel Corporation EtherExpress PRO/100B (TX)


Also, after loading/unloading eepro100 does the e100 driver work?


No.


A third idea is look for a master abort in lspci after e100 fails to
load.


I don't understand that one.



Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 Auke Kok

John wrote:

Auke Kok wrote:

This is what I was afraid of: even though the code allows you to 
bypass the EEPROM checksum, the probe fails on a further check to see 
if the MAC address is valid.


Since something with this NIC specifically made the EEPROM return all 
0xff's, the MAC address is automatically invalid, and thus probe fails.


I don't understand why you think there is something wrong with a
specific NIC?


that was completely not my point - I was merely trying to point out that the original 
problem causes a cascade of error events later on, and bypassing the eeprom check in 
this case didn't help you at all. Something is wrong in the driver, but I don't 
understand yet why it only affects one of the 3 nics in your system.



In 2.6.14.7, e100.ko fails to read the EEPROM on 0000:00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on 0000:00:09.0 (eth1)


almost sounds like a bug got fixed and it introduced a regression. this wouldn't be the 
right time to pull out git-bisect would it? even loading 2.6.15, 2.6.16, 2.6.17 on it 
would give us some good information.



Cheers,

Auke


Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 Jesse Brandeburg

On 11/9/06, John [EMAIL PROTECTED] wrote:

 The second thought is that the adapter is in D3, and something about
 your kernel or the driver doesn't successfully wake it up to D0.

On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2.
Thus DDPD (bit 6) is set to 0.

DDPD is the Disable Deep Power Down while PME is disabled bit.
0 - Deep Power Down is enabled in D3 state while PME-disabled.
1 - Deep Power Down disabled in D3 state while PME-disabled.
This bit should be set to 1b if a TCO controller is being used via the
SMB because it requires receive functionality at all power states.

Are you suggesting I try and set DDPD to 1?
Or is this completely unrelated?


This may be related but I doubt it.  Something is strange about how
memory is being mapped in your system.  whatever is creating the
problem moved when you changed the kernel version.  I'm wondering if
there is a device collision at e5302000.  I'm not convinced at this
point it is e100's fault.

can you send output of cat /proc/iomem


 An indication of this would be looking at lspci -vv before/after
 loading the driver.

$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt 2006-11-09 14:51:30.0 +0100
+++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.0 +0100
@@ -74,21 +74,20 @@
     Expansion ROM at 20000000 [disabled] [size=1M]
     Capabilities: [dc] Power Management version 2
         Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+        Status: D0 PME-Enable- DSel=0 DScale=2 PME-


okay when the driver loads it is clearing PME enable, but not
re-enabling it when it unloads.  That is pretty much expected.


 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
     Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
-    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
     Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-


pci_enable_device should be enabling io, mem and busmaster; they are
probably being disabled when the driver errors out of init.  Maybe you
should add a call to pci_set_power_state(dev, PCI_D0); before the
call to e100_reset.
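For reference, the Control: line in lspci -vv shows the low bits of the standard PCI Command register (config space offset 0x04): bit 0 enables I/O space decoding (I/O+), bit 1 memory space decoding (Mem+), and bit 2 bus mastering (BusMaster+). With Mem- the device does not respond to accesses to its memory BARs. A minimal C sketch of the decoding; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of the PCI Command register (config space offset 0x04). */
#define PCI_CMD_IO     0x0001u /* lspci "I/O+" */
#define PCI_CMD_MEMORY 0x0002u /* lspci "Mem+" */
#define PCI_CMD_MASTER 0x0004u /* lspci "BusMaster+" */

struct pci_cmd_bits {
    int io;     /* responds to I/O space accesses */
    int mem;    /* responds to memory space accesses */
    int master; /* may initiate transactions (DMA) as a bus master */
};

static struct pci_cmd_bits decode_pci_command(uint16_t cmd)
{
    struct pci_cmd_bits b = {
        .io     = (cmd & PCI_CMD_IO) != 0,
        .mem    = (cmd & PCI_CMD_MEMORY) != 0,
        .master = (cmd & PCI_CMD_MASTER) != 0,
    };
    return b;
}
```

Ignoring any Command bits lspci does not print, the "before" Control line (I/O+ Mem+ BusMaster+, everything else -) corresponds to 0x0007, and the "after" line to 0x0000.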


 Also, after loading/unloading eepro100 does the e100 driver work?

No.


now that is really odd.


 A third idea is look for a master abort in lspci after e100 fails to
 load.

I don't understand that one.


There isn't one; MAbort+ would be showing in the above lspci output.

The all-0xffff values returned when you read registers are a sure sign
the hardware either isn't at the address specified or is in a power-down
state.  The only other option I can think of is that something else is
intercepting memory reads and writes.
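A crude way to apply this rule of thumb to data read back during init is to count all-ones values. A minimal C sketch; the helper names are hypothetical. Note that an erased, unused EEPROM word can legitimately read 0xffff, so the telling case is when words that must hold data, such as the MAC address in words 0-2, come back as all ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 16-bit read of all ones often means nothing answered the cycle. */
static int all_ones16(uint16_t v)
{
    return v == 0xffffu;
}

/* Count the all-ones words in a buffer read back from the device. */
static size_t count_all_ones16(const uint16_t *words, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (all_ones16(words[i]))
            hits++;
    return hits;
}
```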

try something like the attached patch, compile tested only:


e100_debug.patch


Re: Intel 82559 NIC corrupted EEPROM

2006-11-08 John

Hello all,

[ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]

I will try and summarize the problem as I understand it at this point.

I've written two messages so far:
http://groups.google.com/group/linux.kernel/msg/3a05d819c66474db
http://groups.google.com/group/linux.kernel/msg/391aebbb3dfd6039

And here is a link to the complete thread:
http://lkml.org/lkml/fancy/2006/11/3/124

I have a motherboard with three on-board 82559 NICs.

 o eepro100.ko properly initializes all three NICs
 o e100.ko fails to initialize one of them

NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC 
address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to 
initialize the NIC with MAC address 00:30:64:04:E6:E5.


The problem is not an incorrect checksum. (Donald Becker's dump utility 
reports a correct checksum for all three NICs.) The problem seems to be 
that e100.ko fails to read the contents of one of the EEPROMs.


Auke wrote:

How did you do the first `ethtool` eeprom dump? did you have the
`e100` module loaded at that time? Did you use the new `override`
mechanism graciously donated by David M?


These tests were performed on a 2.6.14 kernel. I hacked
e100_eeprom_load() to return 0 even when the checksum
fails. Thus the driver did not refuse to load, and I was
able to use ethtool to dump the contents of the 3 EEPROMs.


Here are additional examples running a 2.6.18.1-hrt kernel.

'insmod e100.ko' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level, low) -> IRQ 12
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6


'insmod e100.ko eeprom_bad_csum_allow=1' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level, low) -> IRQ 12
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
e100: 0000:00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting.
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6


'insmod e100.ko debug=16' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level, low) -> IRQ 12
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000, data_out=0x18203000
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000, data_out=0x18217809
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000, data_out=0x18217809

e100: 0000:00:08.0: e100_phy_init: phy_addr = 1
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400, data_out=0x14000400
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000, data_out=0x18203000
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000, data_out=0x14203000
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400, data_out=0x14400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400, data_out=0x14600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400, data_out=0x14800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400, data_out=0x14A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400, data_out=0x14C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400, data_out=0x14E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400, data_out=0x15000400
e100: 0000:00:08.0: mdio_ctrl: 

Re: Intel 82559 NIC corrupted EEPROM

2006-11-08 Auke Kok

John wrote:

I have a motherboard with three on-board 82559 NICs.

 o eepro100.ko properly initializes all three NICs
 o e100.ko fails to initialize one of them

NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC 
address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to 
initialize the NIC with MAC address 00:30:64:04:E6:E5.


The problem is not an incorrect checksum. (Donald Becker's dump utility 
reports a correct checksum for all three NICs.) The problem seems to be 
that e100.ko fails to read the contents of one of the EEPROMs.


[snip]


'insmod e100.ko eeprom_bad_csum_allow=1' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level, low) -> IRQ 12
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
e100: 0000:00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting.
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11


This is what I was afraid of: even though the code allows you to bypass the EEPROM 
checksum, the probe fails on a further check to see if the MAC address is valid.


Since something with this NIC specifically made the EEPROM return all 0xff's, the MAC 
address is automatically invalid, and thus probe fails.
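The "Invalid MAC address" abort follows from the usual Ethernet address rule. As I understand it, e100_probe() checks the address with the kernel's is_valid_ether_addr(), which rejects multicast addresses (low bit of the first octet set) and the all-zero address. An all-0xff read yields ff:ff:ff:ff:ff:ff, the broadcast address, which has the multicast bit set. A user-space sketch of the same rule in C; the function names are hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Multicast (and broadcast) addresses have bit 0 of the first octet set. */
static int mac_is_multicast(const uint8_t a[6])
{
    return a[0] & 0x01;
}

static int mac_is_zero(const uint8_t a[6])
{
    return (a[0] | a[1] | a[2] | a[3] | a[4] | a[5]) == 0;
}

/* Usable station address: neither multicast/broadcast nor all zeros. */
static int mac_is_valid_unicast(const uint8_t a[6])
{
    return !mac_is_multicast(a) && !mac_is_zero(a);
}
```

Under this rule the address eepro100 reads for 00:09.0, 00:30:64:04:E6:E5, passes, while an EEPROM that reads back as all ones can never produce a usable address, whatever the checksum override says.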


It seems that the driver has more problems with this NIC than just the eeprom checksum 
being bad. Needless to say this might need fixing.


Can you load the eepro driver and send me the full eeprom dump? Perhaps I can duplicate 
things over here.


[snip]


On a related note, I am concerned by this message:

   Sleep mode is enabled.  This is not recommended.
   Under high load the card may not respond to
   PCI requests, and thus cause a master abort.
   To clear sleep mode use the '-G 0 -w -w -f' options.

Intel 82559 EEPROM Map and Programming Information (AP-394) states:
http://www.intel.com/design/network/applnots/ap394.htm

The Standby Enable bit enables the 82559 to enter standby mode. When 
this bit equals 1b, the 82559 is able to recognize an idle state and can 
enter standby mode (some internal clocks are stopped for power saving 
purposes). The 82559 does not require a PCI clock signal in standby 
mode. If this bit equals 0b, the idle recognition circuit is disabled 
and the 82559 always remains in an active state. Thus, the 82559 always 
requests PCI CLK using the Clockrun mechanism.


Auke, do you agree with Donald Becker's warning?


If you are using the e100 in a performance situation, I would certainly switch 
it off :)


If I disable STB, the NICs will waste a bit more power when idle,
is that correct? Are there other implications?


hm, I don't know the power specs of e100 that well, so I can't say that it saves 
significant amounts of power, but I suspect it would.


Power management on NICs is hairy business. As suggested, it can take time before the 
NIC powers back up, performance can be impacted, and some commands might return an 
invalid or unknown value. OTOH our labs here test these things pretty well before they 
get sent out to customers and resellers, so Becker's cautious wording ("not recommended") 
catches the severity pretty well.


I would say that under most circumstances, it's safe to enable STB, but you might want 
to disable it for use in routing and other server applications, where most of the time 
the NIC is active anyway.


hth

Auke




Thanks for reading this far!

John



Re: Intel 82559 NIC corrupted EEPROM

2006-11-08 Jesse Brandeburg

On 11/8/06, John [EMAIL PROTECTED] wrote:

Hello all,

[ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]

I will try and summarize the problem as I understand it at this point.

I've written two messages so far:
http://groups.google.com/group/linux.kernel/msg/3a05d819c66474db
http://groups.google.com/group/linux.kernel/msg/391aebbb3dfd6039

And here is a link to the complete thread:
http://lkml.org/lkml/fancy/2006/11/3/124

I have a motherboard with three on-board 82559 NICs.

  o eepro100.ko properly initializes all three NICs
  o e100.ko fails to initialize one of them

NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC
address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to
initialize the NIC with MAC address 00:30:64:04:E6:E5.

The problem is not an incorrect checksum. (Donald Becker's dump utility
reports a correct checksum for all three NICs.) The problem seems to be
that e100.ko fails to read the contents of one of the EEPROMs.


snip

Thanks for the report, I have some thoughts.
I suspect that one reason Becker's code works is that it uses IO based
access (slower, and different method) to the adapter rather than
memory mapped access.

The second thought is that the adapter is in D3, and something about
your kernel or the driver doesn't successfully wake it up to D0.  An
indication of this would be looking at lspci -vv before/after loading
the driver.  Also, after loading/unloading eepro100 does the e100
driver work?

A third idea is look for a master abort in lspci after e100 fails to load.

And a last idea is for us to instrument the reads/writes from/to the
device during init and see if everything is returning 0xffff, as
that indicates the I/O and/or memory BAR is not enabled, or the
address returned from ioremap is invalid.

Jesse