Re: Intel 82559 NIC corrupted EEPROM
Jesse Brandeburg wrote:
> John wrote:
>> Jesse Brandeburg wrote:
>>> can you try adding mdelay(100); in e100_eeprom_load before the for loop, and then change the multiple udelay(4) to mdelay(1) in e100_eeprom_read
>>
>> I applied the attached patch. Loading the driver now takes around one minute :-)
>
> ouch, but yep, that's what happens when you use super extra delay
>
>> I ran 'source load_unload' 25 times in a loop. The first 12 times were successful. The last 13 times failed. (cf. attached archive)
>>
>> I noticed something very strange. The number of words obviously in error (0xffff) returned by the EEPROM on 00:09.0 is not constant.
>
> That is very strange, I would think that maybe
> - you have something else on the bus with the e100 that may be hogging bus cycles
> - you have failing hardware (maybe a bad eeprom, or possibly a bad mac chip)
>
>> $ grep -c 0xffff insmod*
>> insmod_300.txt:0
>> insmod_301.txt:0
>> insmod_302.txt:0
>> insmod_303.txt:0
>> insmod_304.txt:0
>> insmod_305.txt:0
>> insmod_306.txt:0
>> insmod_307.txt:0
>> insmod_308.txt:0
>> insmod_309.txt:0
>> insmod_310.txt:0
>> insmod_311.txt:0
>> insmod_312.txt:1
>> insmod_313.txt:5
>> insmod_314.txt:24
>> insmod_315.txt:45
>> insmod_316.txt:243
>> insmod_317.txt:256
>> insmod_318.txt:256
>> insmod_319.txt:256
>> insmod_320.txt:256
>> insmod_321.txt:256
>> insmod_322.txt:256
>> insmod_323.txt:253
>> insmod_324.txt:240
>
> this is even stranger, does it cycle back down (sine wave) to zero again?
>
> The delays did seem to work, at least sometimes. This indicates that something needs that extra delay to successfully read the eeprom. I might try changing all the udelay(4) to udelay(40) (x10 increase) and see if that gives you a happy medium of most times driver loads without error
>
> John, this problem seems to be very specific to your hardware. I know that you have put in a lot of time debugging this, but I'm not sure what we can do from here. If this were a generic code problem more people would be reporting the issue. What would you like to do?
>
> At this stage I would like e100 to work better than it is, but I'm not sure what to do next.

Hello everyone,

I'm resurrecting this thread because it appears we'll need to support these motherboards for several months to come, yet Adrian Bunk has scheduled the removal of eepro100 in January 2007.

To recap, we have to support ~30 EBC-2000T motherboards.
http://www.adlinktech.com/PD/web/PD_detail.php?pid=213

These motherboards come with three on-board Intel 82559 NICs. Last time I checked, i.e. two months ago, e100 did not correctly initialize all three NICs on these motherboards. Therefore, we've been using eepro100.

I will be testing the latest 2.6.20 kernel to see if the situation has changed, but I wanted to let you all know that there are still some eepro100 users out there, out of necessity.

Regards,
John

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Intel 82559 NIC corrupted EEPROM
On 12/1/06, John [EMAIL PROTECTED] wrote:
>> can you try adding mdelay(100); in e100_eeprom_load before the for loop, and then change the multiple udelay(4) to mdelay(1) in e100_eeprom_read
>
> I applied the attached patch. Loading the driver now takes around one minute :-)

ouch, but yep, that's what happens when you use super extra delay

> I ran 'source load_unload' 25 times in a loop. The first 12 times were successful. The last 13 times failed. (cf. attached archive)
>
> I noticed something very strange. The number of words obviously in error (0xffff) returned by the EEPROM on 00:09.0 is not constant.

That is very strange, I would think that maybe
- you have something else on the bus with the e100 that may be hogging bus cycles
- you have failing hardware (maybe a bad eeprom, or possibly a bad mac chip)

> $ grep -c 0xffff insmod*
> insmod_300.txt:0
> insmod_301.txt:0
> insmod_302.txt:0
> insmod_303.txt:0
> insmod_304.txt:0
> insmod_305.txt:0
> insmod_306.txt:0
> insmod_307.txt:0
> insmod_308.txt:0
> insmod_309.txt:0
> insmod_310.txt:0
> insmod_311.txt:0
> insmod_312.txt:1
> insmod_313.txt:5
> insmod_314.txt:24
> insmod_315.txt:45
> insmod_316.txt:243
> insmod_317.txt:256
> insmod_318.txt:256
> insmod_319.txt:256
> insmod_320.txt:256
> insmod_321.txt:256
> insmod_322.txt:256
> insmod_323.txt:253
> insmod_324.txt:240

this is even stranger, does it cycle back down (sine wave) to zero again?

The delays did seem to work, at least sometimes. This indicates that something needs that extra delay to successfully read the eeprom. I might try changing all the udelay(4) to udelay(40) (x10 increase) and see if that gives you a happy medium of most times driver loads without error

John, this problem seems to be very specific to your hardware. I know that you have put in a lot of time debugging this, but I'm not sure what we can do from here. If this were a generic code problem more people would be reporting the issue. What would you like to do?

At this stage I would like e100 to work better than it is, but I'm not sure what to do next.
Thanks for your patience on this issue, Jesse
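For readers following along, here is a minimal Python sketch (not part of the original thread) that tallies the per-run counts from the grep output quoted above. The function names are made up for illustration. It only restates what John reported: 12 clean runs followed by 13 failing ones, with the count dipping slightly at the end.

```python
# Per-run counts of bad EEPROM words, copied from the grep -c output in the
# message above (insmod_300.txt through insmod_324.txt).
COUNTS = [0] * 12 + [1, 5, 24, 45, 243, 256, 256, 256, 256, 256, 256, 253, 240]


def summarize(counts, first_run=300):
    """Return (clean runs, failed runs, number of the first failed run)."""
    failed = [first_run + i for i, c in enumerate(counts) if c > 0]
    first_failed = failed[0] if failed else None
    return len(counts) - len(failed), len(failed), first_failed


def peaked(counts):
    """True if the last count is below the maximum, i.e. errors began to recede."""
    return counts[-1] < max(counts)


print(summarize(COUNTS), peaked(COUNTS))
```

Whether the count really cycles back to zero, as Jesse asks, cannot be told from 25 runs; a longer series would be needed.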
Re: Intel 82559 NIC corrupted EEPROM
Jesse Brandeburg wrote:
> John wrote:
>> Here is some context for those who have been added to the CC list:
>> http://groups.google.com/group/linux.kernel/browse_frm/thread/bdc8fd08fb601c26
>>
>> As far as I understand, some consider the eepro100 driver to be obsolete, and it has been considered for removal. What is the current status?
>>
>> Unfortunately, e100 does not work out-of-the-box on this system. Is there something I can do to improve the situation?
>
> Let's go ahead and print the output from e100_load_eeprom debug patch attached.

Loading (then unloading) e100.ko fails the first few times (i.e. the driver claims one of the EEPROMs is corrupted). Thereafter, sometimes it fails, other times it works. Sounds like a race, no?

$ cat load_unload
: > /var/log/kern.log
insmod e100.ko debug=16
sleep 1
cp /var/log/kern.log insmod_$I.txt
ip link > ip_link_$I.txt
sleep 2
rmmod e100
let I=I+1

(cf. attached compressed archive)

FAILURE: insmod_100.txt insmod_101.txt insmod_102.txt insmod_105.txt insmod_107.txt insmod_108.txt insmod_110.txt insmod_111.txt insmod_114.txt

SUCCESS: insmod_103.txt insmod_104.txt insmod_106.txt insmod_109.txt insmod_112.txt insmod_113.txt insmod_115.txt insmod_116.txt

On an unrelated note, insmod_100.txt is truncated at the beginning, and insmod_110.txt is truncated in the middle (!!) cf. line 14. What would cause klogd to behave like that?

Regards.

TEST-e100.tar.bz2 Description: Binary data
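As an aside, a small Python sketch (not from the thread) of how the saved logs could be sorted into the FAILURE and SUCCESS lists above. It assumes a run counts as failed when the copied kern.log contains the driver's "EEPROM corrupted" message. The helper names are hypothetical.

```python
# Run numbers copied from the FAILURE / SUCCESS lists in the message above.
FAILURES = {100, 101, 102, 105, 107, 108, 110, 111, 114}
SUCCESSES = {103, 104, 106, 109, 112, 113, 115, 116}


def probe_failed(kern_log_text):
    """Assumed criterion: the e100 probe logged a corrupted-EEPROM error."""
    return "EEPROM corrupted" in kern_log_text


def timeline(failures, successes):
    """Render the runs in order as a string of F (failed) and S (succeeded)."""
    runs = sorted(failures | successes)
    return "".join("F" if run in failures else "S" for run in runs)


print(timeline(FAILURES, SUCCESSES))
```

Laid out in order, the outcomes show no obvious period, which fits John's guess that this looks like a race rather than a fixed defect.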
Re: Intel 82559 NIC corrupted EEPROM
Jesse Brandeburg wrote: Can you send output of cat /proc/iomem -0009 : System RAM 000a-000b : Video RAM area 000f-000f : System ROM 0010-0ffe : System RAM 0010-00296a1a : Kernel code 00296a1b-0031bbe7 : Kernel data 0fff-0fff2fff : ACPI Non-volatile Storage 0fff3000-0fff : ACPI Tables 2000-200f : :00:08.0 2010-201f : :00:09.0 2020-202f : :00:0a.0 e000-e3ff : :00:00.0 e500-e50f : :00:08.0 e510-e51f : :00:09.0 e520-e52f : :00:0a.0 e530-e5300fff : :00:08.0 e5301000-e5301fff : :00:0a.0 e5302000-e5302fff : :00:09.0 - : reserved I've also attached: o config-2.6.18.1-adlink used to compile this kernel o dmesg output after the machine boots try something like the attached patch Loading e100-debug.ko reports: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation ***e100 debug: unable to set power state (error 0) ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12 PCI: setting IRQ 12 as level-triggered ACPI: PCI Interrupt :00:08.0[A] - Link [LNKA] - GSI 12 (level, low) - IRQ 12 ***e100 debug: read 0100/ from the same register e100: eth0: e100_probe: addr 0xe530, irq 12, MAC addr 00:30:64:04:E6:E4 ***e100 debug: unable to set power state (error 0) ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 PCI: setting IRQ 10 as level-triggered ACPI: PCI Interrupt :00:09.0[A] - Link [LNKB] - GSI 10 (level, low) - IRQ 10 ***e100 debug: read 0100/ from the same register e100: :00:09.0: e100_eeprom_load: EEPROM corrupted ACPI: PCI interrupt for device :00:09.0 disabled e100: probe of :00:09.0 failed with error -11 ***e100 debug: unable to set power state (error 0) ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 PCI: setting IRQ 11 as level-triggered ACPI: PCI Interrupt :00:0a.0[A] - Link [LNKC] - GSI 11 (level, low) - IRQ 11 ***e100 debug: read 0100/ from the same register e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6 In other words, the behavior is the same for all three NICs. 
pci_set_power_state(pdev, PCI_D0) returns 0 pci_iomap returns something != NULL Can I provide more information to help locate the problem? # # Automatically generated make config: don't edit # Linux kernel version: 2.6.18.1-hrt # Tue Nov 7 17:52:26 2006 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_MMU=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_RELAY is not set CONFIG_INITRAMFS_SOURCE= # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set # CONFIG_KMOD is not set # # Block layer # # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set # CONFIG_IOSCHED_DEADLINE is not set CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set 
CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # # CONFIG_HIGH_RES_TIMERS is not set # CONFIG_SMP is not set CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set CONFIG_MPENTIUMIII=y # CONFIG_MPENTIUMM is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not
Re: Intel 82559 NIC corrupted EEPROM
Auke Kok wrote:
> This is what I was afraid of: even though the code allows you to bypass the EEPROM checksum, the probe fails on a further check to see if the MAC address is valid. Since something with this NIC specifically made the EEPROM return all 0xff's, the MAC address is automatically invalid, and thus probe fails.

I don't understand why you think there is something wrong with a specific NIC?

In 2.6.14.7, e100.ko fails to read the EEPROM on :00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on :00:09.0 (eth1)
In both kernels, eepro100.ko successfully reads all the EEPROMs.

> It seems that the driver has more problems with this NIC than just the eeprom checksum being bad. Needless to say this might need fixing. Can you load the eepro driver and send me the full eeprom dump? Perhaps I can duplicate things over here.

00:08.0 EEPROM contents, size 64x16
3000 0464 e4e6 0e03 0201 4701 7213 8310 40a2 0001 8086 0128 92f7

00:09.0 EEPROM contents, size 64x16
3000 0464 e5e6 0e03 0201 4701 7213 8310 40a2 0001 8086 0128 91f7

00:0a.0 EEPROM contents, size 64x16
3000 0464 e6e6 0e03 0201 4701 7213 8310 40a2 0001 8086 0128 90f7
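To illustrate Auke's point about the MAC check, here is a minimal Python sketch (not from the thread). It assumes the MAC address occupies the first three 16-bit EEPROM words, low byte first, which reproduces the addresses printed by the driver elsewhere in this thread. The validity test mirrors the usual rule (unicast and non-zero), in the spirit of the kernel's is_valid_ether_addr(). The Python function names are made up.

```python
def mac_octets(words):
    """Split the first three 16-bit EEPROM words into six octets, low byte first."""
    octets = []
    for word in words[:3]:
        octets.append(word & 0xFF)
        octets.append((word >> 8) & 0xFF)
    return octets


def format_mac(octets):
    return ":".join("%02X" % o for o in octets)


def is_valid_ether_addr(octets):
    """Valid means: not multicast (low bit of first octet clear) and not all zero."""
    is_multicast = bool(octets[0] & 0x01)
    is_zero = all(o == 0 for o in octets)
    return not is_multicast and not is_zero


# First three words of the 00:08.0 dump above.
print(format_mac(mac_octets([0x3000, 0x0464, 0xE4E6])))
# An EEPROM that reads back as all ones yields the broadcast address.
print(is_valid_ether_addr(mac_octets([0xFFFF, 0xFFFF, 0xFFFF])))
```

An all-ones read therefore fails the MAC check no matter what the checksum override does, which is the cascade Auke describes.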
Re: Intel 82559 NIC corrupted EEPROM
Jesse Brandeburg wrote:
> I suspect that one reason Becker's code works is that it uses IO based access (slower, and different method) to the adapter rather than memory mapped access.

I've noticed this difference.

> The second thought is that the adapter is in D3, and something about your kernel or the driver doesn't successfully wake it up to D0.

On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2. Thus DDPD (bit 6) is set to 0.

DDPD is the Disable Deep Power Down while PME is disabled bit.
0 - Deep Power Down is enabled in D3 state while PME-disabled.
1 - Deep Power Down disabled in D3 state while PME-disabled.
This bit should be set to 1b if a TCO controller is being used via the SMB because it requires receive functionality at all power states.

Are you suggesting I try and set DDPD to 1? Or is this completely unrelated?

> An indication of this would be looking at lspci -vv before/after loading the driver.

$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt 2006-11-09 14:51:30.0 +0100
+++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.0 +0100
@@ -74,21 +74,20 @@
 Expansion ROM at 2000 [disabled] [size=1M]
 Capabilities: [dc] Power Management version 2
 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
- Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+ Status: D0 PME-Enable- DSel=0 DScale=2 PME-

 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
 Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
- Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+ Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
 Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
- Latency: 32 (2000ns min, 14000ns max), cache line size 08
 Interrupt: pin A routed to IRQ 10
- Region 0: Memory at e5302000 (32-bit, non-prefetchable) [size=4K]
- Region 1: I/O ports at dc00 [size=64]
- Region 2: Memory at e510 (32-bit, non-prefetchable) [size=1M]
+ Region 0: Memory at e5302000 (32-bit, non-prefetchable) [disabled] [size=4K]
+ Region 1: I/O ports at dc00 [disabled] [size=64]
+ Region 2: Memory at e510 (32-bit, non-prefetchable) [disabled] [size=1M]
 Expansion ROM at 2010 [disabled] [size=1M]
 Capabilities: [dc] Power Management version 2
 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
- Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+ Status: D0 PME-Enable- DSel=0 DScale=2 PME-

 00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
 Subsystem: Intel Corporation EtherExpress PRO/100B (TX)

> Also, after loading/unloading eepro100 does the e100 driver work?

No.

> A third idea is look for a master abort in lspci after e100 fails to load.

I don't understand that one.
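The DDPD reasoning above can be checked with a few lines of Python. This is only a sketch, not from the thread: the bit position (bit 6 of the EEPROM ID word) is taken from John's message, and the helper names are hypothetical.

```python
DDPD_BIT = 6  # bit position quoted in the message above


def ddpd_is_set(eeprom_id_word):
    """True if Disable Deep Power Down is set in the EEPROM ID word (word 0Ah)."""
    return bool((eeprom_id_word >> DDPD_BIT) & 1)


def with_ddpd_set(eeprom_id_word):
    """Return the ID word with the DDPD bit forced to 1."""
    return eeprom_id_word | (1 << DDPD_BIT)


print(ddpd_is_set(0x40A2), hex(with_ddpd_set(0x40A2)))
```

Note that writing a modified word back would also change the EEPROM checksum, so the checksum word would have to be updated as well.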
Re: Intel 82559 NIC corrupted EEPROM
John wrote:
> Auke Kok wrote:
>> This is what I was afraid of: even though the code allows you to bypass the EEPROM checksum, the probe fails on a further check to see if the MAC address is valid. Since something with this NIC specifically made the EEPROM return all 0xff's, the MAC address is automatically invalid, and thus probe fails.
>
> I don't understand why you think there is something wrong with a specific NIC?

that was completely not my point - I was merely trying to point out that the original problem causes a cascade of error events later on, and bypassing the eeprom check in this case didn't help you at all. Something is wrong in the driver, but I don't understand yet why it only affects one of the 3 nics in your system.

> In 2.6.14.7, e100.ko fails to read the EEPROM on :00:08.0 (eth0)
> In 2.6.18.1, e100.ko fails to read the EEPROM on :00:09.0 (eth1)

almost sounds like a bug got fixed and it introduced a regression. this wouldn't be the right time to pull out git-bisect would it? even loading 2.6.15, 2.6.16, 2.6.17 on it would give us some good information.

Cheers,
Auke
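For context on the checksum being bypassed, here is a minimal Python sketch (not from the thread). As far as I understand e100.c, the last EEPROM word is chosen so that all words sum to 0xBABA modulo 2^16; treat that constant as my reading of the driver rather than something stated in this thread. The sample image is invented for illustration and is not a real dump.

```python
CHECKSUM_TARGET = 0xBABA  # assumption: the sum of all EEPROM words, per e100.c


def expected_checksum(words):
    """Checksum word implied by every word except the last one."""
    return (CHECKSUM_TARGET - sum(words[:-1])) & 0xFFFF


def checksum_ok(words):
    return words[-1] == expected_checksum(words)


def sealed(words):
    """Return a copy of the image with the last word set to the right checksum."""
    return list(words[:-1]) + [expected_checksum(words)]


# Invented 64-word image: three MAC words, zero padding, checksum slot.
image = sealed([0x3000, 0x0464, 0xE4E6] + [0] * 61)
print(checksum_ok(image), checksum_ok([0xFFFF] * 64))
```

An EEPROM that reads back as all ones fails this test, so the "EEPROM corrupted" message is consistent with a failed read rather than bad stored data.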
Re: Intel 82559 NIC corrupted EEPROM
On 11/9/06, John [EMAIL PROTECTED] wrote:
>> The second thought is that the adapter is in D3, and something about your kernel or the driver doesn't successfully wake it up to D0.
>
> On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2. Thus DDPD (bit 6) is set to 0.
>
> DDPD is the Disable Deep Power Down while PME is disabled bit.
> 0 - Deep Power Down is enabled in D3 state while PME-disabled.
> 1 - Deep Power Down disabled in D3 state while PME-disabled.
> This bit should be set to 1b if a TCO controller is being used via the SMB because it requires receive functionality at all power states.
>
> Are you suggesting I try and set DDPD to 1? Or is this completely unrelated?

This may be related but I doubt it. Something is strange about how memory is being mapped in your system. whatever is creating the problem moved when you changed the kernel version. I'm wondering if there is a device collision at e5302000. I'm not convinced at this point it is e100's fault.

can you send output of cat /proc/iomem

>> An indication of this would be looking at lspci -vv before/after loading the driver.
>
> $ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
> --- lspci_vv_before_e100.txt 2006-11-09 14:51:30.0 +0100
> +++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.0 +0100
> @@ -74,21 +74,20 @@
>  Expansion ROM at 2000 [disabled] [size=1M]
>  Capabilities: [dc] Power Management version 2
>  Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> - Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
> + Status: D0 PME-Enable- DSel=0 DScale=2 PME-

okay when the driver loads it is clearing PME enable, but not re-enabling it when it unloads. That is pretty much expected.

>  00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
>  Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> + Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>  Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-

pci_enable_device should be enabling io,mem,busmaster, they are probably being disabled when the driver errors out of init.

maybe you should add a call to pci_set_power_state(dev, PCI_D0); before the call to e100_reset

>> Also, after loading/unloading eepro100 does the e100 driver work?
>
> No.

now that is really odd.

>> A third idea is look for a master abort in lspci after e100 fails to load.
>
> I don't understand that one.

There isn't one, MAbort+ would be showing in the above lspci output. The all 0x returns when you read registers is a sure sign the hardware either isn't at the address specified or is in a power down state. The only other option I can think of is that something else is intercepting memory reads and writes.

try something like the attached patch, compile tested only:

e100_debug.patch Description: Binary data
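The lspci flags Jesse refers to map onto bits of the PCI command and status registers. A short Python sketch (not from the thread) of that decoding follows. The bit values are the standard PCI ones (I/O space 0x1, memory space 0x2, bus master 0x4, received master abort 0x2000); the function names are made up.

```python
PCI_COMMAND_IO = 0x0001      # respond to I/O space accesses
PCI_COMMAND_MEMORY = 0x0002  # respond to memory space accesses
PCI_COMMAND_MASTER = 0x0004  # allow the device to act as bus master
PCI_STATUS_REC_MASTER_ABORT = 0x2000  # shown by lspci as MAbort+


def decode_command(command):
    """Render the three enable bits the way lspci -vv prints them."""
    flags = [
        ("I/O", PCI_COMMAND_IO),
        ("Mem", PCI_COMMAND_MEMORY),
        ("BusMaster", PCI_COMMAND_MASTER),
    ]
    return " ".join(name + ("+" if command & bit else "-") for name, bit in flags)


def master_abort_seen(status):
    return bool(status & PCI_STATUS_REC_MASTER_ABORT)


print(decode_command(0x0007))
print(decode_command(0x0000))
```

With all three bits cleared, as in the "after" half of the diff, neither BAR is decoded by the device, so register reads cannot reach it.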
Re: Intel 82559 NIC corrupted EEPROM
Hello all,

[ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]

I will try and summarize the problem as I understand it at this point. I've written two messages so far:
http://groups.google.com/group/linux.kernel/msg/3a05d819c66474db
http://groups.google.com/group/linux.kernel/msg/391aebbb3dfd6039

And here is a link to the complete thread:
http://lkml.org/lkml/fancy/2006/11/3/124

I have a motherboard with three on-board 82559 NICs.
o eepro100.ko properly initializes all three NICs
o e100.ko fails to initialize one of them

NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to initialize the NIC with MAC address 00:30:64:04:E6:E5.

The problem is not an incorrect checksum. (Donald Becker's dump utility reports a correct checksum for all three NICs.)

The problem seems to be that e100.ko fails to read the contents of one of the EEPROMs.

Auke wrote:
> How did you do the first `ethtool` eeprom dump? did you have the `e100` module loaded at that time? Did you use the new `override` mechanism graciously donated by David M?

These tests were performed on a 2.6.14 kernel. I hacked e100_eeprom_load() to return 0 even when the checksum fails. Thus the driver did not refuse to load, and I was able to use ethtool to dump the contents of the 3 EEPROMs.

Here are additional examples running a 2.6.18.1-hrt kernel.
'insmod e100.ko' reports: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12 PCI: setting IRQ 12 as level-triggered ACPI: PCI Interrupt :00:08.0[A] - Link [LNKA] - GSI 12 (level, low) - IRQ 12 e100: eth0: e100_probe: addr 0xe530, irq 12, MAC addr 00:30:64:04:E6:E4 ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 PCI: setting IRQ 10 as level-triggered ACPI: PCI Interrupt :00:09.0[A] - Link [LNKB] - GSI 10 (level, low) - IRQ 10 e100: :00:09.0: e100_eeprom_load: EEPROM corrupted ACPI: PCI interrupt for device :00:09.0 disabled e100: probe of :00:09.0 failed with error -11 ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 PCI: setting IRQ 11 as level-triggered ACPI: PCI Interrupt :00:0a.0[A] - Link [LNKC] - GSI 11 (level, low) - IRQ 11 e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6 'insmod e100.ko eeprom_bad_csum_allow=1' reports: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12 PCI: setting IRQ 12 as level-triggered ACPI: PCI Interrupt :00:08.0[A] - Link [LNKA] - GSI 12 (level, low) - IRQ 12 e100: eth0: e100_probe: addr 0xe530, irq 12, MAC addr 00:30:64:04:E6:E4 ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 PCI: setting IRQ 10 as level-triggered ACPI: PCI Interrupt :00:09.0[A] - Link [LNKB] - GSI 10 (level, low) - IRQ 10 e100: :00:09.0: e100_eeprom_load: EEPROM corrupted e100: :00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting. 
ACPI: PCI interrupt for device :00:09.0 disabled e100: probe of :00:09.0 failed with error -11 ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 PCI: setting IRQ 11 as level-triggered ACPI: PCI Interrupt :00:0a.0[A] - Link [LNKC] - GSI 11 (level, low) - IRQ 11 e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6 'insmod e100.ko debug=16' reports: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12 PCI: setting IRQ 12 as level-triggered ACPI: PCI Interrupt :00:08.0[A] - Link [LNKA] - GSI 12 (level, low) - IRQ 12 e100: :00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x, data_out=0x18203000 e100: :00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x, data_out=0x18217809 e100: :00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x, data_out=0x18217809 e100: :00:08.0: e100_phy_init: phy_addr = 1 e100: :00:08.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400, data_out=0x14000400 e100: :00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x, data_out=0x18203000 e100: :00:08.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000, data_out=0x14203000 e100: :00:08.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400, data_out=0x14400400 e100: :00:08.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400, data_out=0x14600400 e100: :00:08.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400, data_out=0x14800400 e100: :00:08.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400, data_out=0x14A00400 e100: :00:08.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400, data_out=0x14C00400 e100: :00:08.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400, data_out=0x14E00400 e100: :00:08.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400, data_out=0x15000400 e100: :00:08.0: mdio_ctrl:
Re: Intel 82559 NIC corrupted EEPROM
John wrote:
> I have a motherboard with three on-board 82559 NICs.
> o eepro100.ko properly initializes all three NICs
> o e100.ko fails to initialize one of them
>
> NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to initialize the NIC with MAC address 00:30:64:04:E6:E5.
>
> The problem is not an incorrect checksum. (Donald Becker's dump utility reports a correct checksum for all three NICs.)
>
> The problem seems to be that e100.ko fails to read the contents of one of the EEPROMs.

[snip]

> 'insmod e100.ko eeprom_bad_csum_allow=1' reports:
>
> e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
> e100: Copyright(c) 1999-2005 Intel Corporation
> ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
> PCI: setting IRQ 12 as level-triggered
> ACPI: PCI Interrupt :00:08.0[A] - Link [LNKA] - GSI 12 (level, low) - IRQ 12
> e100: eth0: e100_probe: addr 0xe530, irq 12, MAC addr 00:30:64:04:E6:E4
> ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
> PCI: setting IRQ 10 as level-triggered
> ACPI: PCI Interrupt :00:09.0[A] - Link [LNKB] - GSI 10 (level, low) - IRQ 10
> e100: :00:09.0: e100_eeprom_load: EEPROM corrupted
> e100: :00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting.
> ACPI: PCI interrupt for device :00:09.0 disabled
> e100: probe of :00:09.0 failed with error -11

This is what I was afraid of: even though the code allows you to bypass the EEPROM checksum, the probe fails on a further check to see if the MAC address is valid. Since something with this NIC specifically made the EEPROM return all 0xff's, the MAC address is automatically invalid, and thus probe fails.

It seems that the driver has more problems with this NIC than just the eeprom checksum being bad. Needless to say this might need fixing. Can you load the eepro driver and send me the full eeprom dump? Perhaps I can duplicate things over here.

[snip]

> On a related note, I am concerned by this message:
>
> Sleep mode is enabled. This is not recommended. Under high load the card may not respond to PCI requests, and thus cause a master abort. To clear sleep mode use the '-G 0 -w -w -f' options.
>
> Intel 82559 EEPROM Map and Programming Information (AP-394) states:
> http://www.intel.com/design/network/applnots/ap394.htm
>
> The Standby Enable bit enables the 82559 to enter standby mode. When this bit equals 1b, the 82559 is able to recognize an idle state and can enter standby mode (some internal clocks are stopped for power saving purposes). The 82559 does not require a PCI clock signal in standby mode. If this bit equals 0b, the idle recognition circuit is disabled and the 82559 always remains in an active state. Thus, the 82559 always requests PCI CLK using the Clockrun mechanism.
>
> Auke, do you agree with Donald Becker's warning?

If you are using the e100 in a performance situation, I would certainly switch it off :)

> If I disable STB, the NICs will waste a bit more power when idle, is that correct? Are there other implications?

hm, I don't know the power specs of e100 that well, so I can't say that it saves significant amounts of power, but I suspect it would.

Power management on nics is hairy business. As suggested, it can take time before the nic powers back up, performance can be impacted, and some commands might return an invalid or unknown value. OTOH our labs here test these things pretty well before they get sent out to customers and resales agents, so Becker's cautious wording catches the severity pretty well (recommended).

I would say that under most circumstances, it's safe to enable STB, but you might want to disable it for use in routing and other server applications, where most of the time the NIC is active anyway.

hth

Auke

> Thanks for reading this far!
>
> John
Re: Intel 82559 NIC corrupted EEPROM
On 11/8/06, John [EMAIL PROTECTED] wrote:
> Hello all,
>
> [ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]
>
> I will try and summarize the problem as I understand it at this point. I've written two messages so far:
> http://groups.google.com/group/linux.kernel/msg/3a05d819c66474db
> http://groups.google.com/group/linux.kernel/msg/391aebbb3dfd6039
>
> And here is a link to the complete thread:
> http://lkml.org/lkml/fancy/2006/11/3/124
>
> I have a motherboard with three on-board 82559 NICs.
> o eepro100.ko properly initializes all three NICs
> o e100.ko fails to initialize one of them
>
> NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to initialize the NIC with MAC address 00:30:64:04:E6:E5.
>
> The problem is not an incorrect checksum. (Donald Becker's dump utility reports a correct checksum for all three NICs.)
>
> The problem seems to be that e100.ko fails to read the contents of one of the EEPROMs.

snip

Thanks for the report, I have some thoughts.

I suspect that one reason Becker's code works is that it uses IO based access (slower, and different method) to the adapter rather than memory mapped access.

The second thought is that the adapter is in D3, and something about your kernel or the driver doesn't successfully wake it up to D0. An indication of this would be looking at lspci -vv before/after loading the driver.

Also, after loading/unloading eepro100 does the e100 driver work?

A third idea is look for a master abort in lspci after e100 fails to load.

And a last idea is for us to instrument the reads /writes from/to the device during init and see if everything is returning 0x, as that indicates the I/O and/or memory bar is not enabled, or the address returned from ioremap is invalid.

Jesse