Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-24 Thread Holger Eitzenberger
Hi Jesse,

> > Other than that I'm fine evaluating that patch in our testlab.
>
> any news on the evaluation?

after checking the driver I think it's best to continuously run an
offline selftest, as the driver seems to use the SWSM/SWSM2 registers
somewhere below there.  The first interface of the dual-port adapter
is UP and continuously sends traffic, the other interface is
DOWN and being tested.

All is fine so far.  If you are fine with that test setup I'll
keep it running until Monday.

 /holger


___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel


Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-24 Thread Brandeburg, Jesse
On Fri, 24 Apr 2009, Holger Eitzenberger wrote:
> > > Other than that I'm fine evaluating that patch in our testlab.
> >
> > any news on the evaluation?
>
> after checking the driver I think it's best to continuously run an
> offline selftest, as the driver seems to use the SWSM/SWSM2 registers
> somewhere below there.  The first interface of the dual-port adapter
> is UP and continuously sends traffic, the other interface is
> DOWN and being tested.
>
> All is fine so far.  If you are fine with that test setup I'll
> keep it running until Monday.

How does this test relate to the original report of the NVM corruption?  
Was that the kind of test you were running on the interfaces that had 
reported corruption?

Otherwise just by itself the test sounds fine.

Jesse



Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-23 Thread Holger Eitzenberger

> The bit doesn't quite work the same as the original SWSM lock bit.  And we
> are only trying to solve the problem of "has a driver loaded on either
> port yet?" and since probe is not parallelizable, we are guaranteed not
> to have a race here, or be preempted (to the point another probe could
> run)

Thanks, that explains a lot!

> > Other than that I'm fine evaluating that patch in our testlab.
>
> any news on the evaluation?

I think I can do that tomorrow.  The patch at least did apply fine
against my 2.6.29 test kernel.  I'll do some testing with ethtool
then.

 /holger




Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-21 Thread Holger Eitzenberger

> > > no, doesn't apply to this hardware, as it is not ICH (integrated LOM) it
> > > is a standalone 82571 with an actual discrete eeprom chip per port.

> holger, you're welcome to try this patch, it was made against net-2.6 from
> a couple of weeks ago

Thanks Jesse,

I have a few questions after looking at your patch:

* the older all-in-one e1000 driver does not use SWSM2 on the dual-port
  adapters.  Does that mean it's affected as well?  I ask because I
  generally need to justify the driver update.

* I was unable to locate SWSM2 in the documentation of the 82571EB.
  However, the usage seems to be similar to SWSM.  Referring to this
  snippet here:

    swsm2 = er32(SWSM2);

    if (!(swsm2 & E1000_SWSM2_LOCK)) {
        /* Only do this for the first interface on this card */
        ew32(SWSM2, swsm2 | E1000_SWSM2_LOCK);

  I see a general race condition, because the patch doesn't check SWSM2
  after writing it and there is nothing I see which prevents
  a preemption after reading SWSM2 the first time.  Therefore from the
  documentation something like

    ew32(SWSM2, swsm2 | E1000_SWSM2_LOCK);
    swsm2 = er32(SWSM2);
    if (swsm2 & E1000_SWSM2_LOCK) {
        /* now you are sure you have the lock */
    }

  should be more correct.

Please note, however, that I do not have documentation about SWSM2
in particular.  If my above assumption about its workings is not
correct, please just ignore the last issue.
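
The write-then-read-back idea above can be sketched in isolation.  This is
only a toy model, not e1000e code: `struct toy_reg`, `TOY_LOCK_BIT` and the
`toy_*` helpers are hypothetical names, and the assumption that a refused
write leaves the lock bit unchanged is exactly the SWSM-like behaviour that
is not confirmed for SWSM2 in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_LOCK_BIT 0x1u /* hypothetical bit, not E1000_SWSM2_LOCK */

/* Toy stand-in for a device register with a lock bit. */
struct toy_reg {
    uint32_t value;
    bool lock_bit_refused; /* true: writes to the lock bit do not stick */
};

uint32_t toy_read(const struct toy_reg *r)
{
    return r->value;
}

void toy_write(struct toy_reg *r, uint32_t v)
{
    /* Assumed behaviour: a refused write leaves the lock bit as it was. */
    if (r->lock_bit_refused)
        v = (v & ~TOY_LOCK_BIT) | (r->value & TOY_LOCK_BIT);
    r->value = v;
}

/* Set the lock bit, then read back to verify that the write took effect. */
bool toy_try_lock(struct toy_reg *r)
{
    toy_write(r, toy_read(r) | TOY_LOCK_BIT);
    return (toy_read(r) & TOY_LOCK_BIT) != 0;
}
```

The caller treats the lock as held only when `toy_try_lock()` returns true,
instead of trusting the value read before the write.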

Other than that I'm fine evaluating that patch in our testlab.

Many thanks! :)

  /holger




Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-20 Thread Holger Eitzenberger

> >  e1000e: write protect ICHx NVM to prevent malicious write/erase
>
> no, doesn't apply to this hardware, as it is not ICH (integrated LOM) it
> is a standalone 82571 with an actual discrete eeprom chip per port.
>
> > to include our modules as well in order to find out who is overwriting
> > memory?
>
> it is highly unlikely something is succeeding in writing to the eeprom,
> however, we do know of some locking issues in the driver that we've been
> resolving specifically for 82571 and that might somehow be related.

Do you refer to these two here, or something different?

  e1000e: do not ever sleep in interrupt context
  e1000e: reset swflag after resetting hardware

Regards.

 /holger




Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-20 Thread Brandeburg, Jesse


On Mon, 20 Apr 2009, Holger Eitzenberger wrote:

 
> > >  e1000e: write protect ICHx NVM to prevent malicious write/erase
> >
> > no, doesn't apply to this hardware, as it is not ICH (integrated LOM) it
> > is a standalone 82571 with an actual discrete eeprom chip per port.
> >
> > > to include our modules as well in order to find out who is overwriting
> > > memory?
> >
> > it is highly unlikely something is succeeding in writing to the eeprom,
> > however, we do know of some locking issues in the driver that we've been
> > resolving specifically for 82571 and that might somehow be related.
>
> Do you refer to these two here, or something different?
>
>   e1000e: do not ever sleep in interrupt context
>   e1000e: reset swflag after resetting hardware
 

there is a different patch, currently under internal test.  We hope to
release it soon as a new e1000e driver patch to the kernel once it has
completed testing.



[E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-17 Thread Holger Eitzenberger
Kernel v2.6.16.y (+ patches)
e1000e v0.5.11.2


Hi,

I'm facing NVM corruption on both ports of a dual-port NIC module:

  PCI: Enabling device :09:00.0 ( -> 0003)
  ACPI: PCI Interrupt :09:00.0[A] -> GSI 19 (level, low) -> IRQ 193
  PCI: Setting latency timer of device :09:00.0 to 64
  :09:00.0: :09:00.0: The NVM Checksum Is Not Valid
  ACPI: PCI interrupt for device :09:00.0 disabled
  e1000e: probe of :09:00.0 failed with error -5
  PCI: Enabling device :09:00.1 ( -> 0003)
  ACPI: PCI Interrupt :09:00.1[B] -> GSI 16 (level, low) -> IRQ 169
  PCI: Setting latency timer of device :09:00.1 to 64
  :09:00.1: :09:00.1: The NVM Checksum Is Not Valid

Output of lspci is available here [1], here [2] and here [3].

There are three other identical modules in that box which do not face
the issue.  Reportedly the interfaces more or less worked before
the upgrade (the previous version was the all-in-one e1000 driver
v7.6.15.5).  However, both of these interfaces reportedly failed
several times before the upgrade of the driver.

With regard to http://lkml.org/lkml/2008/9/25/510 and the patches mentioned
therein I backported specifically

 e1000e: allow bad checksum

As expected, the interface does not work after that, but the output
is different:

  PCI: Enabling device :09:00.1 ( -> 0003)
  ACPI: PCI Interrupt :09:00.1[B] -> GSI 16 (level, low) -> IRQ 169
  PCI: Setting latency timer of device :09:00.1 to 64
  :09:00.1: :09:00.1: The NVM Checksum Is Not Valid
  :09:00.1: :09:00.1: Invalid MAC Address: 00:00:00:00:00:00
  :09:00.1: eth7: (PCI Express:2.5GB/s:Width x4) f79b4118M
  :09:00.1: eth7: Intel(R) PRO/1000 Network Connection
  :09:00.1: eth7: MAC: 1, PHY: 1, PBA No: ff-0ff

As I'm only a bit familiar with the HW documentation available for
82571EB modules I need your help:

1. can I safely modify the commit 4a7703582836f55 (Linus tree)

 e1000e: write protect ICHx NVM to prevent malicious write/erase

to include our modules as well in order to find out who is overwriting
memory?

2. I copied an apparently correct eeprom from another box (ethtool -e)
and tried to apply it (ethtool -E) on the broken box:

 # ethtool -E eth6 < e1000e-eeprom-eth6
 Cannot set EEPROM data: Invalid argument

(I specifically made sure that the above mentioned patch to
write-protect the NVRAM was disabled).  Maybe I'm just stupid, but
what is wrong here?

Any help welcome.

Regards.

 /holger

[1] http://people.astaro.com/heitzenberger/e1000e/lspci_tv
[2] http://people.astaro.com/heitzenberger/e1000e/lspci_vvx
[3] http://people.astaro.com/heitzenberger/e1000e/lspci_vvxn




Re: [E1000-devel] e1000e: NVM corrupted (kernel 2.6.16.y)

2009-04-17 Thread Brandeburg, Jesse
On Fri, 17 Apr 2009, Holger Eitzenberger wrote:

> Kernel v2.6.16.y (+ patches)
> e1000e v0.5.11.2
>
>
> Hi,

Hi again Holger,

> I'm facing NVM corruption on both ports of a dual-port NIC module:
>
>   PCI: Enabling device :09:00.0 ( -> 0003)
>   ACPI: PCI Interrupt :09:00.0[A] -> GSI 19 (level, low) -> IRQ 193
>   PCI: Setting latency timer of device :09:00.0 to 64
>   :09:00.0: :09:00.0: The NVM Checksum Is Not Valid
>   ACPI: PCI interrupt for device :09:00.0 disabled
>   e1000e: probe of :09:00.0 failed with error -5
>   PCI: Enabling device :09:00.1 ( -> 0003)
>   ACPI: PCI Interrupt :09:00.1[B] -> GSI 16 (level, low) -> IRQ 169
>   PCI: Setting latency timer of device :09:00.1 to 64
>   :09:00.1: :09:00.1: The NVM Checksum Is Not Valid
>
> Output of lspci is available here [1], here [2] and here [3].

The only link that works is the first, but I do see that you have 82571 
parts.
 
> There are three other identical modules in that box which do not face
> the issue.  Reportedly the interfaces more or less worked before
> the upgrade (the previous version was the all-in-one e1000 driver
> v7.6.15.5).  However, both of these interfaces reportedly failed
> several times before the upgrade of the driver.
>
> With regard to http://lkml.org/lkml/2008/9/25/510 and the patches mentioned
> therein I backported specifically
>
>  e1000e: allow bad checksum
>
> As expected, the interface does not work after that, but the output
> is different:
>
>   PCI: Enabling device :09:00.1 ( -> 0003)
>   ACPI: PCI Interrupt :09:00.1[B] -> GSI 16 (level, low) -> IRQ 169
>   PCI: Setting latency timer of device :09:00.1 to 64
>   :09:00.1: :09:00.1: The NVM Checksum Is Not Valid
>   :09:00.1: :09:00.1: Invalid MAC Address: 00:00:00:00:00:00
>   :09:00.1: eth7: (PCI Express:2.5GB/s:Width x4) f79b4118M
>   :09:00.1: eth7: Intel(R) PRO/1000 Network Connection
>   :09:00.1: eth7: MAC: 1, PHY: 1, PBA No: ff-0ff

great! okay, please send the output of ethtool -e for each bad interface 
(if you attach to the list as a .txt file it will be let through)
 
> As I'm only a bit familiar with the HW documentation available for
> 82571EB modules I need your help:
>
> 1. can I safely modify the commit 4a7703582836f55 (Linus tree)
>
>  e1000e: write protect ICHx NVM to prevent malicious write/erase

no, doesn't apply to this hardware, as it is not ICH (integrated LOM) it 
is a standalone 82571 with an actual discrete eeprom chip per port.

> to include our modules as well in order to find out who is overwriting
> memory?

it is highly unlikely something is succeeding in writing to the eeprom, 
however, we do know of some locking issues in the driver that we've been 
resolving specifically for 82571 and that might somehow be related.
 
> 2. I copied an apparently correct eeprom from another box (ethtool -e)
> and tried to apply it (ethtool -E) on the broken box:
>
>  # ethtool -E eth6 < e1000e-eeprom-eth6
>  Cannot set EEPROM data: Invalid argument

unfortunately the eeprom cannot be written in one big hunk this way; the
command only writes one byte at a time.  The correct command (for each
byte) looks something like this:

assuming your device id in lspci -n is 8086:1060
ethtool -E eth2 magic 0x10608086 offset 0x10 value 0xfe

so you need a script that reads the data file and writes it one byte at
a time with the above command.
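
A hedged sketch of the per-byte command generation described above.  The
helper names are hypothetical.  The magic value is assumed to be the PCI
device ID in the upper 16 bits and the vendor ID in the lower 16 bits,
which is what the 8086:1060 -> 0x10608086 example implies.  The helper only
formats the command line; it does not run ethtool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: device ID in the high half, vendor ID in the low half. */
uint32_t eeprom_magic(uint16_t vendor, uint16_t device)
{
    return ((uint32_t)device << 16) | vendor;
}

/*
 * Format one "ethtool -E" command that writes a single byte.
 * Returns what snprintf() returns: the length the full command needs.
 */
int format_eeprom_write(char *buf, size_t len, const char *ifname,
                        uint16_t vendor, uint16_t device,
                        unsigned int offset, uint8_t value)
{
    return snprintf(buf, len,
                    "ethtool -E %s magic 0x%08x offset 0x%x value 0x%02x",
                    ifname, (unsigned int)eeprom_magic(vendor, device),
                    offset, (unsigned int)value);
}
```

A restore script would call `format_eeprom_write()` once per byte of the
saved image, incrementing the offset each time, and run the resulting
commands.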
 
> (I specifically made sure that the above mentioned patch to
> write-protect the NVRAM was disabled).  Maybe I'm just stupid, but
> what is wrong here?

there is no write protect (AFAIK) like the one you refer to when we're
using an eeprom; it only exists for NVM (like flash memory)

if you have access to premier.intel.com you already have NDA with us and 
can probably get a hold of our manufacturing tool eeupdate that will 
reprogram the eeprom for you.  If not you should talk to your local field 
agent.

do you have anything during your boot process that is using ethtool 
commands frequently on either interface on the MAC that is having problems 
(one eeprom is shared for each pair of ports)?
