Jesse,

Thanks for the feedback.  This is intermittent and the cases where it
occurs tend to be deployed systems, so instrumenting the driver may be
difficult.  

One other clue is that it only occurs on the onboard 80003ES2LAN NICs,
but there are other PCI GbE NICs in the systems where this never happens
(also using e1000e).  
Does the e1000 driver also have a link check thread?  I'm wondering why
both e1000e and e1000 drivers would show this.  Could it have something
to do with the fact that the BMC also uses this NIC?  

I guess that leads me to these questions/approaches:
1) Using a netlink socket to monitor link state instead.  Is this what
ethtool does?
2) If it is specific to the 80003ES2LAN, could it be a NIC firmware
issue (1.0)?  I didn't see a later version on intel.com.  

Andy

-----Original Message-----
From: Brandeburg, Jesse [mailto:[email protected]] 
Sent: Tuesday, October 26, 2010 1:56 PM
To: Andy Cress
Cc: [email protected]
Subject: Re: [E1000-devel] 80003ES2LAN PHY linkup bit not set intermittently with e1000e



On Mon, 25 Oct 2010, Andy Cress wrote:

> Configuration:
> Motherboard S5000PAL
> NIC:  80003ES2LAN onboard NIC (dual)
> 
> # ethtool eth0
> Settings for eth0:
>         Supported ports: [ TP ]
>         Supported link modes: 10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Supports auto-negotiation: Yes
>         Advertised link modes: 10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Advertised auto-negotiation: Yes
>         Speed: 100Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 0
>         Transceiver: internal
>         Auto-negotiation: on
>         Supports Wake-on: umbg
>         Wake-on: g
>         Current message level: 0x00000007 (7)
>         Link detected: yes
> 
> # ethtool -i eth0
> driver: e1000e
> version: 1.2.17-NAPI
> firmware-version: 1.0-0
> bus-info: 0000:07:00.1
> 
> BTW, it also had the same symptom with the previously loaded driver:
> driver: e1000
> version: 7.3.20-jumbo-NAPI
> firmware-version: 1.0-0
> bus-info: 0000:07:00.1
> 
> Symptom:  
> A daemon that uses a PHY register ioctl (SIOCGMIIREG) to check whether
> the link is up reports an error intermittently across several similar
> servers.  It recovers quickly and reports link up again.  There are no
> corresponding syslog messages from the driver in /var/log/messages.
> 
> I'm not sure if the problem is in the driver, the firmware, the
> network, or in the daemon.  What do you think it could be?

my guess is the driver is having difficulty reading the PHY register,
see below...

> This is the message that the daemon returns:
>    netmon: EthDriverQuery.cc(173): LINK_BAD:ioctl on eth0 returned 4416
> 
> This is the code fragment from the daemon that is monitoring the link
> state for these NICs.
> LinkStateType EthDriverQuery::checkLinkState() {
>    if (sock == -1)
>    {
>       design_log(SUBSYS_NAME, __FILE__, __LINE__,"Bad Socket returned");
>       return ERROR;
>    }
>    if (ioctl(sock, SIOCGMIIREG, &ifr) < 0)
>    {
>         design_log(SUBSYS_NAME, __FILE__, __LINE__,
>                      "link state query on %s failed: %s\n",
>                      ifr.ifr_name, strerror(errno));
>         design_log(SUBSYS_NAME, __FILE__, __LINE__,
>                      "LINK_BAD:ioctl on %s returned %u \n",
>                      ifr.ifr_name, data[3]);
>       return ERROR;
>    }
> 
>    //the link state register is returned in data[3]
>    if ((data[3] & 0x0016) == 0x4)
>       return LINK_GOOD;
>    else
>    {
>         design_log(SUBSYS_NAME, __FILE__, __LINE__,
>                      "LINK_BAD:ioctl on %s returned %u \n",
>                      ifr.ifr_name, data[3]);
>         return LINK_BAD;
>    }
> 
> So the code is looking for the MII REG, in particular, the link state
> register in data[3].  It expects the linkup bit to be set, but the
> jabber and remote fault bits to not be set, and when the error occurs
> all three are set.  The ioctl call itself succeeds, just the data is
> strange.
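
For reference, the 0x0016 mask in the fragment above combines three bits
of the MII basic status register.  The sketch below decodes them.  It
assumes the register being read is MII register 1 (BMSR), which the
fragment does not show; the constant and function names are made up for
illustration, and only the bit values are taken from the BMSR_*
definitions in <linux/mii.h>.

```cpp
#include <cstdint>
#include <cstdio>

// Bit values mirror BMSR_JCD, BMSR_LSTATUS and BMSR_RFAULT in <linux/mii.h>.
// The k* names are hypothetical, chosen for this sketch only.
constexpr std::uint16_t kJabber      = 0x0002;
constexpr std::uint16_t kLinkStatus  = 0x0004;
constexpr std::uint16_t kRemoteFault = 0x0010;
constexpr std::uint16_t kMask = kJabber | kLinkStatus | kRemoteFault;  // 0x0016

// Same test as the daemon: link bit set, jabber and remote fault clear.
constexpr bool linkGood(std::uint16_t reg) {
    return (reg & kMask) == kLinkStatus;
}

// Prints the three masked bits so a logged value can be read at a glance.
void describeStatus(std::uint16_t reg) {
    std::printf("0x%04x: link=%d jabber=%d remote_fault=%d -> %s\n",
                static_cast<unsigned>(reg),
                (reg & kLinkStatus) != 0,
                (reg & kJabber) != 0,
                (reg & kRemoteFault) != 0,
                linkGood(reg) ? "LINK_GOOD" : "LINK_BAD");
}
```

The one value logged above, 4416, is 0x1140, and 0x1140 & 0x0016 is 0,
so describeStatus(4416) prints
"0x1140: link=0 jabber=0 remote_fault=0 -> LINK_BAD".  In that sample
none of the three masked bits are set.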

Hi Andy, interesting problem.  Is there some reason to query the PHY
directly instead of using a netlink socket to monitor the link state?
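
A minimal sketch of the netlink approach, for illustration only: it
subscribes to the kernel's RTMGRP_LINK multicast group on a
NETLINK_ROUTE socket and reads the interface flags from each
RTM_NEWLINK message.  The function names are hypothetical, and treating
IFF_RUNNING as "link up" is an assumption about what the daemon wants
to know.

```cpp
#include <cstdio>
#include <cstring>
#include <net/if.h>            // IFF_RUNNING, IF_NAMESIZE, if_indextoname
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

// Assumption: IFF_RUNNING is the flag the monitor treats as "link up".
constexpr bool linkIsUp(unsigned int ifiFlags) {
    return (ifiFlags & IFF_RUNNING) != 0;
}

// Opens a NETLINK_ROUTE socket subscribed to link notifications.
// Returns the descriptor, or -1 on failure.
int openLinkMonitor() {
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) return -1;
    sockaddr_nl addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_LINK;   // link state and flag changes
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Blocks for one datagram and prints the state carried by each
// RTM_NEWLINK message in it.  Returns false if recv() fails.
bool reportLinkEvents(int fd) {
    alignas(nlmsghdr) char buf[8192];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n < 0) return false;
    int remaining = static_cast<int>(n);
    for (nlmsghdr* nh = reinterpret_cast<nlmsghdr*>(buf);
         NLMSG_OK(nh, remaining); nh = NLMSG_NEXT(nh, remaining)) {
        if (nh->nlmsg_type != RTM_NEWLINK) continue;
        const ifinfomsg* ifi = static_cast<const ifinfomsg*>(NLMSG_DATA(nh));
        char name[IF_NAMESIZE];
        if (if_indextoname(ifi->ifi_index, name) == nullptr)
            std::snprintf(name, sizeof(name), "if%d", ifi->ifi_index);
        std::printf("%s: link %s\n", name,
                    linkIsUp(ifi->ifi_flags) ? "up" : "down");
    }
    return true;
}
```

A caller would open the socket once with openLinkMonitor() and then
call reportLinkEvents() in a loop.  The kernel pushes a message when the
state changes, so no PHY register read is issued from user space.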

I suspect that the times when it fails we likely didn't get the
semaphore, or something in the driver prevented the PHY read from
completing (like a race of some kind with another driver thread, maybe
link check).

My guess is that the error propagation from the driver's failed PHY read
is not correct.  Could you instrument the driver's reads and return
codes from the calls it makes?

Jesse

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired