On Mon, 25 Oct 2010, Andy Cress wrote:
> Configuration:
> Motherboard S5000PAL
> NIC: 80003ES2LAN onboard NIC (dual)
>
> # ethtool eth0
> Settings for eth0:
> Supported ports: [ TP ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Supports auto-negotiation: Yes
> Advertised link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Advertised auto-negotiation: Yes
> Speed: 100Mb/s
> Duplex: Full
> Port: Twisted Pair
> PHYAD: 0
> Transceiver: internal
> Auto-negotiation: on
> Supports Wake-on: umbg
> Wake-on: g
> Current message level: 0x00000007 (7)
> Link detected: yes
>
> # ethtool -i eth0
> driver: e1000e
> version: 1.2.17-NAPI
> firmware-version: 1.0-0
> bus-info: 0000:07:00.1
>
> BTW, It also had the same symptom with the previously loaded driver:
> driver: e1000
> version: 7.3.20-jumbo-NAPI
> firmware-version: 1.0-0
> bus-info: 0000:07:00.1
>
> Symptom:
> A daemon that uses ioctl PHY to check whether the link is up or not
> reports an error intermittently across several similar servers. It
> recovers quickly and reports link up again. There are no corresponding
> syslog messages from the driver in /var/log/messages.
>
> I'm not sure if the problem is in the driver, the firmware, the network,
> or in the daemon. What do you think it could be?
My guess is that the driver is having difficulty reading the PHY register;
see below...
> This is the message that the daemon returns:
> netmon: EthDriverQuery.cc(173): LINK_BAD:ioctl on eth0 returned 4416
>
> This is the code fragment from the daemon that is monitoring the link
> state for these NICs.
> LinkStateType EthDriverQuery::checkLinkState() {
>     if (sock == -1)
>     {
>         design_log(SUBSYS_NAME, __FILE__, __LINE__, "Bad Socket returned");
>         return ERROR;
>     }
>     if (ioctl(sock, SIOCGMIIREG, &ifr) < 0)
>     {
>         design_log(SUBSYS_NAME, __FILE__, __LINE__,
>                    "link state query on %s failed: %s\n",
>                    ifr.ifr_name, strerror(errno));
>         design_log(SUBSYS_NAME, __FILE__, __LINE__,
>                    "LINK_BAD:ioctl on %s returned %u \n", ifr.ifr_name, data[3]);
>         return ERROR;
>     }
>
>     // the link state register is returned in data[3]
>     if ((data[3] & 0x0016) == 0x4)
>         return LINK_GOOD;
>     else
>     {
>         design_log(SUBSYS_NAME, __FILE__, __LINE__,
>                    "LINK_BAD:ioctl on %s returned %u \n", ifr.ifr_name, data[3]);
>         return LINK_BAD;
>     }
>
> So the code is looking for the MII REG, in particular, the link state
> register in the data[3]. It expects the linkup bit to be set, but the
> jabber and remote failure to not be set, and when the error occurs all
> three are set. The ioctl call itself succeeds, just
> the data is strange.
Hi Andy, interesting problem. Is there some reason to query the PHY
directly instead of using a netlink socket to monitor the link state?
I suspect that when it fails, we either didn't get the semaphore or
something in the driver prevented the PHY read from completing (for
example, a race with another driver thread, maybe the link check).

My guess is that the error propagation from the driver's failed PHY read
is not correct. Could you instrument the driver's PHY reads and the return
codes from the calls it makes?
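
While instrumenting, it may also help to log the raw value with its bits
decoded. Below is a minimal sketch that assumes data[3] holds the standard
MII status register (register 1); the macro and function names are made up
for illustration, and the bit values follow the standard MII layout that the
0x0016 mask in your fragment already relies on. Note that the daemon prints
the value with %u, so 4416 is decimal (0x1140), and under the 0x0016 mask
that gives 0 rather than all three bits set. All three bits would be set if
the value were 0x4416, so it is worth confirming which reading is the right
one.

```c
#include <stdio.h>

/* Standard MII status register bits checked by the daemon's 0x0016 mask. */
#define BMSR_JABBER        0x0002u  /* jabber detected */
#define BMSR_LINK_UP       0x0004u  /* link status */
#define BMSR_REMOTE_FAULT  0x0010u  /* remote fault detected */
#define BMSR_CHECK_MASK    (BMSR_JABBER | BMSR_LINK_UP | BMSR_REMOTE_FAULT)

/* Same test as the daemon: link up, with no jabber and no remote fault. */
static int bmsr_link_good(unsigned int val)
{
    return (val & BMSR_CHECK_MASK) == BMSR_LINK_UP;
}

/* Format the raw value in hex together with the three decoded bits. */
static void bmsr_describe(unsigned int val, char *out, size_t n)
{
    snprintf(out, n, "0x%04x: link=%d jabber=%d remote_fault=%d",
             val,
             (val & BMSR_LINK_UP) != 0,
             (val & BMSR_JABBER) != 0,
             (val & BMSR_REMOTE_FAULT) != 0);
}
```

Logging the output of bmsr_describe() next to the driver's return codes
should show whether the bad samples are real status bits or leftover data
from a PHY read that did not complete.
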
Jesse
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired