> On Jan 18, 2017, at 2:38 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
> 
> On 17.01.17 at 23:09, Dale Ghent wrote:
>>> On Jan 17, 2017, at 2:39 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>> 
>>> On 17.01.17 at 17:37, Dale Ghent wrote:
>>> 
>>>>> On Jan 17, 2017, at 11:31 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>>>> 
>>>>> Hi Dale,
>>>>> 
>>>>> On 17.01.17 at 17:22, Dale Ghent wrote:
>>>>> 
>>>>>>> On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>>>>>> 
>>>>>>> Hi guys,
>>>>>>> 
>>>>>>> I am sorry, but I have to dig up this old topic, since I now have 
>>>>>>> three hosts running OmniOS 018/020 which show these pesky issues with 
>>>>>>> flapping ixgbeN links on my Nexus FEXes…
>>>>>>> 
>>>>>>> Does anyone know whether any changes have been made to the ixgbe 
>>>>>>> driver since 06/2016?
>>>>>> 
>>>>>> Since June 2016? Yes! A large update to the ixgbe driver happened in 
>>>>>> August. This added X550 support, and also brought the Intel Shared Code 
>>>>>> it uses from its 2012 vintage up to current. The updated driver is 
>>>>>> available in 014 and later.
>>>>>> 
>>>>>> /dale
>>>>> 
>>>>> do you know of any way to find out why three of my boxes are flapping 
>>>>> their 10GbE ports? It actually happens not only in aggr mode, but with 
>>>>> single links as well. Last week I presumably had one of my RSF-1 nodes 
>>>>> panic, since it couldn't get to its iSCSI LUNs anymore. The thing is 
>>>>> that somewhere down the line, the ixgbe driver seems to settle on 
>>>>> configuring one port at 1GbE instead of 10GbE, which stops the 
>>>>> flapping, but which breaks the VPC on my Nexus nevertheless.
>>>>> 
>>>>> In syslog, it looks like this:
>>>>> 
>>>>> ...
>>>>> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
>>>>> link up, 1000 Mbps, full duplex
>>>>> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
>>>>> link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
>>>>> link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
>>>>> link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
>>>>> link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
>>>>> link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
>>>>> link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
>>>>> link down
>>>>> 
>>>>> Note the entry at 14:46:07, where the system settles on a 1GbE connection…
>>>>> 
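>>>>> In case it helps, this is simply how I pull those lines out of the log 
>>>>> (assuming the default syslog location):
>>>>> 
>>>>> grep 'ixgbe[13] link' /var/adm/messages | tail -20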
>>>>> 
>>>> Sounds like a cabling issue? Are the runs too long, or are you not using 
>>>> CAT6a? The link flapping at 10Gb and then settling at 1Gb would indicate a 
>>>> cabling issue to me. The driver will always try to link at the fastest 
>>>> speed that the local controller and the remote peer can negotiate... 
>>>> it will not proactively downgrade the link speed. If that happens, it is 
>>>> because that is the best the controller managed to negotiate with the 
>>>> remote peer.
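>>>> 
>>>> You can compare what the port is enabled to advertise against what it 
>>>> actually advertised after negotiation with something like this (a sketch; 
>>>> point it at the affected link):
>>>> 
>>>> dladm show-linkprop -p adv_10gfdx_cap,en_10gfdx_cap ixgbe3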
>>>> 
>>>> Are you using jumbo frames or anything outside of a normal 1500 MTU link?
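>>>> You can check the configured MTU quickly with:
>>>> 
>>>> dladm show-linkprop -p mtu ixgbe3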
>>>> 
>>>> /dale
>>>> 
>>>> 
>>> The cables are actually specifically purchased Cat6 cables. They run about 
>>> 2m, not more. It could be the cables, but I am running a couple of those 
>>> and, as far as I know, I only get these issues on these three nodes. I can 
>>> try some other cables, but I had hoped to be able to get some kind of debug 
>>> messages from the driver.
>>> 
>> The chip provides no reason for a LoS (loss of signal) or a downgrade of 
>> the link. For configuration issues it interrupts only on a few things. 
>> "LSC" (Link Status Change) interrupts are one of these things and are what 
>> tell the driver to interrogate the chip for its current speed, but beyond 
>> that, the hardware provides no further details. Any details regarding why 
>> the PHY had to re-train the link are completely hidden from the driver.
>> 
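>> If you want to see what the driver side is counting, the kernel exports 
>> per-instance kstats; something like this (assuming the standard illumos 
>> kstat utility) dumps them for instance 3, and comparing the output from 
>> before and after a flap may at least narrow down the timing:
>> 
>> kstat -m ixgbe -i 3
>> 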
>> Are these X540 interfaces actually built into the motherboard, or are they 
>> separate PCIe cards? Also, CAT6 alone might not be enough, and the 
>> magnetics on the older X540 might not be able to eke out a 10Gb 
>> connection, even at 2m. I would remove all doubt about cabling being an 
>> issue by replacing them with CAT6a. Beware of cable vendors who sell CAT6 
>> cables as "CAT6a". It could also be an issue with the modular jacks on the 
>> ends.
>> 
>> Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
>> newer one yet? Large portions of it were re-written and re-factored, and 
>> many bugs were fixed, including in portions that touch the X540, since the 
>> new X550 is also copper and the two models share some logic related to 
>> that.
>> 
>> /dale
>> 
> Thanks for clarifying that. I just checked the cables: they are classified 
> as Cat6a and they are from a respectable German vendor. Not that this is any 
> guarantee, but at least they're not cheap bulk cables from China. ;)
> 
> The X540s are partly onboard on some Supermicro X10 boards and partly on a 
> genuine Intel PCIe adapter. I will try some other cables; maybe the ones I 
> got were somewhat faulty. However, that leaves the user only a few options 
> for finding out what is actually wrong with the connection, doesn't it?
> 
> Regarding the OmniOS release, I will update my RSF-1 node to the latest 
> r18; the other two new nodes are actually on r20 and thus should already 
> have the new driver installed.

If the ixgbe package installed on your systems has a timestamp after July 19, 
2016, it will have the updated code.
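
You can confirm this with pkg(1); assuming the stock OmniOS package name for 
the driver, this shows the packaging date:

pkg info driver/network/ixgbe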

Regarding the X540s that are integrated on some of your SMCI X10 boards, does 
a 10Gb link remain stable after you issue the following two commands in the 
order shown:

dladm set-linkprop -p en_10gfdx_cap=0 ixgbeN
dladm set-linkprop -p en_10gfdx_cap=1 ixgbeN
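
Afterwards, you can confirm the speed and duplex the link re-negotiated with, 
for example:

dladm show-phys ixgbeN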

Also, check flowctrl:

dladm show-linkprop -p flowctrl

For your ixgbe devices, this should be at the default of "no".
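
If it has been changed, you can set it back to the default with (substituting 
your actual link name for ixgbeN):

dladm set-linkprop -p flowctrl=no ixgbeN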

/dale
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
