> On Jan 18, 2017, at 2:38 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>
> On 17.01.17 at 23:09, Dale Ghent wrote:
>>> On Jan 17, 2017, at 2:39 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>>
>>> On 17.01.17 at 17:37, Dale Ghent wrote:
>>>
>>>>> On Jan 17, 2017, at 11:31 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>>>>
>>>>> Hi Dale,
>>>>>
>>>>> On 17.01.17 at 17:22, Dale Ghent wrote:
>>>>>
>>>>>>> On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I am sorry, but I have to dig up this old topic, since I now have
>>>>>>> three hosts running OmniOS 018/020 which show these pesky issues
>>>>>>> with flapping their ixgbeN links on my Nexus FEXes…
>>>>>>>
>>>>>>> Does anyone know whether any changes have been made to the ixgbe
>>>>>>> driver since 06/2016?
>>>>>>>
>>>>>> Since June 2016? Yes! A large update to the ixgbe driver landed in
>>>>>> August. It added X550 support, and also brought the Intel Shared
>>>>>> Code the driver uses from its 2012 vintage up to current. The
>>>>>> updated driver is available in 014 and later.
>>>>>>
>>>>>> /dale
>>>>>>
>>>>> Do you know of any way to find out why three of my boxes are
>>>>> flapping their 10GbE ports? It happens not only in aggr mode, but
>>>>> with single links as well. Last week one of my RSF-1 nodes
>>>>> presumably panicked because it couldn't reach its iSCSI LUNs
>>>>> anymore. The thing is that somewhere down the line, the ixgbe
>>>>> driver seems happy to configure one port at 1GbE instead of 10GbE,
>>>>> which stops the flapping, but which nevertheless breaks the VPC on
>>>>> my Nexus.
>>>>>
>>>>> In syslog, this looks like this:
>>>>>
>>>>> ...
>>>>> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
>>>>> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
>>>>> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>>>
>>>>> Note 14:46:07, where the system settles on a 1GbE connection…
>>>>>
>>>> Sounds like a cabling issue? Are the runs too long, or are you not
>>>> using CAT6a? Flapping at 10Gb and then settling at 1Gb would indicate
>>>> a cabling issue to me. The driver will always try to link at the
>>>> fastest speed that the local controller and the remote peer will
>>>> negotiate at; it will not proactively downgrade the link speed.
>>>> If that happens, it is because that is what the controller managed to
>>>> negotiate with the remote peer.
>>>>
>>>> Are you using jumbo frames or anything outside of a normal 1500 MTU
>>>> link?
>>>>
>>>> /dale
>>>>
>>> The cables were actually purchased specifically as Cat6 cables. They
>>> run about 2m, not more. It could be the cables, but I am running a
>>> couple of those and AFAIK I only get these issues on these three
>>> nodes. I can try some other cables, but I had hoped to be able to get
>>> some kind of debug messages from the driver.
>>>
>> The chip provides no reason for a LoS (loss of signal) or downgrade of
>> the link. For configuration issues it interrupts only on a few things.
>> "LSC" (Link Status Change) interrupts are one of these things and are
>> what tell the driver to interrogate the chip for its current speed, but
>> beyond that, the hardware provides no further details. Any details
>> regarding why the PHY had to re-train the link are completely hidden
>> from the driver.
>>
>> Are these X540 interfaces actually built into the motherboard, or are
>> they separate PCIe cards? Also, CAT6 alone might not be enough, and
>> even the magnetics on the older X540 might not be able to eke out a
>> 10Gb connection, even at 2m. I would remove all doubt about cabling
>> being an issue by replacing the cables with CAT6a. Beware of cable
>> vendors who sell CAT6 cables as "CAT6a". It could also be an issue with
>> the modular jacks on the ends.
>>
>> Since you mentioned "after 6/2016" for the ixgbe driver, have you tried
>> the newer one yet? Large portions of it were rewritten and refactored,
>> and many bugs were fixed, including in portions that touch the X540,
>> since the new X550 is also copper and the two models share some logic
>> related to that.
>>
>> /dale
>>
> Thanks for clarifying that.
> I just checked the cables: they are classified as Cat6a, and they come
> from a reputable German vendor. Not that this is any guarantee, but at
> least they are not cheap bulk ware from China. ;)
>
> The X540s are either onboard on some Supermicro X10 boards or on a
> genuine Intel PCIe adapter. I will check some other cables; maybe the
> ones I got were somewhat faulty. However, this leaves only a few options
> for the user to find out what is actually wrong with the connection,
> doesn't it?
>
> Regarding the release of OmniOS, I will update my RSF-1 node to the
> latest r18; the other two new nodes are actually on r20 and thus should
> already have the new driver installed.
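[Editorial aside: to put numbers on the flapping Stephan describes, for
example before and after a cable swap, the mac NOTICE lines can be tallied
per interface. A minimal sketch; the `count_flaps` helper name is
illustrative, and `/var/adm/messages` is the usual (assumed) syslog
location on OmniOS:]

```shell
# count_flaps: tally link up/down transitions per ixgbe instance from
# syslog lines fed on stdin. Illustrative helper, not from the thread.
count_flaps() {
    awk '/NOTICE: ixgbe[0-9]+ link (up|down)/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ixgbe[0-9]+$/)
                n[$i]++
    }
    END { for (l in n) print l, n[l], "transitions" }'
}

# Typical use (log path assumed):
#   count_flaps < /var/adm/messages
```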
If the ixgbe package installed on your systems has a time stamp after
July 19, 2016, it will have the updated code.

Regarding the X540s which are integrated on some of your SMCI X10 boards:
does a 10Gb link remain stable after you issue the following two commands
in the shown order?

    dladm set-linkprop -p en_10gfdx_cap=0 ixgbeN
    dladm set-linkprop -p en_10gfdx_cap=1 ixgbeN

Also, check flowctrl:

    dladm show-linkprop -p flowctrl

For your ixgbe devices, this should be the default of "no".

/dale

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
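[Editorial aside: the two-command capability bounce Dale suggests can be
wrapped per port. A sketch under the assumption that the box has several
ixgbe instances; the `bounce_10g` name and the port list are illustrative,
and the dry-run default is there because dladm exists only on illumos
systems:]

```shell
# bounce_10g: withdraw and re-advertise the 10GbE full-duplex capability
# on one link to force renegotiation, then show its flow-control setting.
# Pass "echo" as the second argument for a dry run that only prints the
# dladm invocations instead of executing them.
bounce_10g() {
    link=$1
    run=${2:-}
    $run dladm set-linkprop -p en_10gfdx_cap=0 "$link"
    $run dladm set-linkprop -p en_10gfdx_cap=1 "$link"
    $run dladm show-linkprop -p flowctrl "$link"   # expect default "no"
}

# Dry-run over the ports seen in the log (hypothetical list):
for l in ixgbe1 ixgbe3; do
    bounce_10g "$l" echo
done
```

Dropping the second argument runs the commands for real, which needs root
on the target host.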