Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Hi, after having messed around with quite a lot of Cat6a cables from different vendors, it seems that the 2m runs are too short. Maybe the Intel controllers are providing too much power to the jacks. I have now exchanged the 2m/3m cables for 5m ones and this seems to do the trick. I know that there will be some crosstalk at either end of the cables, and I have learned that even sophisticated measurement equipment will discard the results for the first 2m of a Cat6a 10GbE connection. However, having to deal with 5m runs is anything but convenient if you have quite a number of cables to deploy. I am still monitoring the NICs, though… ;)

Cheers,
Stephan

___ OmniOS-discuss mailing list OmniOS-discuss@lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On Jan 19, 2017, at 4:27 AM, Stephan Budach wrote:
>
> Am 18.01.17 um 17:38 schrieb Stephan Budach:
>> Am 18.01.17 um 17:32 schrieb Dan McDonald:
>>> Generally the X540 has had a good track record. I brought up the support
>>> for this a long time ago, and it worked alright then. I think Dale has an
>>> X540 in-house which works fine too (he should confirm this).
>>>
>>> Some other things to check:
>>>
>>> * Is your BIOS set to map the PCI-E space into the low-32 bits only?
>>> That's an illumos limitation.
>>>
>>> * Do you have other known-working 10GigBaseT chips to try?
>>>
>>> Dan
>>>
>> I will check with the BIOS, altough I thought that this option would simply
>> cause PCI adaptors to vanish from the system, if setup that way.
>> Actually, I have been going with Intel all the time and it has been up to
>> the X540 in 10GbE setups only, when I ever startet to experience issues at
>> all, so Intel has been a natural choice for me ever… ;)
>>
>> Stephan
> I just checked the BIOS of my new Supermicros and I think that this is the
> BIOS option you were referring to…
>
> Above 4G Decoding: DISABLED
>
> So, this should be right, shouldn't it?

Correct, currently it should be disabled (however we hope by the time 022 is released, there will not be a reason to have that disabled)

/dale
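For context on the "Above 4G Decoding" setting: it controls whether firmware may assign PCI BARs above the 4 GiB line, and a driver stack limited to 32-bit PCI-E mapping cannot reach a BAR placed up there, so the adapter effectively vanishes. A toy sketch of that constraint (the addresses are made up for illustration):

```python
# Illustration (not illumos code) of why "Above 4G Decoding" matters when
# the OS only maps PCI-E space in the low 32 bits: a BAR at or above the
# 4 GiB line is unreachable. The example addresses below are invented.

FOUR_GIB = 1 << 32

def bar_reachable_32bit(base: int, size: int) -> bool:
    """True if the whole BAR [base, base+size) sits below 4 GiB."""
    return base + size <= FOUR_GIB

# With "Above 4G Decoding" disabled, firmware keeps BARs low:
assert bar_reachable_32bit(0xFB00_0000, 0x0010_0000)
# With it enabled, a BAR may land high and the device "vanishes":
assert not bar_reachable_32bit(0x38_0000_0000, 0x0010_0000)
```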
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Am 18.01.17 um 17:38 schrieb Stephan Budach: Am 18.01.17 um 17:32 schrieb Dan McDonald: Generally the X540 has had a good track record. I brought up the support for this a long time ago, and it worked alright then. I think Dale has an X540 in-house which works fine too (he should confirm this). Some other things to check: * Is your BIOS set to map the PCI-E space into the low-32 bits only? That's an illumos limitation. * Do you have other known-working 10GigBaseT chips to try? Dan I will check with the BIOS, altough I thought that this option would simply cause PCI adaptors to vanish from the system, if setup that way. Actually, I have been going with Intel all the time and it has been up to the X540 in 10GbE setups only, when I ever startet to experience issues at all, so Intel has been a natural choice for me ever… ;) Stephan I just checked the BIOS of my new Supermicros and I think that this is the BIOS option you were referring to… Above 4G Decoding: DISABLED So, this should be right, shouldn't it? Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Am 18.01.17 um 17:32 schrieb Dan McDonald:
> Generally the X540 has had a good track record. I brought up the support for this a long time ago, and it worked alright then. I think Dale has an X540 in-house which works fine too (he should confirm this).
>
> Some other things to check:
>
> * Is your BIOS set to map the PCI-E space into the low-32 bits only? That's an illumos limitation.
>
> * Do you have other known-working 10GigBaseT chips to try?
>
> Dan

I will check with the BIOS, although I thought that this option would simply cause PCI adaptors to vanish from the system if set up that way. Actually, I have been going with Intel all the time, and it was only with the X540 in 10GbE setups that I ever started to experience issues at all, so Intel has been a natural choice for me… ;)

Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Generally the X540 has had a good track record. I brought up the support for this a long time ago, and it worked alright then. I think Dale has an X540 in-house which works fine too (he should confirm this).

Some other things to check:

* Is your BIOS set to map the PCI-E space into the low-32 bits only? That's an illumos limitation.

* Do you have other known-working 10GigBaseT chips to try?

Dan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Thanks for clarifying that. I just checked the cables and they classify as Cat6a, and they are from a respectable German vendor; not that this would be any guarantee, but at least they are no bulk ware from China. ;)

The X540s are either onboard on some Supermicro X10 boards or on a genuine Intel PCI adaptor. I will check some other cables; maybe the ones I got were somewhat faulty. However, this leaves the user only a few options for finding out what is actually wrong with the connection, doesn't it?

Regarding the release of omniOS, I will update my RSF-1 node to the latest r18; the other two new nodes are actually on r20 and thus should already have the new driver installed.

…any suggestion on some good cables? ;)

Dätwyler: http://www.cabling.datwyler.com/products/data-centres/copper-technology/patch-cords/product/rj45-patch-cords-cat6a-iec.html

They provide detailed specs and test certificates for all of their cable types.

Thanks,
Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On Jan 17, 2017, at 2:39 PM, Stephan Budach wrote: > > Am 17.01.17 um 17:37 schrieb Dale Ghent: >>> On Jan 17, 2017, at 11:31 AM, Stephan Budach >>> wrote: >>> >>> Hi Dale, >>> >>> Am 17.01.17 um 17:22 schrieb Dale Ghent: >>> > On Jan 17, 2017, at 11:12 AM, Stephan Budach > > wrote: > > Hi guys, > > I am sorry, but I do have to undig this old topic, since I do now have > three hosts running omniOS 018/020, which show these pesky issues with > flapping their ixgbeN links on my Nexus FEXes… > > Does anyone know, if there has any change been made to the ixgbe drivers > since 06/2016? > > Since June 2016? Yes! A large update to the ixgbe driver happened in August. This added X550 support, and also brought the Intel Shared Code it uses from its 2012 vintage up to current. The updated driver is available in 014 and later. /dale >>> do you know of any option to get to know, why three of my boxes are >>> flapping their 10GbE ports? It's actually not only when in aggr mode, but >>> on single use as well. Last week I presumeably had one of my RSF-1 nodes >>> panic, since it couldn't get to it's iSCSI LUNs anymore. The thing ist, >>> that somewhere doen the line, the ixgbe driver seems to be fine, to >>> configure one port to 1GbE instead of 10GbE, which will stop the flapping, >>> but wich will break the VPC on my Nexus nevertheless. >>> >>> In syslog, this looks like this: >>> >>> ... 
>>> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
>>> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
>>> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
>>> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
>>> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>>>
>>> Note on 14:46:07, where the system settles on a 1GbE connection…
>>>
>> Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to me. The driver will always try to link at the fastest speed that the local controller and the remote peer will negotiate at... it will not proactively downgrade the link speed. If that happens, it is because that is what the controller managed to negotiate with the remote peer at.
>>
>> Are you using jumbo frames or anything outside of a normal 1500mtu link?
>>
>> /dale
>>
> The cables are actually specifically purchased cat6 cables. They run about
> 2m, not more. It could be the cables, but I am running a couple of those and
> afaik, I only get these issues on these three nodes. I can try some other
> cables, but I hoped to be able to get maybe some kind of debug messages from
> the driver.

The chip provides no reason for a LoS or downgrade of the link. For configuration issues it interrupts only on a few things. "LSC" (Link Status Change) interrupts are one of these things and are what tell the driver to interrogate the chip for its current speed, but beyond that, the hardware provides no further details. Any details regarding why the PHY had to re-train the link are completely hidden from the driver.

Are these X540 interfaces actually built into the motherboard, or are they separate PCIe cards? Also, CAT6 alone might not be enough, and even the magnetics on the older X540 might not be able to eke out a 10Gb connection, even at 2m. I would remove all doubt of cabling being an issue by replacing them with CAT6a. Beware of cable vendors who sell CAT6 cables as "CAT6a". It could also be an issue with the modular jacks on the ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the newer one yet? Large portions of it were re-written and re-factored, and many bugs fixed including portions that touch the X540 due to the new X550 also being copper and the two models needing to share some logic related to that.

/dale
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On Tue, 17 Jan 2017 20:39:49 +0100 Stephan Budach wrote:
> The cables are actually specifically purchased cat6 cables. They run about 2m, not more. It could be the cables, but I am running a couple of those and afaik, I only get these issues on these three nodes. I can try some other cables, but I hoped to be able to get maybe some kind of debug messages from the driver.

Should 10Gb not use cat 6a? BTW, have you tried various settings for buffers and hardware offload?

--
Hilsen/Regards
Michael Rasmussen
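On illumos, the buffer and flow-control knobs Michael mentions live in the driver's configuration file (link properties such as the MTU are set with dladm). A sketch of the relevant /kernel/drv/ixgbe.conf entries follows; the property names are those commonly found in the ixgbe.conf shipped with the illumos driver, but verify them against your local copy, and treat the values as illustrative rather than recommendations:

```
# /kernel/drv/ixgbe.conf -- illustrative fragment, not a recommendation.
# Check the property names against the file shipped with your release.

flow_control = 3;        # 0 = none, 1 = rx pause, 2 = tx pause, 3 = both
tx_ring_size = 1024;     # transmit descriptor ring depth
rx_ring_size = 1024;     # receive descriptor ring depth
default_mtu = 1500;      # keep at 1500 unless the switch side matches
```

Changes to the conf file take effect when the driver is reloaded (in practice, after a reboot), so tuning these is easier to trial on a test box than on a live RSF-1 node.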
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Am 17.01.17 um 17:37 schrieb Dale Ghent: On Jan 17, 2017, at 11:31 AM, Stephan Budach wrote: Hi Dale, Am 17.01.17 um 17:22 schrieb Dale Ghent: On Jan 17, 2017, at 11:12 AM, Stephan Budach wrote: Hi guys, I am sorry, but I do have to undig this old topic, since I do now have three hosts running omniOS 018/020, which show these pesky issues with flapping their ixgbeN links on my Nexus FEXes… Does anyone know, if there has any change been made to the ixgbe drivers since 06/2016? Since June 2016? Yes! A large update to the ixgbe driver happened in August. This added X550 support, and also brought the Intel Shared Code it uses from its 2012 vintage up to current. The updated driver is available in 014 and later. /dale do you know of any option to get to know, why three of my boxes are flapping their 10GbE ports? It's actually not only when in aggr mode, but on single use as well. Last week I presumeably had one of my RSF-1 nodes panic, since it couldn't get to it's iSCSI LUNs anymore. The thing ist, that somewhere doen the line, the ixgbe driver seems to be fine, to configure one port to 1GbE instead of 10GbE, which will stop the flapping, but wich will break the VPC on my Nexus nevertheless. In syslog, this looks like this: ... 
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…

Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to me. The driver will always try to link at the fastest speed that the local controller and the remote peer will negotiate at... it will not proactively downgrade the link speed. If that happens, it is because that is what the controller managed to negotiate with the remote peer at. Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale

The cables are actually specifically purchased cat6 cables. They run about 2m, not more.
It could be the cables, but I am running a couple of those and afaik, I only get these issues on these three nodes. I can try some other cables, but I hoped to be able to get maybe some kind of debug messages from the driver.

Thanks,
Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On Jan 17, 2017, at 11:31 AM, Stephan Budach wrote: > > Hi Dale, > > Am 17.01.17 um 17:22 schrieb Dale Ghent: >>> On Jan 17, 2017, at 11:12 AM, Stephan Budach >>> wrote: >>> >>> Hi guys, >>> >>> I am sorry, but I do have to undig this old topic, since I do now have >>> three hosts running omniOS 018/020, which show these pesky issues with >>> flapping their ixgbeN links on my Nexus FEXes… >>> >>> Does anyone know, if there has any change been made to the ixgbe drivers >>> since 06/2016? >>> >> Since June 2016? Yes! A large update to the ixgbe driver happened in August. >> This added X550 support, and also brought the Intel Shared Code it uses from >> its 2012 vintage up to current. The updated driver is available in 014 and >> later. >> >> /dale >> > > do you know of any option to get to know, why three of my boxes are flapping > their 10GbE ports? It's actually not only when in aggr mode, but on single > use as well. Last week I presumeably had one of my RSF-1 nodes panic, since > it couldn't get to it's iSCSI LUNs anymore. The thing ist, that somewhere > doen the line, the ixgbe driver seems to be fine, to configure one port to > 1GbE instead of 10GbE, which will stop the flapping, but wich will break the > VPC on my Nexus nevertheless. > > In syslog, this looks like this: > > ... 
> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 1 Mbps, full duplex
> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
>
> Note on 14:46:07, where the system settles on a 1GbE connection…

Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to me. The driver will always try to link at the fastest speed that the local controller and the remote peer will negotiate at... it will not proactively downgrade the link speed. If that happens, it is because that is what the controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?
/dale ___ OmniOS-discuss mailing list OmniOS-discuss@lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
I'd check your switch, though you're using 10GigBaseT, which shouldn't be as big of a problem. Hmmm, are you using Cat6 or better cables? 5e isn't going to cut it for reliable 10Gig service. Dan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Hi Dale, On 17.01.17 at 17:22 Dale Ghent wrote: On Jan 17, 2017, at 11:12 AM, Stephan Budach wrote: Hi guys, I am sorry, but I have to dig up this old topic, since I now have three hosts running OmniOS 018/020 which show these pesky issues with flapping their ixgbeN links on my Nexus FEXes… Does anyone know if any change has been made to the ixgbe drivers since 06/2016? Since June 2016? Yes! A large update to the ixgbe driver happened in August. This added X550 support, and also brought the Intel Shared Code it uses from its 2012 vintage up to current. The updated driver is available in 014 and later. /dale Do you know of any way to find out why three of my boxes are flapping their 10GbE ports? It happens not only in aggr mode, but in single use as well. Last week I presumably had one of my RSF-1 nodes panic, since it couldn't get to its iSCSI LUNs anymore. The thing is that somewhere down the line, the ixgbe driver seems to be fine with configuring one port to 1GbE instead of 10GbE, which will stop the flapping, but which will break the VPC on my Nexus nevertheless. 
In syslog, this looks like this:

Jan 17 14:41:51 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:42:11 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:43:33 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:43:33 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:43:34 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:43:43 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:44:05 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:44:10 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:14 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:14 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:14 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:29 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:29 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:29 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:29 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:40 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:45:45 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:45:51 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:45:51 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:45:52 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:45:56 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection… Thanks, Stephan
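[Editor's note] For a flapping report like the one above, it can help to quantify the transitions before blaming cabling or the driver. A minimal sketch, assuming the stock illumos `/var/adm/messages` format shown in this thread (the `count_flaps` helper name is ours); on a live box you would pipe in the real log instead of the sample heredoc:

```shell
# Count link up/down transitions per ixgbe interface in syslog output.
# Sketch only: the sample lines below are taken from this thread.
count_flaps() {
    awk '
    /NOTICE: ixgbe[0-9]+ link (up|down)/ {
        # The interface name is its own whitespace-separated field.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ixgbe[0-9]+$/) nic = $i
        state = ($0 ~ / link down/) ? "down" : "up"
        count[nic "/" state]++
    }
    END { for (k in count) print k, count[k] }' | sort
}

count_flaps <<'EOF'
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
EOF
```

A lopsided count on one port (as with ixgbe0/ixgbe3 here) supports the "always the same physical port" observation made later in the thread.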
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On Jan 17, 2017, at 11:12 AM, Stephan Budach wrote: > > Hi guys, > > I am sorry, but I have to dig up this old topic, since I now have three > hosts running OmniOS 018/020 which show these pesky issues with flapping > their ixgbeN links on my Nexus FEXes… X550 support went in June of 2016. Dale knows more about ixgbe than I do, since he did the work for X550. Dan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On Jan 17, 2017, at 11:12 AM, Stephan Budach wrote: > > Hi guys, > > I am sorry, but I have to dig up this old topic, since I now have three > hosts running OmniOS 018/020 which show these pesky issues with flapping > their ixgbeN links on my Nexus FEXes… > > Does anyone know if any change has been made to the ixgbe drivers > since 06/2016? Since June 2016? Yes! A large update to the ixgbe driver happened in August. This added X550 support, and also brought the Intel Shared Code it uses from its 2012 vintage up to current. The updated driver is available in 014 and later. /dale
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Hi guys, I am sorry, but I have to dig up this old topic, since I now have three hosts running OmniOS 018/020 which show these pesky issues with flapping their ixgbeN links on my Nexus FEXes… Does anyone know if any change has been made to the ixgbe drivers since 06/2016? Thanks, Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 03.06.16 at 15:42 Fábio Rabelo wrote: Hi to all. A question: is this the board you used? https://www.supermicro.com/products/motherboard/Xeon/C600/X10DRi-T4_.cfm If so, this board uses the Intel X540, and this issue exists only with Intel X550 chips! Fábio Rabelo Yes, this is the board I got. Actually, it's an X10DRi-T4+ Cheers, Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Hi to all. A question: is this the board you used? https://www.supermicro.com/products/motherboard/Xeon/C600/X10DRi-T4_.cfm If so, this board uses the Intel X540, and this issue exists only with Intel X550 chips! Fábio Rabelo 2016-06-03 10:20 GMT-03:00 Stephan Budach : > Hi Dale, > > On 17.05.16 at 20:55 Dale Ghent wrote: > > On May 17, 2016, at 8:30 AM, Stephan Budach wrote: > > I have checked all of my ixgbe interfaces and they all report that no flow > control is in place, as you can see:
>
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe0   flowctrl  rw    no     no       no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe1   flowctrl  rw    no     no       no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe2   flowctrl  rw    no     no       no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe3   flowctrl  rw    no     no       no,tx,rx,bi
>
> I then checked the ports on the Nexus switches and found out that they do > have outbound flow control enabled, but that is the case on any of those > Nexus ports, including those where this issue doesn't exist. > > Optimally you would have flow control turned off on both sides, as the > switch still expects the ixgbe NIC to respond appropriately. To be honest, > the only time to use ethernet flow control is if you are operating the > interfaces for higher-level protocols which do not provide any sort of > direct flow control themselves, such as FCoE. If the vast majority of > traffic is TCP, leave it to the TCP stack to manage any local congestion on > the link. > > /dale > > I just wanted to wrap this up… I recently swapped that old Sun server with a > new Supermicro X10-type, which has 4 10GbE NICs on board, installed OmniOS > r018 and my RSF-1 cluster software on it. Configured my two LACP > aggregations and there hasn't been any issue since. > So, it's either something on the old server - it's a Sun Fire X4170M2 - or > something on the Intel cards. > > Cheers, > Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Hi Dale, On 17.05.16 at 20:55 Dale Ghent wrote: On May 17, 2016, at 8:30 AM, Stephan Budach wrote: I have checked all of my ixgbe interfaces and they all report that no flow control is in place, as you can see:

root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe0   flowctrl  rw    no     no       no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe1   flowctrl  rw    no     no       no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe2   flowctrl  rw    no     no       no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe3   flowctrl  rw    no     no       no,tx,rx,bi

I then checked the ports on the Nexus switches and found out that they do have outbound flow control enabled, but that is the case on any of those Nexus ports, including those where this issue doesn't exist. Optimally you would have flow control turned off on both sides, as the switch still expects the ixgbe NIC to respond appropriately. To be honest, the only time to use ethernet flow control is if you are operating the interfaces for higher-level protocols which do not provide any sort of direct flow control themselves, such as FCoE. If the vast majority of traffic is TCP, leave it to the TCP stack to manage any local congestion on the link. /dale I just wanted to wrap this up… I recently swapped that old Sun server with a new Supermicro X10-type, which has 4 10GbE NICs on board, installed OmniOS r018 and my RSF-1 cluster software on it. Configured my two LACP aggregations and there hasn't been any issue since. So, it's either something on the old server - it's a Sun Fire X4170M2 - or something on the Intel cards. Cheers, Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On May 17, 2016, at 8:30 AM, Stephan Budach wrote: > I have checked all of my ixgbe interfaces and they all report that no flow > control is in place, as you can see:
>
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe0   flowctrl  rw    no     no       no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe1   flowctrl  rw    no     no       no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe2   flowctrl  rw    no     no       no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> ixgbe3   flowctrl  rw    no     no       no,tx,rx,bi
>
> I then checked the ports on the Nexus switches and found out that they do > have outbound flow control enabled, but that is the case on any of those Nexus > ports, including those where this issue doesn't exist. Optimally you would have flow control turned off on both sides, as the switch still expects the ixgbe NIC to respond appropriately. To be honest, the only time to use ethernet flow control is if you are operating the interfaces for higher-level protocols which do not provide any sort of direct flow control themselves, such as FCoE. If the vast majority of traffic is TCP, leave it to the TCP stack to manage any local congestion on the link. /dale
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 11.05.16 at 19:28 Dale Ghent wrote: On May 11, 2016, at 12:32 PM, Stephan Budach wrote: I will try to get one node free of all services running on it, as I will have to reboot the system, since I will have to change the ixgbe.conf, won't I? This is a RSF-1 host, so this will likely be done over the weekend. You can use dladm on a live system: dladm set-linkprop -p flowctrl=no ixgbeN where ixgbeN are your ixgbe interfaces (probably ixgbe0 and ixgbe1) /dale I have checked all of my ixgbe interfaces and they all report that no flow control is in place, as you can see:

root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe0   flowctrl  rw    no     no       no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe1   flowctrl  rw    no     no       no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe2   flowctrl  rw    no     no       no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe3   flowctrl  rw    no     no       no,tx,rx,bi

I then checked the ports on the Nexus switches and found out that they do have outbound flow control enabled, but that is the case on any of those Nexus ports, including those where this issue doesn't exist. Regards, Stephan
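[Editor's note] The four per-interface checks above can be folded into one verification pass. A sketch (the `check_flowctrl` helper name is ours; the table layout is the `dladm show-linkprop -p flowctrl` output quoted in this thread) that flags any link whose flow-control value is not "no"; on a live system you would pipe the real dladm output in instead of the sample:

```shell
# Verify that flowctrl is "no" on every link in captured dladm output.
# Sketch only: parses `dladm show-linkprop -p flowctrl` table output;
# header lines are skipped because their PROPERTY field is not "flowctrl".
check_flowctrl() {
    awk '$2 == "flowctrl" && $4 != "no" { bad = bad " " $1 }
         END {
             if (bad != "") { print "flow control still enabled on:" bad; exit 1 }
             print "flow control disabled on all links"
         }'
}

check_flowctrl <<'EOF'
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
ixgbe0   flowctrl  rw    no     no       no,tx,rx,bi
ixgbe1   flowctrl  rw    no     no       no,tx,rx,bi
ixgbe2   flowctrl  rw    no     no       no,tx,rx,bi
ixgbe3   flowctrl  rw    no     no       no,tx,rx,bi
EOF
```

The non-zero exit status on a stray `tx`/`rx`/`bi` value makes this usable from a monitoring script.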
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On May 11, 2016, at 12:32 PM, Stephan Budach wrote: > I will try to get one node free of all services running on it, as I will have > to reboot the system, since I will have to change the ixgbe.conf, won't I? > This is a RSF-1 host, so this will likely be done over the weekend. You can use dladm on a live system: dladm set-linkprop -p flowctrl=no ixgbeN where ixgbeN are your ixgbe interfaces (probably ixgbe0 and ixgbe1) /dale
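[Editor's note] Applied across all four ports mentioned in this thread, Dale's command becomes a short loop. A dry-run sketch: it only prints the dladm commands rather than executing them, since disabling flow control is disruptive and dladm exists only on illumos; remove the `echo` to actually apply them on a live system (no reboot or ixgbe.conf edit needed):

```shell
# Print (dry run) the dladm commands that would disable ethernet flow
# control on each ixgbe interface from this thread. Drop the "echo" to
# run them for real on an illumos host.
for nic in ixgbe0 ixgbe1 ixgbe2 ixgbe3; do
    echo dladm set-linkprop -p flowctrl=no "$nic"
done
```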
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 11.05.16 at 16:48 Dale Ghent wrote: On May 11, 2016, at 7:36 AM, Stephan Budach wrote: On 09.05.16 at 20:43 Dale Ghent wrote: On May 9, 2016, at 2:04 PM, Stephan Budach wrote: On 09.05.16 at 16:33 Dale Ghent wrote: On May 9, 2016, at 8:24 AM, Stephan Budach wrote: Hi, I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection. I have tried swapping and interchanging cables and thus switchports, but to no avail. Anyone else noticed this and, even better… knows a solution to this? Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018? By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together? /dale I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds when we introduced our Nexus gear to our network. I will have to check if both links stay at 10GbE when not being configured as a LACP bond. Let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connection, as they will all have to be connected to both DCs. This is achieved by using VPCs on our Nexus switches. Provide as much detail as you can - if you're using hw flow control, whether both links act this way at the same time or independently, and so on. Problems like this often boil down to a very small and seemingly insignificant detail. 
I currently have ixgbe on the operating table for adding X550 support, so I can take a look at this; however, I don't have your type of switches available to me, so LACP-specific testing is something I can't do for you. /dale I checked the ixgbe.conf files on each host and they all are still at the standard setting, which includes flow_control = 3; Ah, so you are using ethernet flow control. Could you try disabling that on both sides (on the ixgbe host and on the switch) and see if that corrects the link stability issues? There's an outstanding issue with hw flow control on ixgbe that you *might* be running into regarding pause frame timing, which could manifest in the way you describe. /dale I will try to get one node free of all services running on it, as I will have to reboot the system, since I will have to change the ixgbe.conf, won't I? This is a RSF-1 host, so this will likely be done over the weekend. Thanks, Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On May 11, 2016, at 7:36 AM, Stephan Budach wrote: > > On 09.05.16 at 20:43 Dale Ghent wrote: >>> On May 9, 2016, at 2:04 PM, Stephan Budach wrote: >>> >>> On 09.05.16 at 16:33 Dale Ghent wrote: > On May 9, 2016, at 8:24 AM, Stephan Budach wrote: > > Hi, > > I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break > the LACP aggr-link on different boxes when Intel X540-T2s are involved. > It first starts with a couple of link downs/ups on one port, and finally > the link on that port negotiates to 1GbE instead of 10GbE, which then > breaks the LACP channel on my Cisco Nexus for this connection. > > I have tried swapping and interchanging cables and thus switchports, but > to no avail. > > Anyone else noticed this and, even better… knows a solution to this? Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018? By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together? /dale >>> I have noticed that on prior versions of OmniOS as well, but we only >>> recently started deploying 10GbE LACP bonds when we introduced our Nexus >>> gear to our network. I will have to check if both links stay at 10GbE when >>> not being configured as a LACP bond. Let me check that tomorrow and report >>> back. As we're heading for a stretched DC, we are mainly configuring 2-way >>> LACP bonds over our Nexus gear, so we don't actually have any single 10GbE >>> connection, as they will all have to be connected to both DCs. This is >>> achieved by using VPCs on our Nexus switches. >> Provide as much detail as you can - if you're using hw flow control, whether >> both links act this way at the same time or independently, and so on. >> Problems like this often boil down to a very small and seemingly >> insignificant detail. 
>> >> I currently have ixgbe on the operating table for adding X550 support, so I >> can take a look at this; however I don't have your type of switches >> available to me so LACP-specific testing is something I can't do for you. >> >> /dale > I checked the ixgbe.conf files on each host and they all are still at the > standard setting, which includes flow_control = 3; Ah, so you are using ethernet flow control. Could you try disabling that on both sides (on the ixgbe host and on the switch) and see if that corrects the link stability issues? There's an outstanding issue with hw flow control on ixgbe that you *might* be running into regarding pause frame timing, which could manifest in the way you describe. /dale
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 11.05.16 at 14:50 Stephan Budach wrote: On 11.05.16 at 13:36 Stephan Budach wrote: On 09.05.16 at 20:43 Dale Ghent wrote: On May 9, 2016, at 2:04 PM, Stephan Budach wrote: On 09.05.16 at 16:33 Dale Ghent wrote: On May 9, 2016, at 8:24 AM, Stephan Budach wrote: Hi, I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection. I have tried swapping and interchanging cables and thus switchports, but to no avail. Anyone else noticed this and, even better… knows a solution to this? Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018? By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together? /dale I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds when we introduced our Nexus gear to our network. I will have to check if both links stay at 10GbE when not being configured as a LACP bond. Let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connection, as they will all have to be connected to both DCs. This is achieved by using VPCs on our Nexus switches. Provide as much detail as you can - if you're using hw flow control, whether both links act this way at the same time or independently, and so on. Problems like this often boil down to a very small and seemingly insignificant detail. 
I currently have ixgbe on the operating table for adding X550 support, so I can take a look at this; however, I don't have your type of switches available to me, so LACP-specific testing is something I can't do for you. /dale I checked the ixgbe.conf files on each host and they all are still at the standard setting, which includes flow_control = 3; So they all have flow control enabled. As for the Nexus config, all of those ports are still standard ethernet ports, and modifications have only been made globally to the switch. I will now have to yank the one port on one of the hosts from the aggr and configure it as a standalone port. Then we will see if it still gets the disconnects/reconnects and finally the negotiation to 1GbE instead of 10GbE. As this only seems to happen to the same port, I never experienced other ports of the affected aggrs acting up. I also thought I noticed that those were always the "same" physical ports, that is, the first port on the card (ixgbe0), but that might of course be a coincidence. Thanks, Stephan Ok, so we can likely rule out LACP as a generic reason for this issue… After removing ixgbe0 from the aggr1, I plugged it into an unused port of my Nexus FEX and, lo and behold, here we go:

root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex
May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex

So, after less than an hour, we had the first link-cycle on ixgbe0, alas on another port, which has no LACP config whatsoever. I will monitor this for a while and see if we will get more of those. 
Thanks, Stephan Ehh… and sorry, I almost forgot to paste the log from the Cisco Nexus switch:

2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-SPEED: Interface Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_DUPLEX: Interface Ethernet141/1/9, operational duplex mode changed to Full
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface Ethernet141/1/9, operational Receive Flow Control state changed to off
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface Ethernet141/1/9, operational Transmit Flow Control state changed to on
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_UP: Interface Ethernet141/1/9 is up in mode access
2016 May 11 14:07:29 gh79-nx-01 %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet141/1/9 is down (Link failure)
2016 May 11 14:07:45 gh79-nx-01 last message repeated 1 time
2016 May 11 14:07:45 gh79-nx-01 %ETHPORT-5-SPEED: Interface Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 14:07:45 gh79-nx-01 %ETHPORT-5-IF_DUPLEX: Interface Ethernet141/1/9, operational duplex mode changed to Full
2016
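[Editor's note] The switch-side log can be correlated with the host-side flaps. A sketch (the `nexus_changes` helper name is ours; the message format is the NX-OS %ETHPORT syntax quoted in this thread) that extracts speed and flow-control state changes for one interface; on a real switch you would feed in `show logging logfile` output instead of the sample:

```shell
# Pull speed and flow-control state changes for one interface out of an
# NX-OS log excerpt. Sketch only: reads the sample heredoc below.
nexus_changes() {
    iface="$1"
    grep -E "%ETHPORT-5-(SPEED|IF_RX_FLOW_CONTROL|IF_TX_FLOW_CONTROL)" |
        grep "Interface $iface," |
        sed -e 's/.*%ETHPORT-5-//'
}

nexus_changes Ethernet141/1/9 <<'EOF'
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-SPEED: Interface Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface Ethernet141/1/9, operational Receive Flow Control state changed to off
EOF
```

Note the asymmetric state in the excerpt above (receive flow control off, transmit flow control on), which is exactly the kind of mismatch Dale suggests eliminating by disabling flow control on both sides.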
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 11.05.16 at 13:36 Stephan Budach wrote: On 09.05.16 at 20:43 Dale Ghent wrote: On May 9, 2016, at 2:04 PM, Stephan Budach wrote: On 09.05.16 at 16:33 Dale Ghent wrote: On May 9, 2016, at 8:24 AM, Stephan Budach wrote: Hi, I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection. I have tried swapping and interchanging cables and thus switchports, but to no avail. Anyone else noticed this and, even better… knows a solution to this? Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018? By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together? /dale I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds when we introduced our Nexus gear to our network. I will have to check if both links stay at 10GbE when not being configured as a LACP bond. Let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connection, as they will all have to be connected to both DCs. This is achieved by using VPCs on our Nexus switches. Provide as much detail as you can - if you're using hw flow control, whether both links act this way at the same time or independently, and so on. Problems like this often boil down to a very small and seemingly insignificant detail. 
I currently have ixgbe on the operating table for adding X550 support, so I can take a look at this; however, I don't have your type of switches available to me, so LACP-specific testing is something I can't do for you. /dale I checked the ixgbe.conf files on each host and they all are still at the standard setting, which includes flow_control = 3; So they all have flow control enabled. As for the Nexus config, all of those ports are still standard ethernet ports, and modifications have only been made globally to the switch. I will now have to yank the one port on one of the hosts from the aggr and configure it as a standalone port. Then we will see if it still gets the disconnects/reconnects and finally the negotiation to 1GbE instead of 10GbE. As this only seems to happen to the same port, I never experienced other ports of the affected aggrs acting up. I also thought I noticed that those were always the "same" physical ports, that is, the first port on the card (ixgbe0), but that might of course be a coincidence. Thanks, Stephan Ok, so we can likely rule out LACP as a generic reason for this issue… After removing ixgbe0 from the aggr1, I plugged it into an unused port of my Nexus FEX and, lo and behold, here we go:

root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex
May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex

So, after less than an hour, we had the first link-cycle on ixgbe0, alas on another port, which has no LACP config whatsoever. I will monitor this for a while and see if we will get more of those. 
Thanks, Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 09.05.16 at 20:43 Dale Ghent wrote: On May 9, 2016, at 2:04 PM, Stephan Budach wrote: On 09.05.16 at 16:33 Dale Ghent wrote: On May 9, 2016, at 8:24 AM, Stephan Budach wrote: Hi, I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection. I have tried swapping and interchanging cables and thus switchports, but to no avail. Anyone else noticed this and, even better… knows a solution to this? Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018? By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together? /dale I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds when we introduced our Nexus gear to our network. I will have to check if both links stay at 10GbE when not being configured as a LACP bond. Let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connection, as they will all have to be connected to both DCs. This is achieved by using VPCs on our Nexus switches. Provide as much detail as you can - if you're using hw flow control, whether both links act this way at the same time or independently, and so on. Problems like this often boil down to a very small and seemingly insignificant detail. I currently have ixgbe on the operating table for adding X550 support, so I can take a look at this; however, I don't have your type of switches available to me, so LACP-specific testing is something I can't do for you. 
/dale

I checked the ixgbe.conf files on each host and they are all still at the standard settings, which include flow_control = 3; so they all have flow control enabled. As for the Nexus config, all of those ports are still standard Ethernet ports, and modifications have only been made globally on the switch.

I will now pull the one port on one of the hosts from the aggr and configure it as a standalone port. Then we will see whether it still gets the disconnects/reconnects and finally the negotiation down to 1GbE instead of 10GbE. As this only seems to happen to the same port, I have never experienced other ports of the affected aggrs acting up. I also thought I noticed that it was always the "same" physical port, that is, the first port on the card (ixgbe0), but that might of course be a coincidence.

Thanks,
Stephan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
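For reference, the debugging step described above (confirming the flow-control setting and pulling the suspect port out of the bond to watch it standalone) can be sketched with illumos `dladm`; the link and aggregation names `ixgbe0` and `aggr0` are assumptions for illustration:

```shell
# Check the effective flow-control setting on the link; ixgbe.conf's
# flow_control = 3 should surface as "bi" (both directions):
dladm show-linkprop -p flowctrl ixgbe0

# Remove the suspect port from the aggregation so it can be observed
# as a standalone link:
dladm remove-aggr -l ixgbe0 aggr0

# Watch whether the standalone link still flaps and renegotiates
# down to 1GbE (STATE and SPEED columns):
dladm show-phys
```

These commands require an illumos/OmniOS host with the hardware present, so they are shown only as a sketch of the procedure.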
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On May 9, 2016, at 2:04 PM, Stephan Budach wrote:
>
> On 09.05.16 at 16:33, Dale Ghent wrote:
>>> On May 9, 2016, at 8:24 AM, Stephan Budach wrote:
>>>
>>> Hi,
>>>
>>> I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection.
>>>
>>> I have tried swapping and interchanging cables, and thus switchports, but to no avail.
>>>
>>> Anyone else noticed this and, even better… knows a solution to this?
>> Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018?
>>
>> By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?
>>
>> /dale
> I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our network. I will have to check whether both links stay at 10GbE when not configured as an LACP bond; let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connection, as they will all have to be connected to both DCs. This is achieved by using vPCs on our Nexus switches.

Provide as much detail as you can - if you're using hw flow control, whether both links act this way at the same time or independently, and so on. Problems like this often boil down to a very small and seemingly insignificant detail.
I currently have ixgbe on the operating table for adding X550 support, so I can take a look at this; however, I don't have your type of switches available to me, so LACP-specific testing is something I can't do for you.

/dale
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
On 09.05.16 at 16:33, Dale Ghent wrote:

On May 9, 2016, at 8:24 AM, Stephan Budach wrote:

Hi, I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection. I have tried swapping and interchanging cables, and thus switchports, but to no avail. Anyone else noticed this and, even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018? By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?

/dale

I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our network. I will have to check whether both links stay at 10GbE when not configured as an LACP bond; let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connection, as they will all have to be connected to both DCs. This is achieved by using vPCs on our Nexus switches.

Thanks,
Stephan
Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
> On May 9, 2016, at 8:24 AM, Stephan Budach wrote:
>
> Hi,
>
> I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection.
>
> I have tried swapping and interchanging cables, and thus switchports, but to no avail.
>
> Anyone else noticed this and, even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?

/dale
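Dale's question, whether the two physical links hold 10Gb when not aggregated, can be answered from the host with `dladm`; a minimal sketch (the port names `ixgbe0`/`ixgbe1` are assumptions):

```shell
# Report the negotiated state of all physical links; a port that has
# fallen back shows 1000 rather than 10000 in the SPEED column:
dladm show-phys

# If the ports are still bonded, show extended aggregation state,
# including per-port speed and whether each port is still attached:
dladm show-aggr -x
```

Comparing the SPEED column over time (e.g. from a cron job) would also capture when the renegotiation to 1GbE happens.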
[OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2
Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the LACP aggr-link on different boxes when Intel X540-T2s are involved. It first starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection.

I have tried swapping and interchanging cables, and thus switchports, but to no avail.

Anyone else noticed this and, even better… knows a solution to this?

Cheers,
Stephan
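For readers following along, a 2-way LACP bond of the kind described in this thread is typically built on illumos/OmniOS with `dladm`; a minimal sketch, assuming the two X540-T2 ports show up as `ixgbe0`/`ixgbe1` and the bond is named `aggr0`:

```shell
# Create a 2-port aggregation running LACP in active mode, to pair
# with a channel-group configured for LACP on the Nexus side:
dladm create-aggr -L active -l ixgbe0 -l ixgbe1 aggr0

# Verify the aggregation and its LACP negotiation state:
dladm show-aggr -L
```

If one member port renegotiates down to 1GbE, as reported above, the switch side will typically suspend that member, which is why the symptom surfaces as a broken LACP channel rather than just a slow link.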