Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-20 Thread Stephan Budach

Hi,

after having messed around with quite a lot of Cat6a cables from 
different vendors, it seems that the 2m runs are too short. Maybe the 
Intel controllers are providing too much power to the jacks. I have now 
replaced the 2m/3m cables with 5m ones, and this seems to do the trick.


I know that there will be some crosstalk at either end of the cables, and 
I have learned that even sophisticated measurement equipment will discard the 
results for the first 2m of a Cat6a 10GbE connection. However, having to 
deal with 5m runs is anything but convenient if you do have quite a 
number of cables to deploy.


I am still monitoring the NICs, though… ;)

Cheers,
Stephan
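
(One way to keep an eye on the links is a small loop around dladm show-phys, which reports the negotiated state and speed per link. This is only a minimal sketch; the interface pattern and the interval are examples, not taken from the poster's setup:

    while true; do
        date
        dladm show-phys | egrep 'ixgbe[0-9]'
        sleep 5
    done

Anything other than "up ... 10000" for a port, or a growing number of link up/down NOTICEs in /var/adm/messages, shows a port acting up before the aggr actually breaks.)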




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-19 Thread Dale Ghent

> On Jan 19, 2017, at 4:27 AM, Stephan Budach  wrote:
> 
> Am 18.01.17 um 17:38 schrieb Stephan Budach:
>> Am 18.01.17 um 17:32 schrieb Dan McDonald:
>>> Generally the X540 has had a good track record.  I brought up the support 
>>> for this a long time ago, and it worked alright then.  I think Dale has an 
>>> X540 in-house which works fine too (he should confirm this).
>>> 
>>> Some other things to check:
>>> 
>>> * Is your BIOS set to map the PCI-E space into the low-32 bits only?  
>>> That's an illumos limitation.
>>> 
>>> * Do you have other known-working 10GigBaseT chips to try?
>>> 
>>> Dan
>>> 
>>> 
>> I will check the BIOS, although I thought that this option would simply 
>> cause PCI adaptors to vanish from the system if set up that way.
>> Actually, I have been going with Intel all the time, and it was only with 
>> the X540 in 10GbE setups that I ever started to experience issues at 
>> all, so Intel has been a natural choice for me ever… ;)
>> 
>> Stephan
> I just checked the BIOS of my new Supermicros and I think that this is the 
> BIOS option you were referring to…
> 
> Above 4G Decoding: DISABLED
> 
> So, this should be right, shouldn't it?

Correct, currently it should be disabled (however we hope by the time 022 is 
released, there will not be a reason to have that disabled)

/dale
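
(On the OS side, whether the adapters' PCI BARs actually ended up below 4G can be checked from a running system. A rough sketch only, assuming the stock illumos prtconf and that the ixgbe devices are the nodes of interest:

    prtconf -D | grep ixgbe      # confirm which device nodes are bound to the ixgbe driver
    prtconf -pv | less           # inspect the assigned-addresses properties of those PCI nodes

Assigned addresses that all fit below 0xffffffff would be consistent with "Above 4G Decoding" being disabled.)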


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-19 Thread Stephan Budach

Am 18.01.17 um 17:38 schrieb Stephan Budach:

Am 18.01.17 um 17:32 schrieb Dan McDonald:

Generally the X540 has had a good track record.  I brought up the support for 
this a long time ago, and it worked alright then.  I think Dale has an X540 
in-house which works fine too (he should confirm this).

Some other things to check:

* Is your BIOS set to map the PCI-E space into the low-32 bits only?  That's an 
illumos limitation.

* Do you have other known-working 10GigBaseT chips to try?

Dan

I will check the BIOS, although I thought that this option would 
simply cause PCI adaptors to vanish from the system if set up that way.
Actually, I have been going with Intel all the time, and it was only with 
the X540 in 10GbE setups that I ever started to experience issues at all, 
so Intel has been a natural choice for me ever… ;)


Stephan
I just checked the BIOS of my new Supermicros and I think that this is 
the BIOS option you were referring to…


Above 4G Decoding: DISABLED

So, this should be right, shouldn't it?

Stephan




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Stephan Budach

Am 18.01.17 um 17:32 schrieb Dan McDonald:

Generally the X540 has had a good track record.  I brought up the support for 
this a long time ago, and it worked alright then.  I think Dale has an X540 
in-house which works fine too (he should confirm this).

Some other things to check:

* Is your BIOS set to map the PCI-E space into the low-32 bits only?  That's an 
illumos limitation.

* Do you have other known-working 10GigBaseT chips to try?

Dan

I will check the BIOS, although I thought that this option would 
simply cause PCI adaptors to vanish from the system if set up that way.
Actually, I have been going with Intel all the time, and it was only with 
the X540 in 10GbE setups that I ever started to experience issues at all, 
so Intel has been a natural choice for me ever… ;)


Stephan





Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Dan McDonald
Generally the X540 has had a good track record.  I brought up the support for 
this a long time ago, and it worked alright then.  I think Dale has an X540 
in-house which works fine too (he should confirm this).

Some other things to check:

* Is your BIOS set to map the PCI-E space into the low-32 bits only?  That's an 
illumos limitation.

* Do you have other known-working 10GigBaseT chips to try?

Dan



Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Stephan Budach

Am 18.01.17 um 09:01 schrieb Dale Ghent:

On Jan 18, 2017, at 2:38 AM, Stephan Budach  wrote:

Am 17.01.17 um 23:09 schrieb Dale Ghent:

On Jan 17, 2017, at 2:39 PM, Stephan Budach 
  wrote:

Am 17.01.17 um 17:37 schrieb Dale Ghent:


On Jan 17, 2017, at 11:31 AM, Stephan Budach 

  wrote:

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:



On Jan 17, 2017, at 11:12 AM, Stephan Budach 


  wrote:

Hi guys,

I am sorry, but I have to dig up this old topic, since I now have three 
hosts running OmniOS 018/020 which show these pesky issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know if any change has been made to the ixgbe drivers since 
06/2016?




Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale




do you know of any way to find out why three of my boxes are flapping 
their 10GbE ports? It actually happens not only in aggr mode, but in single use 
as well. Last week I presumably had one of my RSF-1 nodes panic, since it 
couldn't get to its iSCSI LUNs anymore. The thing is that somewhere down the 
line the ixgbe driver seems to be fine with configuring one port at 1GbE instead 
of 10GbE, which will stop the flapping, but which will nevertheless break the 
VPC on my Nexus.

In syslog, this looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…



Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale



The cables are actually specifically purchased Cat6 cables. They run about 2m, 
not more. It could be the cables, but I am running a couple of those and AFAIK 
I only get these issues on these three nodes. I can try some other cables, but 
I had hoped to be able to get some kind of debug messages from the driver.


The chip provides no reason for a LoS or downgrade of the link. For configuration issues 
it interrupts on only a few things. "LSC" (Link Status Change) interrupts are one 
of these, and they are what tell the driver to interrogate the chip for its current 
speed, but beyond that the hardware provides no further details. Any details regarding 
why the PHY had to re-train the link are completely hidden from the driver.

Are these X540 interfaces actually built into the motherboard, or are they separate PCIe 
cards? Also, CAT6 alone might not be enough, and even the magnetics on the older X540 
might not be able to eke out a 10Gb connection, even at 2m. I would remove all doubt 
of cabling being an issue by replacing them with CAT6a. Beware of cable vendors who sell 
CAT6 cables as "CAT6a". It could also be an issue with the modular jacks on the 
ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
newer one yet? Large portions of it were re-written and re-factored, and many bugs were 
fixed, including portions that touch the X540, due to the new X550 also being copper and 
the two models needing to share some logic related to that.

/dale


Thanks for clarifying that. I just checked the cables and they classify as 
Cat6a and they are from a respectable German vendor, n

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Dominik Hassler

Thanks for clarifying that. I just checked the cables and they classify
as Cat6a, and they are from a respectable German vendor. Not that this
would be any guarantee, but at least they're no bulkware from China. ;)

Some of the X540s are onboard on Supermicro X10 boards, others are on
a genuine Intel PCIe adaptor. I will check some other cables, maybe the
ones I got were somewhat faulty. However, this leaves the user only a few
options for finding out what is actually wrong with the
connection, doesn't it?

Regarding the release of omniOS, I will update my RSF-1 node to the
latest r18, the other two new nodes are actually on r20 and thus should
already have the new driver installed.

…any suggestion on some good cables? ;)


Dätwyler:
http://www.cabling.datwyler.com/products/data-centres/copper-technology/patch-cords/product/rj45-patch-cords-cat6a-iec.html

They provide detailed specs and test certificates for all of their cable 
types.




Thanks,
Stephan






Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Dale Ghent

> On Jan 18, 2017, at 2:38 AM, Stephan Budach  wrote:
> 
> Am 17.01.17 um 23:09 schrieb Dale Ghent:
>>> On Jan 17, 2017, at 2:39 PM, Stephan Budach 
>>>  wrote:
>>> 
>>> Am 17.01.17 um 17:37 schrieb Dale Ghent:
>>> 
> On Jan 17, 2017, at 11:31 AM, Stephan Budach 
> 
>  wrote:
> 
> Hi Dale,
> 
> Am 17.01.17 um 17:22 schrieb Dale Ghent:
> 
> 
>>> On Jan 17, 2017, at 11:12 AM, Stephan Budach 
>>> 
>>> 
>>>  wrote:
>>> 
>>> Hi guys,
>>> 
>>> I am sorry, but I do have to undig this old topic, since I do now have 
>>> three hosts running omniOS 018/020, which show these pesky  issues with 
>>> flapping their ixgbeN links on my Nexus FEXes…
>>> 
>>> Does anyone know, if there has any change been made to the ixgbe 
>>> drivers since 06/2016?
>>> 
>>> 
>>> 
>> Since June 2016? Yes! A large update to the ixgbe driver happened in 
>> August. This added X550 support, and also brought the Intel Shared Code 
>> it uses from its 2012 vintage up to current. The updated driver is 
>> available in 014 and later.
>> 
>> /dale
>> 
>> 
>> 
> do you know of any option to get to know, why three of my boxes are 
> flapping their 10GbE ports? It's actually not only when in aggr mode, but 
> on single use as well. Last week I presumeably had one of my RSF-1 nodes 
> panic, since it couldn't get to it's iSCSI LUNs anymore. The thing ist, 
> that somewhere doen the line, the ixgbe driver seems to be fine, to 
> configure one port to 1GbE instead of 10GbE, which will stop the 
> flapping, but wich will break the VPC on my Nexus nevertheless.
> 
> In syslog, this looks like this:
> 
> ...
> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
> link up, 1000 Mbps, full duplex
> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
> link up, 1 Mbps, full duplex
> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
> link up, 1 Mbps, full duplex
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
> link up, 1 Mbps, full duplex
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
> link up, 1 Mbps, full duplex
> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
> link up, 1 Mbps, full duplex
> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
> link up, 1 Mbps, full duplex
> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
> link down
> 
> Note on 14:46:07, where the system settles on a 1GbE connection…
> 
> 
 Sounds like a cabling issue? Are the runs too long or are you not using 
 CAT6a? It flapping at 10Gb and then settling at 1Gb would indicate a 
 cabling issue to me. The driver will always try to link at the fastest 
 speed that the local controller and the remote peer will negotiate at... 
 it will not proactively downgrade the link speed. If that happens, it is 
 because that is what the controller managed to negotiate with the remote 
 peer at.
 
 Are you using jumbo frames or anything outside of a normal 1500mtu link?
 
 /dale
 
 
>>> The cables are actually specifically purchased cat6 cables. They run about 
>>> 2m, not more. It could be tna cables, but I am running a couple of those 
>>> and afaik, I only get these issues on these three nodes. I can try some 
>>> other cables, but I hoped to be able to get maybe some kind of debug 
>>> messages from the driver.
>>> 
>> The chip provides no reason for a LoS or downgrade of the link. For 
>> configuration issues it interrupts only on a few things. "LSC" (Link Status 
>> Change) interrupts one of these things and are what tells the driver to 
>> interrogate the chip for its current speed, but beyond that, the hardware 
>> provides no further details. Any details regarding why the PHY had to 
>> re-train the link are completely hidden to the driver.
>> 
>> Are these X540 interfaces actually built into the motherboard, or are they 
>> separate PCIe cards? Also, CAT6 alone might not be enough, and even the 
>> magnetics on the older X540 might not even be able to eek out a 10Gb 
>> connection, even at 2m. I would remove all doubt of cabling being a

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Am 17.01.17 um 23:09 schrieb Dale Ghent:

On Jan 17, 2017, at 2:39 PM, Stephan Budach  wrote:

Am 17.01.17 um 17:37 schrieb Dale Ghent:

On Jan 17, 2017, at 11:31 AM, Stephan Budach 
  wrote:

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:


On Jan 17, 2017, at 11:12 AM, Stephan Budach 

  wrote:

Hi guys,

I am sorry, but I have to dig up this old topic, since I now have three 
hosts running OmniOS 018/020 which show these pesky issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know if any change has been made to the ixgbe drivers since 
06/2016?



Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale



do you know of any way to find out why three of my boxes are flapping 
their 10GbE ports? It actually happens not only in aggr mode, but in single use 
as well. Last week I presumably had one of my RSF-1 nodes panic, since it 
couldn't get to its iSCSI LUNs anymore. The thing is that somewhere down the 
line the ixgbe driver seems to be fine with configuring one port at 1GbE instead 
of 10GbE, which will stop the flapping, but which will nevertheless break the 
VPC on my Nexus.

In syslog, this looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…


Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale


The cables are actually specifically purchased Cat6 cables. They run about 2m, 
not more. It could be the cables, but I am running a couple of those and AFAIK 
I only get these issues on these three nodes. I can try some other cables, but 
I had hoped to be able to get some kind of debug messages from the driver.

The chip provides no reason for a LoS or downgrade of the link. For configuration issues 
it interrupts on only a few things. "LSC" (Link Status Change) interrupts are one 
of these, and they are what tell the driver to interrogate the chip for its current 
speed, but beyond that the hardware provides no further details. Any details regarding 
why the PHY had to re-train the link are completely hidden from the driver.

Are these X540 interfaces actually built into the motherboard, or are they separate PCIe 
cards? Also, CAT6 alone might not be enough, and even the magnetics on the older X540 
might not be able to eke out a 10Gb connection, even at 2m. I would remove all doubt 
of cabling being an issue by replacing them with CAT6a. Beware of cable vendors who sell 
CAT6 cables as "CAT6a". It could also be an issue with the modular jacks on the 
ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
newer one yet? Large portions of it were re-written and re-factored, and many bugs were 
fixed, including portions that touch the X540, due to the new X550 also being copper and 
the two models needing to share some logic related to that.

/dale
Thanks for clarifying that. I just checked the cables and they classify 
as Cat6a, and they are from a respectable German vendor. Not that this 
would be any guarantee, but at least they're no bulkware from China. ;)


The X540s are either

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Dale Ghent

> On Jan 17, 2017, at 2:39 PM, Stephan Budach  wrote:
> 
> Am 17.01.17 um 17:37 schrieb Dale Ghent:
>>> On Jan 17, 2017, at 11:31 AM, Stephan Budach 
>>>  wrote:
>>> 
>>> Hi Dale,
>>> 
>>> Am 17.01.17 um 17:22 schrieb Dale Ghent:
>>> 
> On Jan 17, 2017, at 11:12 AM, Stephan Budach 
> 
>  wrote:
> 
> Hi guys,
> 
> I am sorry, but I do have to undig this old topic, since I do now have 
> three hosts running omniOS 018/020, which show these pesky  issues with 
> flapping their ixgbeN links on my Nexus FEXes…
> 
> Does anyone know, if there has any change been made to the ixgbe drivers 
> since 06/2016?
> 
> 
 Since June 2016? Yes! A large update to the ixgbe driver happened in 
 August. This added X550 support, and also brought the Intel Shared Code it 
 uses from its 2012 vintage up to current. The updated driver is available 
 in 014 and later.
 
 /dale
 
 
>>> do you know of any option to get to know, why three of my boxes are 
>>> flapping their 10GbE ports? It's actually not only when in aggr mode, but 
>>> on single use as well. Last week I presumeably had one of my RSF-1 nodes 
>>> panic, since it couldn't get to it's iSCSI LUNs anymore. The thing ist, 
>>> that somewhere doen the line, the ixgbe driver seems to be fine, to 
>>> configure one port to 1GbE instead of 10GbE, which will stop the flapping, 
>>> but wich will break the VPC on my Nexus nevertheless.
>>> 
>>> In syslog, this looks like this:
>>> 
>>> ...
>>> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link 
>>> up, 1000 Mbps, full duplex
>>> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
>>> up, 1 Mbps, full duplex
>>> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
>>> up, 1 Mbps, full duplex
>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
>>> up, 1 Mbps, full duplex
>>> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
>>> up, 1 Mbps, full duplex
>>> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
>>> up, 1 Mbps, full duplex
>>> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
>>> up, 1 Mbps, full duplex
>>> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
>>> down
>>> 
>>> Note on 14:46:07, where the system settles on a 1GbE connection…
>>> 
>> Sounds like a cabling issue? Are the runs too long or are you not using 
>> CAT6a? It flapping at 10Gb and then settling at 1Gb would indicate a cabling 
>> issue to me. The driver will always try to link at the fastest speed that 
>> the local controller and the remote peer will negotiate at... it will not 
>> proactively downgrade the link speed. If that happens, it is because that is 
>> what the controller managed to negotiate with the remote peer at.
>> 
>> Are you using jumbo frames or anything outside of a normal 1500mtu link?
>> 
>> /dale
>> 
> The cables are actually specifically purchased cat6 cables. They run about 
> 2m, not more. It could be tna cables, but I am running a couple of those and 
> afaik, I only get these issues on these three nodes. I can try some other 
> cables, but I hoped to be able to get maybe some kind of debug messages from 
> the driver.

The chip provides no reason for a LoS or downgrade of the link. For 
configuration issues it interrupts on only a few things. "LSC" (Link Status 
Change) interrupts are one of these, and they are what tell the driver to 
interrogate the chip for its current speed, but beyond that the hardware 
provides no further details. Any details regarding why the PHY had to re-train 
the link are completely hidden from the driver.

Are these X540 interfaces actually built into the motherboard, or are they 
separate PCIe cards? Also, CAT6 alone might not be enough, and even the 
magnetics on the older X540 might not be able to eke out a 10Gb 
connection, even at 2m. I would remove all doubt of cabling being an issue by 
replacing them with CAT6a. Beware of cable vendors who sell CAT6 cables as 
"CAT6a". It could also be an issue with the modular jacks on the ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
newer one yet? Large portions of it were re-written and re-factored, and many 
bugs were fixed, including portions that touch the X540, due to the new X550

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Michael Rasmussen
On Tue, 17 Jan 2017 20:39:49 +0100
Stephan Budach  wrote:

> The cables are actually specifically purchased cat6 cables. They run about 
> 2m, not more. It could be tna cables, but I am running a couple of those and 
> afaik, I only get these issues on these three nodes. I can try some other 
> cables, but I hoped to be able to get maybe some kind of debug messages from 
> the driver.
> 
Should 10Gb not use cat 6a?

BTW. Have you tried various settings for buffers and hardware offload?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael  rasmussen  cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir  datanom  net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir  miras  org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--
/usr/games/fortune -es says:
The early bird gets the coffee left over from the night before.




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Am 17.01.17 um 17:37 schrieb Dale Ghent:

On Jan 17, 2017, at 11:31 AM, Stephan Budach  wrote:

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:

On Jan 17, 2017, at 11:12 AM, Stephan Budach 
  wrote:

Hi guys,

I am sorry, but I have to dig up this old topic, since I now have three 
hosts running OmniOS 018/020 which show these pesky issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know if any change has been made to the ixgbe drivers since 
06/2016?


Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale


do you know of any way to find out why three of my boxes are flapping 
their 10GbE ports? It actually happens not only in aggr mode, but in single use 
as well. Last week I presumably had one of my RSF-1 nodes panic, since it 
couldn't get to its iSCSI LUNs anymore. The thing is that somewhere down the 
line the ixgbe driver seems to be fine with configuring one port at 1GbE instead 
of 10GbE, which will stop the flapping, but which will nevertheless break the 
VPC on my Nexus.

In syslog, this looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…

Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale
The cables are actually specifically purchased Cat6 cables. They run 
about 2m, not more. It could be the cables, but I am running a couple of 
those and AFAIK I only get these issues on these three nodes. I can try 
some other cables, but I had hoped to be able to get some kind of 
debug messages from the driver.


Thanks,
Stephan




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Dale Ghent

> On Jan 17, 2017, at 11:31 AM, Stephan Budach  wrote:
> 
> Hi Dale,
> 
> Am 17.01.17 um 17:22 schrieb Dale Ghent:
>>> On Jan 17, 2017, at 11:12 AM, Stephan Budach 
>>>  wrote:
>>> 
>>> Hi guys,
>>> 
>>> I am sorry, but I do have to undig this old topic, since I do now have 
>>> three hosts running omniOS 018/020, which show these pesky  issues with 
>>> flapping their ixgbeN links on my Nexus FEXes…
>>> 
>>> Does anyone know, if there has any change been made to the ixgbe drivers 
>>> since 06/2016?
>>> 
>> Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
>> This added X550 support, and also brought the Intel Shared Code it uses from 
>> its 2012 vintage up to current. The updated driver is available in 014 and 
>> later.
>> 
>> /dale
>> 
> 
> do you know of any option to get to know, why three of my boxes are flapping 
> their 10GbE ports? It's actually not only when in aggr mode, but on single 
> use as well. Last week I presumeably had one of my RSF-1 nodes panic, since 
> it couldn't get to it's iSCSI LUNs anymore. The thing ist, that somewhere 
> doen the line, the ixgbe driver seems to be fine, to configure one port to 
> 1GbE instead of 10GbE, which will stop the flapping, but wich will break the 
> VPC on my Nexus nevertheless.
> 
> In syslog, this looks like this:
> 
> ...
> Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link 
> up, 1000 Mbps, full duplex
> Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
> up, 1 Mbps, full duplex
> Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
> up, 1 Mbps, full duplex
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
> up, 1 Mbps, full duplex
> Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
> up, 1 Mbps, full duplex
> Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
> up, 1 Mbps, full duplex
> Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link 
> up, 1 Mbps, full duplex
> Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link 
> down
> 
> Note on 14:46:07, where the system settles on a 1GbE connection…

Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale
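
(If the goal is to keep a suspect port from silently falling back to 1GbE while testing, the advertised speeds can usually be restricted through dladm link properties. A sketch only, assuming the ixgbe instance exposes the usual en_*_cap speed properties, which dladm show-linkprop would confirm:

    dladm show-linkprop ixgbe3 | grep _cap          # see which speed capabilities the driver exposes
    dladm set-linkprop -p en_1000fdx_cap=0 ixgbe3   # stop advertising 1GbE full duplex

The port then either links at 10GbE or stays down, so a marginal cable shows up as a hard failure instead of a silently degraded aggr member.)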


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Dan McDonald
I'd check your switch, though you're using 10GigBaseT, which shouldn't be as 
big of a problem.  Hmmm, using cat6 or better cables?  5e isn't going to cut it 
for reliable 10Gig service.

Dan



Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:

On Jan 17, 2017, at 11:12 AM, Stephan Budach  wrote:

Hi guys,

I am sorry, but I have to dig up this old topic, since I now have three 
hosts running OmniOS 018/020 which show these pesky issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know if any change has been made to the ixgbe drivers since 
06/2016?

Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale


do you know of any way to find out why three of my boxes are 
flapping their 10GbE ports? It actually happens not only in aggr mode, 
but in single use as well. Last week I presumably had one of my RSF-1 
nodes panic, since it couldn't get to its iSCSI LUNs anymore. The thing 
is that somewhere down the line the ixgbe driver seems to be fine with 
configuring one port at 1GbE instead of 10GbE, which will stop the 
flapping, but which will nevertheless break the VPC on my Nexus.


In syslog, this looks like this:

Jan 17 14:41:51 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:42:11 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:43:33 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:43:33 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:43:34 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:43:43 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:44:05 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:44:10 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:14 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:14 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:14 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:29 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:29 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:29 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 10000 Mbps, full duplex
Jan 17 14:45:29 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 link down
Jan 17 14:45:40 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:45:45 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:45:51 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:45:51 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:45:52 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:45:56 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down


Note on 14:46:07, where the system settles on a 1GbE connection…

Thanks,
Stephan
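
(A quick way to quantify the flapping from those logs instead of eyeballing them, assuming the standard /var/adm/messages location and the message format shown above:

    grep -c 'ixgbe3 link down' /var/adm/messages
    grep 'link down' /var/adm/messages | awk '{print $10}' | sort | uniq -c

The first command counts the down events for one port; the second prints a per-interface count ($10 is the interface name in these mac NOTICEs), which makes it easy to see whether the problem really follows a single physical port.)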




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Dan McDonald

> On Jan 17, 2017, at 11:12 AM, Stephan Budach  wrote:
> 
> Hi guys,
> 
> I am sorry, but I do have to undig this old topic, since I do now have three 
> hosts running omniOS 018/020, which show these pesky  issues with flapping 
> their ixgbeN links on my Nexus FEXes…

X550 support went in June of 2016.  Dale knows more about ixgbe than I do, 
since he did the work for X550.

Dan



Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Dale Ghent

> On Jan 17, 2017, at 11:12 AM, Stephan Budach  wrote:
> 
> Hi guys,
> 
> I am sorry, but I do have to undig this old topic, since I do now have three 
> hosts running omniOS 018/020, which show these pesky  issues with flapping 
> their ixgbeN links on my Nexus FEXes…
> 
> Does anyone know, if there has any change been made to the ixgbe drivers 
> since 06/2016?

Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale
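
(To confirm which ixgbe build a box is actually running after updating, the loaded module and the package it came from can be checked; a minimal sketch, with the package name assumed to be the usual illumos/OmniOS one:

    modinfo | grep -i ixgbe               # loaded ixgbe module and its version string
    pkg list -v driver/network/ixgbe      # installed package FMRI providing the driver

Both should reflect the boot environment that is actually running, not just the one that was patched.)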


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Hi guys,

I am sorry, but I have to dig up this old topic, since I now have 
three hosts running OmniOS 018/020 which show these pesky issues with 
flapping their ixgbeN links on my Nexus FEXes…


Does anyone know if any change has been made to the ixgbe drivers 
since 06/2016?


Thanks,
Stephan




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-06-03 Thread Stephan Budach

Am 03.06.16 um 15:42 schrieb Fábio Rabelo:

Hi to all

A question:

Is this the board you used?

https://www.supermicro.com/products/motherboard/Xeon/C600/X10DRi-T4_.cfm

If so, this board uses the Intel X540, and this issue is only with Intel
X550 chips!


Fábio Rabelo

Yes, this is the board I got. Actually, it's an X10DRi-T4+

Cheers,
Stephan


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-06-03 Thread Fábio Rabelo
Hi to all

A question:

Is this the board you used?

https://www.supermicro.com/products/motherboard/Xeon/C600/X10DRi-T4_.cfm

If so, this board uses the Intel X540, and this issue is only with Intel
X550 chips!


Fábio Rabelo

2016-06-03 10:20 GMT-03:00 Stephan Budach :
> Hi Dale,
>
> Am 17.05.16 um 20:55 schrieb Dale Ghent:
>
> On May 17, 2016, at 8:30 AM, Stephan Budach  wrote:
>
> I have checked all of my ixgbe interfaces and they all report that now flow
> controll is in place, as you can see:
>
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe0   flowctrlrw   no no no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe1   flowctrlrw   no no no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe2   flowctrlrw   no no no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe3   flowctrlrw   no no no,tx,rx,bi
>
> I then checked the ports on the Nexus switches and found out, that they do
> have outbound-flowcontrol enabled, but that is the case on any of those
> Nexus ports, including those, where this issue doesn't exist.
>
> Optimally you would have flow control turned off on both sides, as the
> switch still expects the ixgbe NIC to respond appropriately. To be honest,
> the only time to use ethernet flow control is if you are operating the
> interfaces for higher-level protocols which do not provide any sort of
> direct flow control themselves, such as FCoE. If the vast majority of
> traffic is TCP, leave it to the TCP stack to manage any local congestion on
> the link.
>
> /dale
>
> I just wanted to wrap this up… I recently swapped that old Sun server with a
> new Supermicro X10-type, which has 4 10 GbE NICs on board, installed OmniOS
> r018 and my RSF-1 cluster software on it. Configured my two LACP
> aggregations and there hasn't been any issue since.
> So, either it's something on the old server - it's a Sun Fire X4170M2 - or
> something on the Intel cards.
>
> Cheers,
> Stephan
>
>


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-06-03 Thread Stephan Budach

Hi Dale,

Am 17.05.16 um 20:55 schrieb Dale Ghent:

On May 17, 2016, at 8:30 AM, Stephan Budach  wrote:


I have checked all of my ixgbe interfaces and they all report that no flow 
control is in place, as you can see:

root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe0   flowctrl    rw   no           no           no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe1   flowctrl    rw   no           no           no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe2   flowctrl    rw   no           no           no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe3   flowctrl    rw   no           no           no,tx,rx,bi

I then checked the ports on the Nexus switches and found out, that they do have 
outbound-flowcontrol enabled, but that is the case on any of those Nexus ports, 
including those, where this issue doesn't exist.

Optimally you would have flow control turned off on both sides, as the switch 
still expects the ixgbe NIC to respond appropriately. To be honest, the only 
time to use ethernet flow control is if you are operating the interfaces for 
higher-level protocols which do not provide any sort of direct flow control 
themselves, such as FCoE. If the vast majority of traffic is TCP, leave it to 
the TCP stack to manage any local congestion on the link.

/dale
I just wanted to wrap this up… I recently swapped that old Sun server 
for a new Supermicro X10-type, which has 4 10GbE NICs on board, 
installed OmniOS r018 and my RSF-1 cluster software on it, and configured 
my two LACP aggregations; there hasn't been any issue since.
So it's either something on the old server - it's a Sun Fire X4170M2 - 
or something on the Intel cards.


Cheers,
Stephan
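
(For reference, an LACP aggregation of the kind discussed in this thread is typically built with dladm; a sketch only, where the link names and the aggr name are examples rather than the actual configuration used here:

    dladm create-aggr -L active -T short -l ixgbe0 -l ixgbe1 aggr1
    dladm show-aggr -x aggr1      # per-port state, speed and LACP activity

With -L active the host originates LACPDUs itself, which is what a port-channel/vPC on the Nexus side normally expects.)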



Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-17 Thread Dale Ghent
On May 17, 2016, at 8:30 AM, Stephan Budach  wrote:

> I have checked all of my ixgbe interfaces and they all report that now flow 
> controll is in place, as you can see:
> 
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe0   flowctrlrw   no no no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe1   flowctrlrw   no no no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe2   flowctrlrw   no no no,tx,rx,bi
> root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
> LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
> ixgbe3   flowctrlrw   no no no,tx,rx,bi
> 
> I then checked the ports on the Nexus switches and found out, that they do 
> have outbound-flowcontrol enabled, but that is the case on any of those Nexus 
> ports, including those, where this issue doesn't exist.

Optimally you would have flow control turned off on both sides, as the switch 
still expects the ixgbe NIC to respond appropriately. To be honest, the only 
time to use ethernet flow control is if you are operating the interfaces for 
higher-level protocols which do not provide any sort of direct flow control 
themselves, such as FCoE. If the vast majority of traffic is TCP, leave it to 
the TCP stack to manage any local congestion on the link.

/dale




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-17 Thread Stephan Budach

Am 11.05.16 um 19:28 schrieb Dale Ghent:

On May 11, 2016, at 12:32 PM, Stephan Budach  wrote:
I will try to get one node free of all services running on it, as I will have 
to reboot the system, since I will have to change the ixgbe.conf, haven't I?
This is a RSF-1 host, so this will likely be done over the weekend.

You can use dladm on a live system:

dladm set-linkprop -p flowctrl=no ixgbeN

Where ixgbeN is your ixgbe interfaces (probably ixgbe0 and ixgbe1)

/dale

I have checked all of my ixgbe interfaces and they all report that no 
flow control is in place, as you can see:


root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe0   flowctrl    rw   no           no           no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe1   flowctrl    rw   no           no           no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe2   flowctrl    rw   no           no           no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
LINK     PROPERTY    PERM VALUE        DEFAULT      POSSIBLE
ixgbe3   flowctrl    rw   no           no           no,tx,rx,bi


I then checked the ports on the Nexus switches and found out that they 
do have outbound flow control enabled, but that is the case on all of 
those Nexus ports, including the ones where this issue doesn't exist.


Regards,
Stephan


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Dale Ghent

> On May 11, 2016, at 12:32 PM, Stephan Budach  wrote:
> I will try to get one node free of all services running on it, as I will have 
> to reboot the system, since I will have to change the ixgbe.conf, haven't I?
> This is a RSF-1 host, so this will likely be done over the weekend.

You can use dladm on a live system:

dladm set-linkprop -p flowctrl=no ixgbeN

Where ixgbeN is your ixgbe interfaces (probably ixgbe0 and ixgbe1)

/dale
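
(The dladm change takes effect immediately and, unless -t is given, it is also persistent across reboots, so editing ixgbe.conf should not be needed just for this. Verification is the same show-linkprop call already shown in this thread:

    dladm set-linkprop -p flowctrl=no ixgbe0
    dladm show-linkprop -p flowctrl ixgbe0    # VALUE should now read "no"

Repeating this for the remaining ixgbe instances covers the whole aggr.)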





Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 11.05.16 um 16:48 schrieb Dale Ghent:

On May 11, 2016, at 7:36 AM, Stephan Budach  wrote:

Am 09.05.16 um 20:43 schrieb Dale Ghent:

On May 9, 2016, at 2:04 PM, Stephan Budach  wrote:

Am 09.05.16 um 16:33 schrieb Dale Ghent:

On May 9, 2016, at 8:24 AM, Stephan Budach  wrote:

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
LACP aggr-link on different boxes when Intel X540-T2s are involved. It first 
starts with a couple of link downs/ups on one port, and finally the link on 
that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel 
on my Cisco Nexus for this connection.

I have tried swapping and interchanging cables and thus switchports, but to no 
avail.

Anyone else noticed this and, even better, knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale

I have noticed that on prior versions of OmniOS as well, but we only recently 
started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our 
network. I will have to check if both links stay at 10GbE when not configured 
as a LACP bond. Let me check that tomorrow and report back. As we're 
heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our 
Nexus gear, so we don't actually have any single 10GbE connection, as they will 
all have to be connected to both DCs. This is achieved by using VPCs on our 
Nexus switches.

Provide as much detail as you can - if you're using hw flow control, whether 
both links act this way at the same time or independently, and so-on. Problems 
like this often boil down to a very small and seemingly insignificant detail.

I currently have ixgbe on the operating table for adding X550 support, so I can 
take a look at this; however I don't have your type of switches available to me 
so LACP-specific testing is something I can't do for you.

/dale

I checked the ixgbe.conf files on each host and they all are still at the 
standard setting, which includes flow_control = 3;

Ah, so you are using ethernet flow control. Could you try disabling that on 
both sides (on the ixgbe host and on the switch) and see if that corrects the 
link stability issues? There's an outstanding issue with hw flow control on 
ixgbe that you *might* be running into regarding pause frame timing, which 
could manifest in the way you describe.

/dale

I will try to get one node free of all services running on it, as I will 
have to reboot the system, since I will have to change the ixgbe.conf, 
won't I?

This is a RSF-1 host, so this will likely be done over the weekend.

Thanks,
Stephan


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Dale Ghent

> On May 11, 2016, at 7:36 AM, Stephan Budach  wrote:
> 
> Am 09.05.16 um 20:43 schrieb Dale Ghent:
>>> On May 9, 2016, at 2:04 PM, Stephan Budach  wrote:
>>> 
>>> Am 09.05.16 um 16:33 schrieb Dale Ghent:
> On May 9, 2016, at 8:24 AM, Stephan Budach  wrote:
> 
> Hi,
> 
> I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break 
> the LACP aggr-link on different boxes, when Intel X540-T2s are involved. 
> It first starts with a couple if link downs/ups on one port and finally 
> the link on that  port negiotates to 1GbE instead of 10GbE, which then 
> breaks the LACP channel on my Cisco Nexus for this connection.
> 
> I have tried swapping and interchangeing cables and thus switchports, but 
> to no avail.
> 
> Anyone else noticed this and even better… knows a solution to this?
 Was this an issue noticed only with r151018 and not with previous 
 versions, or have you only tried this with 018?
 
 By your description, I presume that the two ixgbe physical links will stay 
 at 10Gb and not bounce down to 1Gb if not LACP'd together?
 
 /dale
>>> I have noticed that on prior versions of OmniOS as well, but we only 
>>> recently started deploying 10GbE LACP bonds, when we introduced our Nexus 
>>> gear to our network. I will have to check if both links stay at 10GbE, when 
>>> not being configured as a LACP bond. Let me check that tomorrow and report 
>>> back. As we're heading for a streched DC, we are mainly configuring 2-way 
>>> LACP bonds over our Nexus gear, so we don't actually have any single 10GbE 
>>> connection, as they will all have to be conencted to both DCs. This is 
>>> achieved by using VPCs on our Nexus switches.
>> Provide as much detail as you can - if you're using hw flow control, whether 
>> both links act this way at the same time or independently, and so-on. 
>> Problems like this often boil down to a very small and seemingly 
>> insignificant detail.
>> 
>> I currently have ixgbe on the operating table for adding X550 support, so I 
>> can take a look at this; however I don't have your type of switches 
>> available to me so LACP-specific testing is something I can't do for you.
>> 
>> /dale
> I checked the ixgbe.conf files on each host and they all are still at the 
> standard setting, which includes flow_control = 3;

Ah, so you are using ethernet flow control. Could you try disabling that on 
both sides (on the ixgbe host and on the switch) and see if that corrects the 
link stability issues? There's an outstanding issue with hw flow control on 
ixgbe that you *might* be running into regarding pause frame timing, which 
could manifest in the way you describe.

/dale





Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 11.05.16 um 14:50 schrieb Stephan Budach:

Am 11.05.16 um 13:36 schrieb Stephan Budach:

Am 09.05.16 um 20:43 schrieb Dale Ghent:
On May 9, 2016, at 2:04 PM, Stephan Budach  
wrote:


Am 09.05.16 um 16:33 schrieb Dale Ghent:
On May 9, 2016, at 8:24 AM, Stephan Budach 
 wrote:


Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d 
will break the LACP aggr-link on different boxes, when Intel 
X540-T2s are involved. It first starts with a couple of link 
downs/ups on one port and finally the link on that port 
negotiates to 1GbE instead of 10GbE, which then breaks the LACP 
channel on my Cisco Nexus for this connection.


I have tried swapping and interchanging cables and thus 
switchports, but to no avail.


Anyone else noticed this and even better… knows a solution to this?
Was this an issue noticed only with r151018 and not with previous 
versions, or have you only tried this with 018?


By your description, I presume that the two ixgbe physical links 
will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?


/dale
I have noticed that on prior versions of OmniOS as well, but we 
only recently started deploying 10GbE LACP bonds, when we 
introduced our Nexus gear to our network. I will have to check if 
both links stay at 10GbE, when not being configured as a LACP bond. 
Let me check that tomorrow and report back. As we're heading for a 
stretched DC, we are mainly configuring 2-way LACP bonds over our 
Nexus gear, so we don't actually have any single 10GbE connection, 
as they will all have to be connected to both DCs. This is achieved 
by using VPCs on our Nexus switches.
Provide as much detail as you can - if you're using hw flow control, 
whether both links act this way at the same time or independently, 
and so-on. Problems like this often boil down to a very small and 
seemingly insignificant detail.


I currently have ixgbe on the operating table for adding X550 
support, so I can take a look at this; however I don't have your 
type of switches available to me so LACP-specific testing is 
something I can't do for you.


/dale
I checked the ixgbe.conf files on each host and they all are still at 
the standard setting, which includes flow_control = 3;
So they all have flow control enabled. As for the Nexus config, all 
of those ports are still on standard ethernet ports and modifications 
have only been made globally to the switch.
I will now have to yank the one port on one of the hosts from the 
aggr and configure it as a standalone port. Then we will see if it 
still gets the disconnects/reconnects and finally the negotiation 
to 1GbE instead of 10GbE. As this only seems to happen to the same 
port, I never experienced other ports of the affected aggrs acting up. 
I also thought I noticed that those were always the "same" physical 
ports, that is the first port on the card (ixgbe0), but that might of 
course be a coincidence.


Thanks,
Stephan


Ok, so we can likely rule out LACP as a generic reason for this issue… 
After removing ixgbe0 from aggr1, I plugged it into an unused port 
of my Nexus FEX and, lo and behold, here we go:


root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 
link up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 
link down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 
link up, 10000 Mbps, full duplex


May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 
link down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 
link up, 10000 Mbps, full duplex


So, after less than an hour, we had the first link-cycle on ixgbe0, 
albeit on a different switch port, which has no LACP config whatsoever. I will 
monitor this for a while and see if we get more of those.


Thanks,
Stephan 


Ehh… and sorry, I almost forgot to paste the log from the Cisco Nexus 
switch:


2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-SPEED: Interface 
Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_DUPLEX: Interface 
Ethernet141/1/9, operational duplex mode changed to Full
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface 
Ethernet141/1/9, operational Receive Flow Control state changed to off
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface 
Ethernet141/1/9, operational Transmit Flow Control state changed to on
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_UP: Interface 
Ethernet141/1/9 is up in mode access
2016 May 11 14:07:29 gh79-nx-01 %ETHPORT-5-IF_DOWN_LINK_FAILURE: 
Interface Ethernet141/1/9 is down (Link failure)

2016 May 11 14:07:45 gh79-nx-01 last message repeated 1 time
2016 May 11 14:07:45 gh79-nx-01 %ETHPORT-5-SPEED: Interface 
Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 14:07:45 gh79-nx-01 %ETHPORT-5-IF_DUPLEX: Interface 
Ethernet141/1/9, operational duplex mode changed to Full
2016 

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 11.05.16 um 13:36 schrieb Stephan Budach:

Am 09.05.16 um 20:43 schrieb Dale Ghent:
On May 9, 2016, at 2:04 PM, Stephan Budach  
wrote:


Am 09.05.16 um 16:33 schrieb Dale Ghent:
On May 9, 2016, at 8:24 AM, Stephan Budach  
wrote:


Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d 
will break the LACP aggr-link on different boxes, when Intel 
X540-T2s are involved. It first starts with a couple of link 
downs/ups on one port and finally the link on that port 
negotiates to 1GbE instead of 10GbE, which then breaks the LACP 
channel on my Cisco Nexus for this connection.


I have tried swapping and interchanging cables and thus 
switchports, but to no avail.


Anyone else noticed this and even better… knows a solution to this?
Was this an issue noticed only with r151018 and not with previous 
versions, or have you only tried this with 018?


By your description, I presume that the two ixgbe physical links 
will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?


/dale
I have noticed that on prior versions of OmniOS as well, but we only 
recently started deploying 10GbE LACP bonds, when we introduced our 
Nexus gear to our network. I will have to check if both links stay 
at 10GbE, when not being configured as a LACP bond. Let me check 
that tomorrow and report back. As we're heading for a stretched DC, 
we are mainly configuring 2-way LACP bonds over our Nexus gear, so 
we don't actually have any single 10GbE connection, as they will all 
have to be connected to both DCs. This is achieved by using VPCs on 
our Nexus switches.
Provide as much detail as you can - if you're using hw flow control, 
whether both links act this way at the same time or independently, 
and so-on. Problems like this often boil down to a very small and 
seemingly insignificant detail.


I currently have ixgbe on the operating table for adding X550 
support, so I can take a look at this; however I don't have your type 
of switches available to me so LACP-specific testing is something I 
can't do for you.


/dale
I checked the ixgbe.conf files on each host and they all are still at 
the standard setting, which includes flow_control = 3;
So they all have flow control enabled. As for the Nexus config, all of 
those ports are still on standard ethernet ports and modifications 
have only been made globally to the switch.
I will now have to yank the one port on one of the hosts from the aggr 
and configure it as a standalone port. Then we will see if it still 
gets the disconnects/reconnects and finally the negotiation to 
1GbE instead of 10GbE. As this only seems to happen to the same port, I 
never experienced other ports of the affected aggrs acting up. I also 
thought I noticed that those were always the "same" physical ports, 
that is the first port on the card (ixgbe0), but that might of course 
be a coincidence.


Thanks,
Stephan


Ok, so we can likely rule out LACP as a generic reason for this issue… 
After removing ixgbe0 from aggr1, I plugged it into an unused port 
of my Nexus FEX and, lo and behold, here we go:


root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link 
up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link 
down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link 
up, 10000 Mbps, full duplex


May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link 
down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link 
up, 10000 Mbps, full duplex


So, after less than an hour, we had the first link-cycle on ixgbe0, albeit 
on a different switch port, which has no LACP config whatsoever. I will monitor 
this for a while and see if we get more of those.
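
To keep an eye on the flapping without sitting on tail -f, something like the 
following could be left running. It is only a sketch: the one-second interval 
and the log path are arbitrary choices, and it assumes dladm show-phys accepts 
the -p/-o parseable output options for these fields:

#!/bin/sh
# poll ixgbe0 and append a timestamped line whenever state/speed/duplex changes
prev=""
while true; do
  cur=$(dladm show-phys -p -o state,speed,duplex ixgbe0)
  if [ "$cur" != "$prev" ]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') ixgbe0 $cur" >> /var/tmp/ixgbe0-linklog
    prev="$cur"
  fi
  sleep 1
done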


Thanks,
Stephan


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 09.05.16 um 20:43 schrieb Dale Ghent:

On May 9, 2016, at 2:04 PM, Stephan Budach  wrote:

Am 09.05.16 um 16:33 schrieb Dale Ghent:

On May 9, 2016, at 8:24 AM, Stephan Budach  wrote:

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
LACP aggr-link on different boxes, when Intel X540-T2s are involved. It first 
starts with a couple of link downs/ups on one port and finally the link on that 
port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel 
on my Cisco Nexus for this connection.

I have tried swapping and interchanging cables and thus switchports, but to no 
avail.

Anyone else noticed this and even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale

I have noticed that on prior versions of OmniOS as well, but we only recently 
started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our 
network. I will have to check if both links stay at 10GbE, when not being 
configured as a LACP bond. Let me check that tomorrow and report back. As we're 
heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our 
Nexus gear, so we don't actually have any single 10GbE connection, as they will 
all have to be connected to both DCs. This is achieved by using VPCs on our 
Nexus switches.

Provide as much detail as you can - if you're using hw flow control, whether 
both links act this way at the same time or independently, and so-on. Problems 
like this often boil down to a very small and seemingly insignificant detail.

I currently have ixgbe on the operating table for adding X550 support, so I can 
take a look at this; however I don't have your type of switches available to me 
so LACP-specific testing is something I can't do for you.

/dale
I checked the ixgbe.conf files on each host and they all are still at 
the standard setting, which includes flow_control = 3;
So they all have flow control enabled. As for the Nexus config, all of 
those ports are still on standard ethernet ports and modifications have 
only been made globally to the switch.
I will now have to yank the one port on one of the hosts from the aggr 
and configure it as a standalone port. Then we will see if it still 
gets the disconnects/reconnects and finally the negotiation to 1GbE 
instead of 10GbE. As this only seems to happen to the same port, I never 
experienced other ports of the affected aggrs acting up. I also thought 
I noticed that those were always the "same" physical ports, that is the 
first port on the card (ixgbe0), but that might of course be a coincidence.
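
A rough sketch of what pulling the port out of the aggregation and bringing it 
up standalone could look like, assuming the aggregation is named aggr1 as 
elsewhere in the thread; the IP interface object and test address are 
placeholders:

# take the first X540 port out of the LACP aggregation
pfexec dladm remove-aggr -l ixgbe0 aggr1
# bring it up as a plain standalone link for testing
pfexec ipadm create-if ixgbe0
pfexec ipadm create-addr -T static -a 192.0.2.10/24 ixgbe0/test
# check the negotiated state, speed and duplex afterwards
dladm show-phys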


Thanks,
Stephan


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-09 Thread Dale Ghent

> On May 9, 2016, at 2:04 PM, Stephan Budach  wrote:
> 
> Am 09.05.16 um 16:33 schrieb Dale Ghent:
>>> On May 9, 2016, at 8:24 AM, Stephan Budach  wrote:
>>> 
>>> Hi,
>>> 
>>> I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break 
>>> the LACP aggr-link on different boxes, when Intel X540-T2s are involved. It 
>>> first starts with a couple of link downs/ups on one port and finally the 
>>> link on that port negotiates to 1GbE instead of 10GbE, which then breaks 
>>> the LACP channel on my Cisco Nexus for this connection.
>>> 
>>> I have tried swapping and interchanging cables and thus switchports, but 
>>> to no avail.
>>> 
>>> Anyone else noticed this and even better… knows a solution to this?
>> Was this an issue noticed only with r151018 and not with previous versions, 
>> or have you only tried this with 018?
>> 
>> By your description, I presume that the two ixgbe physical links will stay 
>> at 10Gb and not bounce down to 1Gb if not LACP'd together?
>> 
>> /dale
> I have noticed that on prior versions of OmniOS as well, but we only recently 
> started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our 
> network. I will have to check if both links stay at 10GbE, when not being 
> configured as a LACP bond. Let me check that tomorrow and report back. As 
> we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds 
> over our Nexus gear, so we don't actually have any single 10GbE connection, 
> as they will all have to be connected to both DCs. This is achieved by using 
> VPCs on our Nexus switches.

Provide as much detail as you can - if you're using hw flow control, whether 
both links act this way at the same time or independently, and so-on. Problems 
like this often boil down to a very small and seemingly insignificant detail.

I currently have ixgbe on the operating table for adding X550 support, so I can 
take a look at this; however I don't have your type of switches available to me 
so LACP-specific testing is something I can't do for you.

/dale




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-09 Thread Stephan Budach

Am 09.05.16 um 16:33 schrieb Dale Ghent:

On May 9, 2016, at 8:24 AM, Stephan Budach  wrote:

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
LACP aggr-link on different boxes, when Intel X540-T2s are involved. It first 
starts with a couple of link downs/ups on one port and finally the link on that 
port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel 
on my Cisco Nexus for this connection.

I have tried swapping and interchanging cables and thus switchports, but to no 
avail.

Anyone else noticed this and even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale
I have noticed that on prior versions of OmniOS as well, but we only 
recently started deploying 10GbE LACP bonds, when we introduced our 
Nexus gear to our network. I will have to check if both links stay at 
10GbE, when not being configured as a LACP bond. Let me check that 
tomorrow and report back. As we're heading for a stretched DC, we are 
mainly configuring 2-way LACP bonds over our Nexus gear, so we don't 
actually have any single 10GbE connection, as they will all have to be 
connected to both DCs. This is achieved by using VPCs on our Nexus switches.
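
For that per-link check, something along these lines should show whether each 
physical port stays at 10GbE and whether the aggregation still considers both 
ports attached (a sketch, assuming the aggregation is called aggr1):

# negotiated state, speed and duplex of the physical ports
dladm show-phys | grep ixgbe
# port and LACP state as seen from the aggregation
dladm show-aggr -x aggr1
dladm show-aggr -L aggr1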


Thanks,
Stephan


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-09 Thread Dale Ghent

> On May 9, 2016, at 8:24 AM, Stephan Budach  wrote:
> 
> Hi,
> 
> I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
> LACP aggr-link on different boxes, when Intel X540-T2s are involved. It first 
> starts with a couple of link downs/ups on one port and finally the link on 
> that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP 
> channel on my Cisco Nexus for this connection.
> 
> I have tried swapping and interchanging cables and thus switchports, but to 
> no avail.
> 
> Anyone else noticed this and even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale




[OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-09 Thread Stephan Budach

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will 
break the LACP aggr-link on different boxes, when Intel X540-T2s are 
involved. It first starts with a couple of link downs/ups on one port 
and finally the link on that port negotiates to 1GbE instead of 10GbE, 
which then breaks the LACP channel on my Cisco Nexus for this connection.


I have tried swapping and interchanging cables and thus switchports, 
but to no avail.


Anyone else noticed this and even better… knows a solution to this?
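
For context, the aggregations in question would typically be built roughly like 
this on illumos. A sketch only: ixgbe0/ixgbe1 stand in for the two X540 ports 
and aggr1 for the aggregation name used later in the thread:

# 2-port LACP aggregation in active mode across both X540 ports
pfexec dladm create-aggr -L active -l ixgbe0 -l ixgbe1 aggr1
# verify that both ports attached and that LACP negotiated with the switch
dladm show-aggr -x aggr1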

Cheers,
Stephan