Re: LAGG bug or misconfiguration???
Hi guys,

just for the record: I've fixed the issue simply by moving the cable of the backup interface to another switch, as suggested by the network guys at the DC. That is even preferable from a network redundancy perspective. It now works perfectly, and failover in both directions (NIC0 to NIC1 and NIC1 to NIC0) is immediate.

Many thanks for your time.
Cheers.
Re: LAGG bug or misconfiguration???
I actually don't know, Damien. I'll have to have a chat with the network guy at the DC, as I neither manage the switch nor have access to it. I'm also not really a Cisco guy, so I'll forward those questions to him.

Moreover, I'm getting a bit lost with this. If the ports are in trunk mode, would this affect the FreeBSD lagg functionality? If yes, how? Do I need "spanning-tree portfast trunk" to make it work properly?

I really appreciate your useful input, Damien.
Re: LAGG bug or misconfiguration???
I confirm you should see a fast transition of your VLANs to the forwarding state.

Are your ports in access or trunk mode?

If they're trunked, portfast alone won't do it; you need "spanning-tree portfast trunk".

Additionally, are you using link aggregation (channel-group) on the Cisco switch?
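The access/trunk rule described above can be written down as a small check. This is only a rough sketch, not a Cisco tool: the function name `portfast_effective` is made up for illustration, it only looks at the lines of a single interface stanza, and it assumes that a port is a trunk only when "switchport mode trunk" is configured explicitly (it ignores negotiated trunks and any global portfast defaults).

```python
def portfast_effective(interface_lines):
    """Return True if PortFast would apply to this port, per the rule above.

    interface_lines: config lines of ONE interface stanza (hypothetical input).
    Simplification: a port counts as a trunk only if it is set explicitly.
    """
    lines = [line.strip().lower() for line in interface_lines]
    is_trunk = "switchport mode trunk" in lines
    has_portfast = "spanning-tree portfast" in lines
    has_portfast_trunk = "spanning-tree portfast trunk" in lines

    if is_trunk:
        # On a trunk port, plain "spanning-tree portfast" is not enough.
        return has_portfast_trunk
    # On an access port either form enables PortFast.
    return has_portfast or has_portfast_trunk


# Example: the two lines reported for the switch ports in this thread,
# first as an access port, then as they would behave on a trunk port.
reported = ["spanning-tree portfast", "spanning-tree bpduguard enable"]
access_ok = portfast_effective(reported)
trunk_ok = portfast_effective(["switchport mode trunk"] + reported)
```

With the configuration posted in this thread, the outcome depends entirely on whether the ports are trunked, which is exactly the question that needs to go to whoever manages the switch.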
Re: LAGG bug or misconfiguration???
That's the STP configuration on my two switch ports:

spanning-tree portfast
spanning-tree bpduguard enable
Re: LAGG bug or misconfiguration???
I've requested the configuration. I'll post it as soon as I have it.

Thank you very much for your time.
Re: LAGG bug or misconfiguration???
You're not looking for FEC or EtherChannel or 802.3ad at all.

What you're looking for, in the case of a *failover* configuration, is the "spanning-tree portfast" feature, so that your port doesn't transition through the different spanning-tree states before forwarding traffic.

Kindly obtain the configuration from whoever has it and let us know.
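To put a number on "transition through the different spantree states": with classic 802.1D spanning tree, a port that comes up sits in the listening and then the learning state, each for one forward-delay interval (15 seconds by default), before it forwards traffic. The sketch below just does that arithmetic. The function name is made up for illustration, and it assumes classic STP timers rather than rapid spanning tree, which nobody in this thread has confirmed for the switch in question.

```python
def seconds_until_forwarding(forward_delay=15, portfast=False):
    """Time a freshly linked port waits before forwarding (classic 802.1D).

    forward_delay: seconds spent in EACH of listening and learning
                   (15 is the 802.1D default).
    portfast:      if True, the port skips both states and forwards at once.
    """
    if portfast:
        return 0
    listening = forward_delay
    learning = forward_delay
    return listening + learning


# Default timers without PortFast versus with PortFast.
without_portfast = seconds_until_forwarding()
with_portfast = seconds_until_forwarding(portfast=True)
```

A delay of this kind would show up as a fixed outage right after a cable is plugged back in, which is why the port configuration is the first thing to check.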
Re: LAGG bug or misconfiguration???
Hi Dweimer and Damien,
thanks for replying.

The server is connected to a switch in the datacentre. The configuration of this switch is unknown to me and I obviously have no access to it, but I truly believe that such an enterprise environment has management capabilities.
Anyway, in which way would the configuration affect the lagg functionality? Might this issue be related to what is stated in the FreeBSD LAGG pages in the handbook?
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-aggregation.html

"Cisco® Fast EtherChannel®

Cisco Fast EtherChannel (FEC), is a static setup and does not negotiate aggregation with the peer or exchange frames to monitor the link. If the switch supports LACP then that should be used instead."
Re: LAGG bug or misconfiguration???
Sorry top posting from phone. Show your switch's port configurations. We're using VLAN tagging over lagg failover interfaces at work and I have already tried the tests you described, to much better results. We're also running 8.2 so the only thing that seems to differ between us is the switch config, likely. On 15 Mar 2012, at 20:06, Snoop wrote: > Hi there, > a while after setting up my new server (with 8 jails in it) I've decided > (after postponing several times) to properly check the functionality of > the lagg and the result was very disappointing. > > The test I've done is very simple. > I've started copying a file from one site to another of my VPN network > (from the server I've been testing the net to another node somewhere > else) and in the meantime I've been physically disconnecting the main > network cable to check the responsiveness of the lagg configuration. > Then I've plugged the cable back to check if the traffic would switch > back to the main NIC as it should. > > The result was basically this (lagg0 members: bge0 primary, bge1 > secondary) > > - when bge0 unplugged the traffic switched almost instantaneously to > bge1 > - when bge0 plugged back in, the network stopped working completely with > the two NICs polling synchronously until I manually unplug bge1. Then > within 2-4 seconds traffic goes back on bge0 (I've been waiting for a > little more than a minute maximum to avoid all the active connections on > the server to timeout). > > Now, I've repeated the same test about 10-15 times randomly waiting for > different times between the unplug-replug procedure. The result was > always the same. > > So, below are the ipconfig outputs > - before to start the test > - when bge0 gets unplugged > - when bge0 gets plugged back in > > I couldn't see anything odd. 
> ___
> lagg0: flags=8843 metric 0 mtu 1500
>     options=8009b
>     ether 00:14:ee:00:8a:c0
>     inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
>     inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
>     inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
>     inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
>     inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
>     inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
>     inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
>     inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
>     inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
>     media: Ethernet autoselect
>     status: active
>     laggproto failover
>     laggport: bge1 flags=0<>
>     laggport: bge0 flags=5
> ___
> lagg0: flags=8843 metric 0 mtu 1500
>     options=8009b
>     ether 00:14:ee:00:8a:c0
>     inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
>     inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
>     inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
>     inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
>     inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
>     inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
>     inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
>     inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
>     inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
>     media: Ethernet autoselect
>     status: active
>     laggproto failover
>     laggport: bge1 flags=4
>     laggport: bge0 flags=1
> ___
>
> lagg0: flags=8843 metric 0 mtu 1500
>     options=8009b
>     ether 00:14:ee:00:8a:c0
>     inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
>     inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
>     inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
>     inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
>     inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
>     inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
>     inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
>     inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
>     inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
>     media: Ethernet autoselect
>     status: active
>     laggproto failover
>     laggport: bge1 flags=0<>
>     laggport: bge0 flags=5
> ___
> Also nothing unusual on dmesg:
>
> ...
> bge0: link state changed to DOWN
> bge0: link state changed to UP
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge0: link state changed to DOWN
> bge0: link state changed to UP
> bge1: link sta
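
The laggport lines in the quoted ifconfig output carry a numeric flags value whose bit names did not survive in this archive. As a rough aid for reading them, here is a minimal sh sketch that decodes the value. The helper name decode_laggport_flags is made up for illustration, and the bit assignments are an assumption taken from the LAGG_PORT_* definitions in FreeBSD's sys/net/if_lagg.h as I recall them, so check them against your release before relying on the result.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): decode the hexadecimal value
# printed after "laggport: <nic> flags=" in ifconfig output.
# Assumed bit layout (LAGG_PORT_* in sys/net/if_lagg.h, verify locally):
#   0x01 MASTER, 0x02 STACK, 0x04 ACTIVE, 0x08 COLLECTING, 0x10 DISTRIBUTING
decode_laggport_flags() {
    value=$(( 0x$1 ))
    names=""
    if [ $(( value & 0x01 )) -ne 0 ]; then names="${names},MASTER"; fi
    if [ $(( value & 0x02 )) -ne 0 ]; then names="${names},STACK"; fi
    if [ $(( value & 0x04 )) -ne 0 ]; then names="${names},ACTIVE"; fi
    if [ $(( value & 0x08 )) -ne 0 ]; then names="${names},COLLECTING"; fi
    if [ $(( value & 0x10 )) -ne 0 ]; then names="${names},DISTRIBUTING"; fi
    # Drop the leading comma; print "none" when no bit is set.
    names=${names#,}
    echo "${names:-none}"
}

# Flag values that appear in the captures quoted in this thread.
for sample in "bge0 5" "bge1 0" "bge0 1" "bge1 4"; do
    set -- $sample
    printf '%s flags=%s -> %s\n' "$1" "$2" "$(decode_laggport_flags "$2")"
done
```

Read this way, and under the stated assumption about the bit layout, the capture taken after bge0 is plugged back in matches the one taken before the test: bge0 is the master and active port again and bge1 has no flags set. That would mean lagg0 itself had failed back, which points away from the host and toward the switch side.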
Re: LAGG bug or misconfiguration???
On 15.03.2012 14:06, Snoop wrote:

Hi there,
a while after setting up my new server (with 8 jails in it) I decided
(after postponing several times) to properly check the functionality of
the lagg, and the result was very disappointing.

The test I did is very simple.
I started copying a file from one site of my VPN network to another
(from the server whose network I was testing to another node somewhere
else) and in the meantime I physically disconnected the main network
cable to check the responsiveness of the lagg configuration.
Then I plugged the cable back in to check whether the traffic would
switch back to the main NIC as it should.

The result was basically this (lagg0 members: bge0 primary, bge1
secondary):

- when bge0 was unplugged, the traffic switched almost instantaneously
  to bge1
- when bge0 was plugged back in, the network stopped working completely,
  with the two NICs polling synchronously, until I manually unplugged
  bge1. Then within 2-4 seconds traffic went back to bge0 (I waited a
  little more than a minute at most, to keep the active connections on
  the server from timing out).

I repeated the same test about 10-15 times, waiting a random amount of
time between unplugging and replugging. The result was always the same.

Below are the ifconfig outputs:
- before starting the test
- when bge0 gets unplugged
- when bge0 gets plugged back in

I couldn't see anything odd.
___
lagg0: flags=8843 metric 0 mtu 1500
    options=8009b
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=0<>
    laggport: bge0 flags=5
___
lagg0: flags=8843 metric 0 mtu 1500
    options=8009b
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=4
    laggport: bge0 flags=1
___
lagg0: flags=8843 metric 0 mtu 1500
    options=8009b
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=0<>
    laggport: bge0 flags=5
___

Also nothing unusual on dmesg:

...
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
...

The following is the related configuration in rc.conf:

...
ifconfig_bge0="up"
ifconfig_bge1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport bge0 laggport bge1 xxx.xx.xx.224/24"
ifconfig_lagg0_alias_0="inet xxx.xx.xx.225/32"
ifconfig_lagg0_alias_1="
LAGG bug or misconfiguration???
Hi there,
a while after setting up my new server (with 8 jails in it) I decided
(after postponing several times) to properly check the functionality of
the lagg, and the result was very disappointing.

The test I did is very simple.
I started copying a file from one site of my VPN network to another
(from the server whose network I was testing to another node somewhere
else) and in the meantime I physically disconnected the main network
cable to check the responsiveness of the lagg configuration.
Then I plugged the cable back in to check whether the traffic would
switch back to the main NIC as it should.

The result was basically this (lagg0 members: bge0 primary, bge1
secondary):

- when bge0 was unplugged, the traffic switched almost instantaneously
  to bge1
- when bge0 was plugged back in, the network stopped working completely,
  with the two NICs polling synchronously, until I manually unplugged
  bge1. Then within 2-4 seconds traffic went back to bge0 (I waited a
  little more than a minute at most, to keep the active connections on
  the server from timing out).

I repeated the same test about 10-15 times, waiting a random amount of
time between unplugging and replugging. The result was always the same.

Below are the ifconfig outputs:
- before starting the test
- when bge0 gets unplugged
- when bge0 gets plugged back in

I couldn't see anything odd.
___
lagg0: flags=8843 metric 0 mtu 1500
    options=8009b
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=0<>
    laggport: bge0 flags=5
___
lagg0: flags=8843 metric 0 mtu 1500
    options=8009b
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=4
    laggport: bge0 flags=1
___
lagg0: flags=8843 metric 0 mtu 1500
    options=8009b
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=0<>
    laggport: bge0 flags=5
___

Also nothing unusual on dmesg:

...
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
...

The following is the related configuration in rc.conf:

...
ifconfig_bge0="up"
ifconfig_bge1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport bge0 laggport bge1 xxx.xx.xx.224/24"
ifconfig_lagg0_alias_0="inet xxx.xx.xx.225/32"
ifconfig_lagg0_alias_1="inet xxx.xx.xx.226/32"
ifconfig_lagg0_alias_2="inet xxx.xx.x