Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Hi, I'd like to test with LACP slow, then we can see if the physical interface still flaps... Thanks for your support.

On Sun, 11 Feb 2024 at 18:02, Saku Ytti wrote:
> On Sun, 11 Feb 2024 at 17:52, james list wrote:
> > - why physical interface flaps in DC1 if it is related to lacp ?
>
> 16:39:35.813 Juniper reports LACP timeout (so problem started at
> 16:39:32, (was traffic passing at 32, 33, 34 seconds?))
> 16:39:36.xxx Cisco reports interface down, long after problem has
> already started
>
> Why Cisco reports physical interface down, I'm not sure. But clearly
> the problem was already happening before interface down, and the first log
> entry is LACP timeout, which occurs 3s after the problem starts.
> Perhaps Juniper asserts RFI for some reason? Perhaps Cisco resets the
> physical interface once it is removed from LACP?
>
> > - why the same setup in DC2 do not report issues ?
>
> If this is an LACP-related software issue, it could be a difference not yet
> identified. You need to gather more information, like how does ping
> look throughout this event, particularly before the syslog entries. And if
> ping still works up until the syslog, you almost certainly have a software
> issue with LACP inject at Cisco, or more likely LACP punt at Juniper.
>
> --
> ++ytti

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
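For reference, the arithmetic behind the timeline above (and behind the proposed slow-rate test): with LACP fast rate a PDU is expected every 1 s and the receive machine expires after 3 missed PDUs; with slow rate the interval is 30 s and the timeout 90 s. A minimal sketch of that arithmetic; the helper name is illustrative, not a vendor API:

```python
# Sketch of the LACP timeout arithmetic discussed above. With fast rate,
# PDUs are expected every 1 s and the partner times out after 3 missed
# PDUs; with slow rate, every 30 s with a 90 s timeout.
from datetime import datetime, timedelta

LACP_PDU_INTERVAL = {"fast": 1, "slow": 30}  # seconds between PDUs
MISSED_PDUS_FOR_TIMEOUT = 3

def problem_start(timeout_logged_at: datetime, rate: str) -> datetime:
    """Estimate when PDUs stopped arriving, given when the timeout fired."""
    window = LACP_PDU_INTERVAL[rate] * MISSED_PDUS_FOR_TIMEOUT
    return timeout_logged_at - timedelta(seconds=window)

timeout = datetime(2024, 2, 9, 16, 39, 35)
print(problem_start(timeout, "fast"))  # 2024-02-09 16:39:32 -> matches the analysis
print(problem_start(timeout, "slow"))  # 2024-02-09 16:38:05 with slow rate
```

This is why switching to slow rate mostly changes detection latency, not the underlying fault: the PDU loss would simply have to persist for 90 s before the bundle is torn down.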
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Hi, I have a couple of points to ask related to your idea:
- why does the physical interface flap in DC1 if it is related to LACP?
- why does the same setup in DC2 not report issues?

NEXUS01# sh logging | in Initia | last 15
2024 Jan 17 22:37:49 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 18 23:54:25 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 19 00:58:13 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 19 07:15:04 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 22 16:03:13 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 25 21:32:29 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 26 18:41:12 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 28 05:07:20 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 29 04:06:52 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 30 03:09:44 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 5 18:13:20 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 6 02:17:25 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 6 22:00:24 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 9 09:29:36 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 9 16:39:36 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)

On Sun, 11 Feb 2024 at 14:36, Saku Ytti wrote:
> On Sun, 11 Feb 2024 at 15:24, james list wrote:
> > While on Juniper when the issue happens I always
> > see:
> >
> > show log messages | last 440 | match LACPD_TIMEOUT
> > Jan 25 21:32:27.948 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
> > Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
>
> OK, so the problem always starts with Juniper seeing 3 seconds without an LACP
> PDU, i.e. missing 3 consecutive LACP PDUs. It would be good to ping
> while this problem is happening, to see if ping stops 3s before the
> syslog lines, or at the same time as the syslog lines.
> If ping stops 3s before, it's a link problem from Cisco to Juniper.
> If ping stops at syslog time (my guess), it's a software problem.
>
> There is unfortunately a lot of bug surface here, both on the inject and on the
> punt path. You could be hitting PR1541056 on the Juniper end. You
> could test for this by removing distributed LACP handling with 'set
> routing-options ppm no-delegate-processing'
> You could also do a packet capture for LACP on both ends, to try to see
> if LACP was sent by Cisco and received by the capture, but not by the system.
>
> --
> ++ytti
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
On Cisco I see the physical interface go down (Initializing), what does that mean?

While on Juniper, when the issue happens I always see:

show log messages | last 440 | match LACPD_TIMEOUT
Jan 25 21:32:27.948 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 26 18:41:12.514 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 28 05:07:20.283 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 29 04:06:51.768 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 30 03:09:43.923 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 5 18:13:20.158 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 6 02:17:23.703 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 6 22:00:23.758 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 9 09:29:35.728 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT

On Sun, 11 Feb 2024 at 14:10, Saku Ytti wrote:
> Hey James,
>
> You shared this off-list, I think it's sufficiently material to share.
> 2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel101 is down (No operational members)
> 2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel101: Ethernet1/44 is down
> Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
> Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACP_INTF_DOWN: ae49: Interface marked down due to lacp timeout on member et-0/1/5
>
> We can't know the order of events here, due to no subsecond precision
> enabled on the Cisco end.
>
> But if the failure had started from interface down, it would take 3 seconds
> for Juniper to realise the LACP failure. However we can see that it
> happens in less than 1s, so we can determine the interface was not
> down first; the first problem was Juniper not receiving 3 consecutive
> LACP PDUs, 1s apart, prior to noticing any type of interface state
> related problems.
>
> Is this always the order of events? Does it always happen with Juniper
> noticing problems receiving LACP PDUs first?
> > > On Sun, 11 Feb 2024 at 14:55, james list via juniper-nsp > wrote: > > > > Hi > > > > 1) cable has been replaced with a brand new one, they said that to check > an > > MPO 100 Gbs cable is not that easy > > > > 3) no errors reported on both side > > > > 2) here the output of cisco and juniper > > > > NEXUS1# sh interface eth1/44 transceiver details > > Ethernet1/44 > > transceiver is present > > type is QSFP-100G-SR4 > > name is CISCO-INNOLIGHT > > part number is TR-FC85S-NC3 > > revision is 2C > > serial number is INL27050TVT > > nominal bitrate is 25500 MBit/sec > > Link length supported for 50/125um OM3 fiber is 70 m > > cisco id is 17 > > cisco extended id number is 220 > > cisco part number is 10-3142-03 > > cisco product id is QSFP-100G-SR4-S > > cisco version id is V03 > > > > Lane Number:1 Network Lane > >SFP Detail Diagnostics Information (internal calibration) > > > > > > > Current Alarms Warnings > > Measurement HighLow High Low > > > > > > > Temperature 30.51 C75.00 C -5.00 C 70.00 C > 0.00 C > > Voltage3.28 V 3.63 V 2.97 V 3.46 V > 3.13 V > > Current6.40 mA 12.45 mA 3.25 mA12.45 mA > 3.25 > > mA > > Tx Power 0.98 dBm 5.39 dBm -12.44 dBm2.39 dBm > -8.41 > > dBm > > Rx Power -1.60 dBm 5.39 dBm -14.31 dBm2.39 dBm > -10.31 > > dBm > > Transmit Fault Count = 0 > > > > > > > Note: ++ hi
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Hi 1) cable has been replaced with a brand new one, they said that to check an MPO 100 Gbs cable is not that easy 3) no errors reported on both side 2) here the output of cisco and juniper NEXUS1# sh interface eth1/44 transceiver details Ethernet1/44 transceiver is present type is QSFP-100G-SR4 name is CISCO-INNOLIGHT part number is TR-FC85S-NC3 revision is 2C serial number is INL27050TVT nominal bitrate is 25500 MBit/sec Link length supported for 50/125um OM3 fiber is 70 m cisco id is 17 cisco extended id number is 220 cisco part number is 10-3142-03 cisco product id is QSFP-100G-SR4-S cisco version id is V03 Lane Number:1 Network Lane SFP Detail Diagnostics Information (internal calibration) Current Alarms Warnings Measurement HighLow High Low Temperature 30.51 C75.00 C -5.00 C 70.00 C0.00 C Voltage3.28 V 3.63 V 2.97 V 3.46 V3.13 V Current6.40 mA 12.45 mA 3.25 mA12.45 mA 3.25 mA Tx Power 0.98 dBm 5.39 dBm -12.44 dBm2.39 dBm -8.41 dBm Rx Power -1.60 dBm 5.39 dBm -14.31 dBm2.39 dBm-10.31 dBm Transmit Fault Count = 0 Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning Lane Number:2 Network Lane SFP Detail Diagnostics Information (internal calibration) Current Alarms Warnings Measurement HighLow High Low Temperature 30.51 C75.00 C -5.00 C 70.00 C0.00 C Voltage3.28 V 3.63 V 2.97 V 3.46 V3.13 V Current6.40 mA 12.45 mA 3.25 mA12.45 mA 3.25 mA Tx Power 0.62 dBm 5.39 dBm -12.44 dBm2.39 dBm -8.41 dBm Rx Power -1.18 dBm 5.39 dBm -14.31 dBm2.39 dBm-10.31 dBm Transmit Fault Count = 0 Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning Lane Number:3 Network Lane SFP Detail Diagnostics Information (internal calibration) Current Alarms Warnings Measurement HighLow High Low Temperature 30.51 C75.00 C -5.00 C 70.00 C0.00 C Voltage3.28 V 3.63 V 2.97 V 3.46 V3.13 V Current6.40 mA 12.45 mA 3.25 mA12.45 mA 3.25 mA Tx Power 0.87 dBm 5.39 dBm -12.44 dBm2.39 dBm -8.41 dBm Rx Power 0.01 dBm 5.39 dBm -14.31 dBm2.39 dBm-10.31 dBm Transmit Fault Count = 0 Note: ++ 
high-alarm; + high-warning; -- low-alarm; - low-warning Lane Number:4 Network Lane SFP Detail Diagnostics Information (internal calibration) Current Alarms Warnings Measurement HighLow High Low Temperature 30.51 C75.00 C -5.00 C 70.00 C0.00 C Voltage3.28 V 3.63 V 2.97 V 3.46 V3.13 V Current6.40 mA 12.45 mA 3.25 mA12.45 mA 3.25 mA Tx Power 0.67 dBm 5.39 dBm -12.44 dBm2.39 dBm -8.41 dBm Rx Power 0.11 dBm 5.39 dBm -14.31 dBm2.39 dBm-10.31 dBm Transmit Fault Count = 0 Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning MX1> show interfaces diagnostics optics et-1/0/5 Physical interface: et-1/0/5 Module temperature: 38 degrees C / 100 degrees F Module voltage: 3.2740 V Module temperature high alarm : Off Module temperature low alarm : Off Module temperature high warning : Off Module temperature low warning: Off Module voltage high alarm
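As a side note, the DOM readings quoted above can be checked mechanically against their alarm/warning columns; all lanes shown sit comfortably inside the warning thresholds, which supports the reading that this is not an optical-level problem. A small sketch; the helper is illustrative (not a vendor API), and the values are copied from the lane 1 output above:

```python
# Classify a transceiver DOM reading against its alarm/warning thresholds,
# as printed by 'sh interface ... transceiver details'.
def dom_status(value: float, hi_alarm: float, lo_alarm: float,
               hi_warn: float, lo_warn: float) -> str:
    if value >= hi_alarm or value <= lo_alarm:
        return "alarm"
    if value >= hi_warn or value <= lo_warn:
        return "warning"
    return "ok"

# Lane 1 Rx power: -1.60 dBm, alarms at +5.39/-14.31, warnings at +2.39/-10.31
print(dom_status(-1.60, 5.39, -14.31, 2.39, -10.31))  # ok: ~8.7 dB above low warning
```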
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Hi,
There are no errors on either interface (Cisco or Juniper). Below are the logs of one event from both sides, the config, and the LACP stats.

LOGS of one event, time 16:39:

CISCO
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PARENT_DOWN: Interface port-channel101.2303 is down (Parent interface is down)
2024 Feb 9 16:39:36 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor 172.16.6.17 Down - sent: other configuration change
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel101: first operational port changed from Ethernet1/44 to none
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel101: Ethernet1/44 is down
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel101,bandwidth changed to 10 Kbit
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-SPEED: Interface port-channel101, operational speed changed to 100 Gbps
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DUPLEX: Interface port-channel101, operational duplex mode changed to Full
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel101, operational Receive Flow Control state changed to off
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel101, operational Transmit Flow Control state changed to off
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_UP: port-channel101: Ethernet1/44 is up
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel101: first operational port changed from none to Ethernet1/44
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel101,bandwidth changed to 1 Kbit
2024 Feb 9 16:39:39
NEXUS1 %ETHPORT-5-IF_UP: Interface Ethernet1/44 is up in Layer3 2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface port-channel101 is up in Layer3 2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface port-channel101.2303 is up in Layer3 2024 Feb 9 16:39:43 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor 172.16.6.17 Up Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACP_INTF_DOWN: ae49: Interface marked down due to lacp timeout on member et-0/1/5 Feb 9 16:39:35.819 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle ae49: bundle IFD minimum bandwidth or minimum links not met, Bandwidth (Current : Required) 0 : 1000 Number of links (Current : Required) 0 : 1 Feb 9 16:39:35.815 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED: ae49: et-0/1/5: Lacp state changed from COLLECTING_DISTRIBUTING to ATTACHED, actor port state : |EXP|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|DIS|COL|OUT_OF_SYNC|AGG|SHORT|ACT| Feb 9 16:39:35.869 2024 MX1 rpd[31866]: bgp_ifachange_group:10697: NOTIFICATION sent to 172.16.6.18 (External AS xxx): code 6 (Cease) subcode 6 (Other Configuration Change), Reason: Interface change for the peer-group Feb 9 16:39:35.909 2024 MX1 mib2d[31909]: SNMP_TRAP_LINK_DOWN: ifIndex 684, ifAdminStatus up(1), ifOperStatus down(2), ifName ae49 Feb 9 16:39:36.083 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED: ae49: et-0/1/5: Lacp state changed from ATTACHED to COLLECTING_DISTRIBUTING, actor port state : |-|-|DIS|COL|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|DIS|COL|IN_SYNC|AGG|SHORT|ACT| Feb 9 16:39:36.089 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle ae49 is now Up. 
uplinks 1 >= min_links 1 Feb 9 16:39:36.089 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle ae49: bundle IFD minimum bandwidth or minimum links not met, Bandwidth (Current : Required) 0 : 1000 Number of links (Current : Required) 0 : 1 Feb 9 16:39:36.085 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED: ae49: et-0/1/5: Lacp state changed from COLLECTING_DISTRIBUTING to ATTACHED, actor port state : |-|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|-|-|OUT_OF_SYNC|AGG|SHORT|ACT| Feb 9 16:39:39.095 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED: ae49: et-0/1/5: Lacp state changed from ATTACHED to COLLECTING_DISTRIBUTING, actor port state : |-|-|DIS|COL|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|-|-|IN_SYNC|AGG|SHORT|ACT| Feb 9 16:39:39.101 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle ae49 is now Up. uplinks 1 >= min_links 1 Feb 9 16:39:39.109 2024 MX1 mib2d[31909]: SNMP_TRAP_LINK_UP: ifIndex 684, ifAdminStatus up(1), ifOperStatus up(1), ifName ae49 Feb 9 16:39:41.190 2024 MX1 rpd[31866]: bgp_recv: read from peer 172.16.6.18 (External AS xxx) failed: Unknown error: 48110976 CONFIG: CISCO NEXUS1# sh run int por
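The ordering argument made elsewhere in this thread hinges on timestamp precision: the Nexus logs at whole-second granularity while the MX logs milliseconds. A sketch of merging both logs into one timeline, tagging the Cisco entries with their roughly half-second ambiguity (timestamps copied from the 16:39 event above):

```python
# Merge Cisco (second-precision) and Juniper (millisecond-precision) log
# events into one sorted timeline, flagging which timestamps are ambiguous.
from datetime import datetime

events = [
    (datetime(2024, 2, 9, 16, 39, 36), "NEXUS1 port-channel101 down (No operational members)", 0.5),
    (datetime(2024, 2, 9, 16, 39, 35, 813000), "MX1 LACPD_TIMEOUT et-0/1/5", 0.0),
    (datetime(2024, 2, 9, 16, 39, 36), "NEXUS1 Ethernet1/44 down (Initializing)", 0.5),
]

for ts, what, unc in sorted(events):
    note = f" (+/-{unc}s, second precision only)" if unc else ""
    print(ts.time(), what + note)
```

With these samples the MX timeout sorts first, which is the basis for the claim that the LACP receive failure preceded the Cisco interface-down events, within the limits of the Cisco clock granularity.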
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
DC technicians state the cables are the same in both DCs and direct, no patch panel.

Cheers

On Sun, 11 Feb 2024 at 11:20, nivalMcNd d wrote:
> Can it be DC1 is connecting links over an intermediary patch panel and you
> face fibre disturbance? That may be eliminated if your interfaces on DC1
> links do not go down
>
> On Sun, Feb 11, 2024, 21:16 Igor Sukhomlinov via cisco-nsp <cisco-...@puck.nether.net> wrote:
>> Hi James,
>>
>> Do you happen to run the same software on all nexuses and all MXes?
>> Do the DC1 and DC2 bgp sessions exchange the same amount of routing updates
>> across the links?
>>
>> On Sun, Feb 11, 2024, 21:09 james list via cisco-nsp <cisco-...@puck.nether.net> wrote:
>> > Dear experts
>> > we have a couple of BGP peers over a 100 Gbs interconnection between
>> > Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different datacenters
>> > like this:
>> >
>> > DC1
>> > MX1 -- bgp -- NEXUS1
>> > MX2 -- bgp -- NEXUS2
>> >
>> > DC2
>> > MX3 -- bgp -- NEXUS3
>> > MX4 -- bgp -- NEXUS4
>> >
>> > The issue we see is that sporadically (i.e. every 1 to 3 days) we notice BGP
>> > flaps only in DC1 on both interconnections (not at the same time); there is
>> > still no traffic since, once we noticed the flaps, we blocked deployment to
>> > production.
>> >
>> > We've already changed SFPs (we moved the ones from DC2 to DC1 and vice versa)
>> > and cables on both interconnections at DC1 without any solution.
>> >
>> > SFPs we use in both DCs:
>> >
>> > Juniper - QSFP-100G-SR4-T2
>> > Cisco - QSFP-100G-SR4
>> >
>> > over MPO cable OM4.
>> >
>> > Distance is 70 m in DC1 and 80 m in DC2, hence it is shorter where we see the issue.
>> >
>> > Any idea or suggestion what to check or to do?
>> > Thanks in advance
>> > Cheers
>> > James
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Yes, same version. Currently no traffic exchange is in place, just the BGP peering set up, no traffic.

On Sun, 11 Feb 2024 at 11:16, Igor Sukhomlinov <dvalinsw...@gmail.com> wrote:
> Hi James,
>
> Do you happen to run the same software on all nexuses and all MXes?
> Do the DC1 and DC2 bgp sessions exchange the same amount of routing updates
> across the links?
>
> On Sun, Feb 11, 2024, 21:09 james list via cisco-nsp <cisco-...@puck.nether.net> wrote:
>> Dear experts
>> we have a couple of BGP peers over a 100 Gbs interconnection between
>> Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different datacenters
>> like this:
>>
>> DC1
>> MX1 -- bgp -- NEXUS1
>> MX2 -- bgp -- NEXUS2
>>
>> DC2
>> MX3 -- bgp -- NEXUS3
>> MX4 -- bgp -- NEXUS4
>>
>> The issue we see is that sporadically (i.e. every 1 to 3 days) we notice BGP
>> flaps only in DC1 on both interconnections (not at the same time); there is
>> still no traffic since, once we noticed the flaps, we blocked deployment to
>> production.
>>
>> We've already changed SFPs (we moved the ones from DC2 to DC1 and vice versa)
>> and cables on both interconnections at DC1 without any solution.
>>
>> SFPs we use in both DCs:
>>
>> Juniper - QSFP-100G-SR4-T2
>> Cisco - QSFP-100G-SR4
>>
>> over MPO cable OM4.
>>
>> Distance is 70 m in DC1 and 80 m in DC2, hence it is shorter where we see the issue.
>>
>> Any idea or suggestion what to check or to do?
>>
>> Thanks in advance
>> Cheers
>> James
Re: [j-nsp] [c-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Hi,
One thing I've omitted to say is that BGP runs over a LACP bundle with currently just one 100 Gbs member interface.

I see that the issue is triggered on Cisco when the eth interface seems to go into Initializing state:

2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PARENT_DOWN: Interface port-channel101.2303 is down (Parent interface is down)
2024 Feb 9 16:39:36 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor 172.16.6.17 Down - sent: other configuration change
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel101: first operational port changed from Ethernet1/44 to none
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel101: Ethernet1/44 is down
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel101,bandwidth changed to 10 Kbit
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-SPEED: Interface port-channel101, operational speed changed to 100 Gbps
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DUPLEX: Interface port-channel101, operational duplex mode changed to Full
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel101, operational Receive Flow Control state changed to off
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel101, operational Transmit Flow Control state changed to off
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_UP: port-channel101: Ethernet1/44 is up
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel101: first operational port changed from none to Ethernet1/44
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel101,bandwidth changed to 1
Kbit
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface Ethernet1/44 is up in Layer3
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface port-channel101 is up in Layer3
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface port-channel101.2303 is up in Layer3
2024 Feb 9 16:39:43 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor 172.16.6.17 Up

Cheers
James

On Sun, 11 Feb 2024 at 11:12, Gert Doering wrote:
> Hi,
>
> On Sun, Feb 11, 2024 at 11:08:29AM +0100, james list via cisco-nsp wrote:
> > we notice BGP flaps
>
> Any particular error message? BGP flaps can happen due to many different
> reasons, and usually $C is fairly good at logging the reason.
>
> Any interface errors, packet errors, ping packets lost?
>
> "BGP flaps" *can* be related to lower layer issues (so: interface counters,
> error counters, extended pings) or to something unrelated, like "MaxPfx
> exceeded"...
>
> gert
> --
> "If was one thing all people took for granted, was conviction that if you
> feed honest figures into a computer, honest figures come out. Never doubted
> it myself till I met a computer with a sense of humor."
> Robert A. Heinlein, The Moon is a Harsh Mistress
>
> Gert Doering - Munich, Germany
> g...@greenie.muc.de
[j-nsp] Strange issue on 100 Gbs interconnection Juniper - Cisco
Dear experts,
we have a couple of BGP peers over a 100 Gbs interconnection between Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different datacenters, like this:

DC1
MX1 -- bgp -- NEXUS1
MX2 -- bgp -- NEXUS2

DC2
MX3 -- bgp -- NEXUS3
MX4 -- bgp -- NEXUS4

The issue we see is that sporadically (i.e. every 1 to 3 days) we notice BGP flaps only in DC1 on both interconnections (not at the same time); there is still no traffic since, once we noticed the flaps, we blocked deployment to production.

We've already changed SFPs (we moved the ones from DC2 to DC1 and vice versa) and cables on both interconnections at DC1 without any solution.

SFPs we use in both DCs:

Juniper - QSFP-100G-SR4-T2
Cisco - QSFP-100G-SR4

over MPO cable OM4.

Distance is 70 m in DC1 and 80 m in DC2, hence it is shorter where we see the issue.

Any idea or suggestion what to check or to do?

Thanks in advance
Cheers
James
[j-nsp] input errors on QFX5110
Dear experts,
a customer of mine has a Fibre Channel over IP switch connected to a QFX5110 port and sees a lot of "Input errors" and "Oversized frames":

Input errors:
Errors: 118467
Oversized frames: 118467

The extensive output is below. Do those counters represent dropped frames, or are they just informational? The same machine connected to a previous QFX5100 (different Junos) does not show the same counters. Since the FCIP switch manager sees packet loss and retransmissions, and states that no jumbo frames should be needed, what is your view of the issue?

I also see this:
Autonegotiation information:
Negotiation status: Incomplete

Thanks in advance
James

QFX5110A> show interfaces ge-0/0/0 extensive
Physical interface: ge-0/0/0, Enabled, Physical link is Up
Interface index: 652, SNMP ifIndex: 707, Generation: 145
Description: FCOIP
Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 1000mbps, BPDU Error: None, Loop Detect PDU Error: None, Ethernet-Switching Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Disabled, Auto-negotiation: Enabled, Remote fault: Online, Media type: Fiber, IEEE 802.3az Energy Efficient Ethernet: Disabled, Auto-MDIX: Enabled
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x4000
Link flags : None
CoS queues : 12 supported, 12 maximum usable queues
Hold-times : Up 0 ms, Down 0 ms
Current address: bc:7c:6c:23:11:03, Hardware address: bc:7c:6c:23:11:03
Last flapped : 2023-03-23 10:54:13 CET (19w5d 00:48 ago)
Statistics last cleared: 2023-08-08 10:56:52 CEST (01:45:26 ago)
Traffic statistics:
Input bytes : 40792860463 9378952 bps
Output bytes : 4232537714 4886680 bps
Input packets: 52096906 4710 pps
Output packets: 29620532 3841 pps
IPv6 transit statistics:
Input bytes : 0
Output bytes : 0
Input packets: 0
Output packets: 0
Input errors:
Errors: 118467, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO
errors: 0, Resource errors: 0 Output errors: Carrier transitions: 0, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0 Egress queues: 12 supported, 5 in use Queue counters: Queued packets Transmitted packets Dropped packets 0 29604377 29604377 0 300 0 400 0 7 6835 6835 0 8 5865 5865 0 Queue number: Mapped forwarding classes 0 best-effort 3 fcoe 4 no-loss 7 network-control 8 mcast Active alarms : None Active defects : None PCS statistics Seconds Bit errors 0 Errored blocks 0 Ethernet FEC statistics Errors FEC Corrected Errors0 FEC Uncorrected Errors 0 FEC Corrected Errors Rate 0 FEC Uncorrected Errors Rate 0 MAC statistics: Receive Transmit Total octets 40792860463 4232537714 Total packets 52096906 29620532 Unicast packets 52096695 29607881 Broadcast packets0 1452 Multicast packets 21111199 CRC/Align errors 00 FIFO errors 00 MAC control frames 00 MAC pause frames 00 Oversized frames118467 Jabber frames0 Fragment frames 0 VLAN tagged frames 0 Code violations 0 MAC Priority Flow Control Statistics: Priority : 0 00 Priority : 1 00 Priority : 2 00 Priority : 3 00 Priority : 4 00 Priority : 5 00 Priority : 6 00 Priority : 7
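On the "Oversized frames" question: the input-error and oversized-frame counters match exactly (118467), which suggests every input error here is an oversize drop. With the interface MTU at 1514 bytes, any received frame even slightly longer (for example a VLAN-tagged or baby-giant frame from the FCIP box) would be counted and dropped, which would match the packet loss and retransmissions the FCIP manager reports. A sketch of the check, assuming (as is the usual behaviour) that frames above the configured L2 MTU are counted as oversized and discarded:

```python
# "Oversized frames" on this interface counts received frames longer than
# the configured L2 MTU (1514 bytes here, from the output above).
L2_MTU = 1514

def is_oversized(frame_len: int, mtu: int = L2_MTU) -> bool:
    """True if a received frame would increment the oversize counter."""
    return frame_len > mtu

print(is_oversized(1518))  # True: even 4 extra bytes (e.g. a VLAN tag) overflows
print(is_oversized(1514))  # False: exactly at MTU is fine
```

If this is the cause, raising the interface MTU slightly (or stopping the tagged/oversize frames at the source) should make the counters stop incrementing.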
[j-nsp] Fwd: Port-channel not working Juniper vs Cisco
Dear experts,
we have an issue setting up a port-channel between a Juniper EX4400 and a Cisco Nexus N9K-C93180YC-EX over an SX 1 Gbs link. We've implemented the following configuration, but on the Juniper side the interface keeps flapping while on the Cisco side it remains down. Light levels seem OK.

Has anyone ever experienced the same? Any suggestions?

Thanks in advance for any hint
Kind regards
James

JUNIPER *
> show configuration interfaces ae10 | display set
set interfaces ae10 description "to Cisco leaf"
set interfaces ae10 aggregated-ether-options lacp active
set interfaces ae10 aggregated-ether-options lacp periodic fast
set interfaces ae10 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae10 unit 0 family ethernet-switching vlan members 301

> show configuration interfaces ge-0/2/3 | display set
set interfaces ge-0/2/3 description "to Cisco leaf"
set interfaces ge-0/2/3 ether-options 802.3ad ae10

> show vlans VLAN_301
Routing instance  VLAN name  Tag  Interfaces
default-switch    VLAN_301   301  ae10.0

CISCO ***
interface Ethernet1/41
  description <[To EX4400]>
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 301
  channel-group 41 mode active
  no shutdown

interface port-channel41
  description <[To EX4400]>
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 301

# sh vlan id 301
VLAN Name Status Ports
- ---
301 P2P_xxx active Po1, Po41, Eth1/1, Eth1/41
VLAN Type Vlan-mode
---
301 enet CE
Remote SPAN VLAN Disabled
Primary Secondary Type Ports
--- - --- ---
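One asymmetry visible in the configs above that is worth ruling out while debugging: the Juniper side requests fast LACP timers (`lacp periodic fast`) while the Nexus member port is left at its default rate. A rate mismatch should not by itself keep the bundle down, since each LACP speaker honours the timeout its partner requests, but aligning the two removes a variable. A hedged sketch of the alignment on the Cisco side; verify the exact syntax against your NX-OS release:

```
interface Ethernet1/41
  channel-group 41 mode active
  lacp rate fast
```

It is also worth confirming the Nexus side negotiates at 1 Gbps on that port (speed/duplex mismatch on a 1G SX link is a common cause of one side flapping while the other stays down).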
Re: [j-nsp] Cut through and buffer questions
Can you please share the output of "show class-of-service shared-buffer" on your QFX5100?

Cheers
James

On Fri, 19 Nov 2021 at 11:58, Thomas Bellman wrote:
> On 2021-11-19 09:49, james list via juniper-nsp wrote:
>
> > I try to rephrase the question you do not understand: if I enable cut
> > through or change buffer is it traffic affecting ?
>
> On the QFX 5xxx series and (at least) EX 46xx series, the forwarding
> ASIC needs to reset in order to change between store-and-forward and
> cut-through, and traffic will be lost until the reprogramming has been
> completed. Likewise, changing buffer config will need to reset the
> ASIC. When I have tested it, this has taken at most one second, though,
> so for many people it will be a non-event.
>
> One thing to remember when using cut-through forwarding is that packets
> that have suffered bit errors or truncation, so the CRC checksum is
> incorrect, will still be forwarded, and not be discarded by the switch.
> This is usually not a problem in itself, but if you are not aware of it,
> it is easy to get confused when troubleshooting bit errors (you see
> ingress errors on one switch, and think it is the link to the switch
> that has problems, but in reality it might just be that the switch on
> the other end is forwarding broken packets *it* received).
>
> > Regarding the drops here are the outputs (15h after clear statistics):
> [...abbreviated...]
> > Queue: 0, Forwarding classes: best-effort
> >   Transmitted:
> >     Packets              : 6929684309190446 pps
> >     Bytes                : 4259968408584    761960360 bps
> >     Total-dropped packets: 1592             0 pps
> >     Total-dropped bytes  : 2244862          0 bps
> [...]
> > Queue: 7, Forwarding classes: network-control
> >   Transmitted:
> >     Packets              : 59234            0 pps
> >     Bytes                : 4532824          504 bps
> >     Total-dropped packets: 0                0 pps
> >     Total-dropped bytes  : 0                0 bps
> > Queue: 8, Forwarding classes: mcast
> >   Transmitted:
> >     Packets              : 655370488 pps
> >     Bytes                : 5102847425663112 bps
> >     Total-dropped packets: 279              0 pps
> >     Total-dropped bytes  : 423522           0 bps
>
> These drop figures don't immediately strike me as excessive. We
> certainly have much higher drop percentages, and don't see many
> practical performance problems. But it will very much depend on
> your application. The one thing I note is that you have much
> more multicast than we do, and you see drops in that forwarding
> class.
>
> I didn't quite understand if you see actual application or
> performance problems.
>
> > show class-of-service shared-buffer
> > Ingress:
> >   Total Buffer      : 12480.00 KB
> >   Dedicated Buffer  : 2912.81 KB
> >   Shared Buffer     : 9567.19 KB
> >     Lossless          : 861.05 KB
> >     Lossless Headroom : 4305.23 KB
> >     Lossy             : 4400.91 KB
>
> This looks like a QFX5100 or EX4600, with the 12 Mbyte buffer in the
> Broadcom Trident 2 chip.
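[Editor's note] On "these drop figures don't strike me as excessive": the transmitted packet counts in the archived output are partly garbled (count and rate ran together), so the sketch below uses an illustrative stand-in for the transmitted count; only the dropped counts (1592 on queue 0, 279 on mcast) are taken from the output.

```python
# Sketch: drop percentage from interface queue counters, to put the
# quoted figures in proportion. The 7e9 transmitted count is a
# hypothetical stand-in, not the (garbled) value from the output.

def drop_pct(dropped: int, transmitted: int) -> float:
    """Dropped packets as a percentage of all packets offered."""
    total = dropped + transmitted
    return 100.0 * dropped / total if total else 0.0

# queue 0: 1592 drops against an assumed ~7e9 transmitted packets
print(f"{drop_pct(1592, 7_000_000_000):.8f} %")
```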
> You probably want to read this page, to understand how to configure
> buffer allocation for your needs:
>
> https://www.juniper.net/documentation/us/en/software/junos/traffic-mgmt-qfx/topics/concept/cos-qfx-series-buffer-configuration-understanding.html
>
> In my network, we only have best-effort traffic, and very little
> multi- or broadcast traffic (basically just ARP/Neighbour discovery,
> DHCP, and OSPF), so we use these settings on our QFX5100 and EX4600
> switches:
>
> forwarding-options {
>     cut-through;
> }
> class-of-service {
>     /* Max buffers to best-effort traffic, minimum for lossless ethernet */
>     shared-buffer {
>         ingress {
>             percent 100;
>             buffer-partition lossless { percent 5; }
>             buffer-partition lossless-headroom { percent 0; }
>             buffer-partition lossy { percent 95; }
>         }
>         egress {
>             percent 100;
>             buffer-partition lossless { percent 5; }
>             buffer-partition lossy { percent 75; }
>             buffer-partition multicast { percent 20; }
>         }
>     }
> }
>
> (On our QFX5120 switches, I have moved even more buffer space to
> the "lossy" classes.) But you need to tune to *your* needs; the
> above is for our needs.
>
> /Bellman
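[Editor's note] A small sketch of how percentages like Bellman's translate into kilobytes on a 12 MB Trident 2 box. The total and dedicated figures are taken from the "show class-of-service shared-buffer" output quoted in this thread; the split applied is Bellman's ingress example (lossless 5 / headroom 0 / lossy 95), not a recommendation.

```python
# Sketch: shared-buffer partitioning arithmetic on a QFX5100/EX4600.
# The buffer-partition percentages divide only the *shared* pool,
# i.e. what is left after the per-port dedicated slice.

TOTAL_KB = 12480.00                   # total packet buffer (Trident 2)
DEDICATED_KB = 2912.81                # ingress dedicated (per-port) slice
SHARED_KB = TOTAL_KB - DEDICATED_KB   # 9567.19 KB left to partition

def partition_kb(shared_kb: float, percent: float) -> float:
    """KB a buffer-partition receives from the shared pool."""
    return round(shared_kb * percent / 100.0, 2)

# Bellman's ingress split: lossless 5 / lossless-headroom 0 / lossy 95
for name, pct in [("lossless", 5), ("lossless-headroom", 0), ("lossy", 95)]:
    print(name, partition_kb(SHARED_KB, pct), "KB")
```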
Re: [j-nsp] Cut through and buffer questions
Hi,

I mentioned MX and QFX (the output is from a QFX5100) in the first email because the traffic pattern spans both. I never mentioned the internet.

I also understood that cut-through cannot help, but obviously I cannot replace the QFX switches just because we lose a few UDP packets for a single application. The idea could be to take shared buffer away from unused queues and add it to the used ones, correct? Based on the output provided, what would you suggest changing? I also understand this kind of change is traffic affecting.

I also need to understand how the shared buffer partitions on the QFX map to the CoS queues.

Thanks, cheers
James

On Fri 19 Nov 2021 at 10:07, Saku Ytti wrote:

> On Fri, 19 Nov 2021 at 10:49, james list wrote:
>
> Hey,
>
> > I try to rephrase the question you do not understand: if I enable cut
> > through or change buffer is it traffic affecting ?
>
> There is no cut-through, and I was hoping that after reading the previous
> email you'd understand why it won't help you at all, nor is it
> desirable. Changing QoS config may be traffic affecting, but you
> likely do not have the monitoring capability to observe it.
>
> > Regarding the drops here the outputs (15h after clear statistics):
>
> You talked about MX, so I answered from an MX perspective. But your
> output is not from an MX.
>
> The device you actually show has exceedingly tiny buffers and is not
> meant for Internet WAN use; that is, it does not expect a significantly
> higher sender rate than receiver rate with high RTT. It is meant for
> datacenter use, where RTT is low and the speed delta is small.
>
> In the real-life Internet you need larger buffers because of this:
> senderPC => internets => receiverPC
>
> Let's imagine an RTT of 200 ms, a 10GE receiver, and a 100GE sender.
> - 10 Gbps * 200 ms = 250 MB TCP window needed to fill it
> - as TCP windows grow exponentially in the absence of loss, you could
>   have 128 MB => 250 MB growth
> - this means senderPC might serialise 128 MB of data at 100 Gbps
> - this 128 MB you can only send at a 10 Gbps rate; the rest you have to
>   take into the buffers
> - intentionally pathological example
> - the 'easy' fix is that the sender doesn't burst the data at its own
>   rate, but does rate estimation and sends window growth at the
>   estimated receiver rate; this practically removes buffering needs
>   entirely
> - the 'easy' fix is not standard behaviour, but some cloudyshops
>   thankfully configure their Linux like this (Linux already does
>   bandwidth estimation, and you can ask 'tc' to shape the session to
>   the estimated bandwidth)
>
> What you need to do is change the device to one that is intended for
> the application you have.
> If you can do anything at all, what you can do is ensure that you have
> the minimum number of QoS classes and that those classes have the
> maximum amount of buffer, so that unused queues aren't holding empty
> memory while a used queue is starving. But even this will have only
> marginal benefit.
>
> Cut-through does nothing, because your egress is congested; you can
> only use cut-through if egress is not congested.
>
> --
> ++ytti
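[Editor's note] The bandwidth-delay arithmetic in Saku's bullets can be sketched as follows; nothing here is device-specific, it is just the "10 Gbps receiver, 200 ms RTT" example worked out.

```python
# Sketch: bandwidth-delay product, showing where the "250 MB TCP
# window" figure in the example above comes from.

def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to keep a path of this rate/RTT full."""
    return rate_bps * rtt_s / 8  # bits -> bytes

mb = bdp_bytes(10e9, 0.200) / 1e6
print(f"{mb:.0f} MB")  # 250 MB window to fill 10 Gbps at 200 ms RTT
```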
Re: [j-nsp] Cut through and buffer questions
                         : 0         0 bps
  Transmitted:
    Packets              : 59234     0 pps
    Bytes                : 4532824   504 bps
    Tail-dropped packets : Not Available
    RL-dropped packets   : 0         0 pps
    RL-dropped bytes     : 0         0 bps
    Total-dropped packets: 0         0 pps
    Total-dropped bytes  : 0         0 bps
Queue: 8, Forwarding classes: mcast
  Queued:
    Packets              : 0         0 pps
    Bytes                : 0         0 bps
  Transmitted:
    Packets              : 655370488 pps
    Bytes                : 5102847425663112 bps
    Tail-dropped packets : Not Available
    RL-dropped packets   : 0         0 pps
    RL-dropped bytes     : 0         0 bps
    Total-dropped packets: 279       0 pps
    Total-dropped bytes  : 423522    0 bps

{master:0}
show class-of-service shared-buffer
Ingress:
  Total Buffer      : 12480.00 KB
  Dedicated Buffer  : 2912.81 KB
  Shared Buffer     : 9567.19 KB
    Lossless          : 861.05 KB
    Lossless Headroom : 4305.23 KB
    Lossy             : 4400.91 KB
  Lossless Headroom Utilization:
    Node  Device  Total       Used     Free
    0             4305.23 KB  0.00 KB  4305.23 KB
    1             4305.23 KB  0.00 KB  4305.23 KB
    2             4305.23 KB  0.00 KB  4305.23 KB
    3             4305.23 KB  0.00 KB  4305.23 KB
    4             4305.23 KB  0.00 KB  4305.23 KB
Egress:
  Total Buffer      : 12480.00 KB
  Dedicated Buffer  : 3744.00 KB
  Shared Buffer     : 8736.00 KB
    Lossless  : 4368.00 KB
    Multicast : 1659.84 KB
    Lossy     : 2708.16 KB

Cheers
James

On Fri 19 Nov 2021 at 08:36, Saku Ytti wrote:

> On Thu, 18 Nov 2021 at 23:20, james list via juniper-nsp wrote:
>
> > 1) is MX family switching by default in cut through or store and forward
> > mode? I was not able to find a clear information
>
> Store and forward.
>
> > 2) is in general (on MX or QFX) jeopardizing the traffic the action to
> > enable cut through or change buffer allocation?
>
> I don't understand the question.
>
> > I have some output discard on an interface (class best effort) and
> > some UDP packets are lost hence I am tuning to find a solution.
>
> I don't see how this relates to cut-through at all.
>
> Cut-through works when ingress can start writing the frame to egress
> while still reading it; this is ~never the case in multistage
> ingress+egress buffered devices.
> And even in devices where it is the case, it only works if the egress
> interface happens not to be serialising a packet at that moment, so the
> percentage of frames actually getting cut-through behaviour in
> cut-through devices is low in typical applications; applications where
> it is high could likely have been replaced by a direct connection.
>
> Modern multistage devices have low single-digit microseconds of
> internal latency and nanoseconds of jitter. One microsecond is about
> 200 m in fiber, so that gives you the scale of how much distance you
> can save by removing the delay incurred by a multistage device.
>
> Now having said that, what actually is the problem? What are 'output
> discards'; which counter are you looking at? Have you modified the QoS
> configuration; can you share it? By default JNPR is 95% BE, 5% NC
> (unlike Cisco, which is 100% BE, which I think is a better default),
> and the buffer allocation is split the same way, so if you are actually
> QoS tail-dropping in the default JNPR configuration, you're creating
> massive delays, because the buffer allocation is huge and your problem
> is rather simply that you're offering too much to the egress; the best
> you can do is reduce the buffer allocation to have lower collateral
> damage.
>
> --
> ++ytti
[j-nsp] Cut through and buffer questions
Hi all,

Questions:

1) Does the MX family switch in cut-through or store-and-forward mode by
default? I was not able to find clear information.

2) In general (on MX or QFX), does enabling cut-through or changing the
buffer allocation jeopardize traffic?

I have some output discards on an interface (class best-effort) and some
UDP packets are lost, hence I am tuning to find a solution.

Thanks in advance for any hint.

Cheers
James
Re: [j-nsp] [c-nsp] strange issue
Hi,

I have to ask for the VM routing table and then I will share it. The
VM's gateway is the load balancer.

Cheers
James

On Thu 29 Jul 2021 at 18:17, Ryan Rawdon wrote:

>
> > On Jul 29, 2021, at 11:55 AM, james list wrote:
> >
> > Internet - Firewall – Lan - Load balancer – Lan – hypervisor - VM
> >
> > It happens sometimes that the VM does not respond anymore to the load
> > balancer for external IP addresses, until source NAT (SNAT) of the
> > internet traffic is set on the load balancer and then removed.
>
> Can you share the routing table of the VM in question? Specifically/most
> importantly: is the load balancer being used as the VM’s default
> gateway, or does the VM use the firewall as its default gateway? In the
> latter case, I would expect the load balancer to SNAT traffic or act as
> a full layer-7 proxy where a new TCP connection is established from the
> load balancer to the upstream servers.
>
> With a misconfiguration or misaligned design intention here, I could see
> the intended behavior depending on ARP or firewall/connection state
> tracking behavior in the devices.
>
> > Something like an action that solicits the VM to refresh the ARP.
> >
> > While health checks from the load balancer to the VM in the same LAN
> > subnet never stop working.
> >
> > Does anybody ever encountered the same problem on VM environments ?
>
> In the absence of evidence otherwise, I suspect your issue is not
> VM-specific. Do you have examples of physical hosts in the same LAN
> that do not exhibit this problem? If so, have the routing tables
> (default gateway and possibly other persistent static routes) been
> compared?
>
> > Any idea ?
> >
> > Thanks in advance
> > James
> > ___
> > cisco-nsp mailing list  cisco-...@puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
[j-nsp] strange issue
Dear experts,

My customer has the following very simple infrastructure:

Internet - Firewall – Lan - Load balancer – Lan – hypervisor - VM

It sometimes happens that the VM stops responding to the load balancer
for external IP addresses, until source NAT (SNAT) of the internet
traffic is set on the load balancer and then removed: something like an
action that solicits the VM to refresh its ARP.

Meanwhile, health checks from the load balancer to the VM in the same
LAN subnet never stop working.

Has anybody ever encountered the same problem in VM environments? Any
idea?

Thanks in advance
James