Claudio Jeker wrote:
On Wed, Oct 05, 2005 at 06:33:05PM -0400, Daniel Ouellet wrote:
================

Now with MD5 configured. We only add

tcp md5sig password test on the bgpd side and
neighbor 66.63.12.108 password test on the Cisco side.

With bgpd master
Clear session from bgpd side, session comes back up right away.
Clear session from remote side, session comes back up, possibly after a very long delay.

With bgpd slave
Just can't establish a session whatsoever! The Cisco side gets stuck in the OpenSent state and cycles a few times, all without success.

66.63.12.108    4 65001       0       1        0    0    0 never    OpenSent



I can't reproduce this. On my test setup all sessions come back up.

Configuration with MD5.

Well, let's see if this helps or not. Two examples below. The first might not be very elegant, but I think it shows the problem well. I force bgpd to be the slave by using a filter on the Cisco router. The filter is only temporary: I want the session to get stuck in the OpenSent state, and at that point I remove the filter, sit back and watch. What happens is that the session never comes up. I think it should, but it doesn't.

Then, when I see OpenSent on the Cisco router, I simply remove the filter to be 100% sure nothing is blocking regular traffic, and see if the session can recover. It doesn't.

So, I use this filter on the interface facing bgpd to force this state.

ip access-list extended bgpd-slave
 permit tcp any eq bgp any neq bgp
 deny   tcp any neq bgp any eq bgp
 permit ip any any

and apply it like this

interface FastEthernet0/0
 description Connection to OpenBSD Test Lab
 ip address 66.63.12.107 255.255.255.192
 ip access-group bgpd-slave in
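
To make the intent of the filter explicit, here is a small Python model of first-match evaluation for the three ACL entries above. It is only an illustration: the function name acl_action and its arguments are made up for this sketch and have nothing to do with IOS itself. It shows that, applied inbound on the interface facing bgpd, the list drops TCP segments aimed at port 179 on the Cisco unless they also come from port 179, so bgpd should not be able to open the session itself and can only answer one.

```python
BGP_PORT = 179  # "bgp" in the IOS access-list syntax


def acl_action(proto, src_port=None, dst_port=None):
    """Return 'permit' or 'deny' for one packet, first matching entry wins."""
    if proto == "tcp":
        # permit tcp any eq bgp any neq bgp
        if src_port == BGP_PORT and dst_port != BGP_PORT:
            return "permit"
        # deny tcp any neq bgp any eq bgp
        if src_port != BGP_PORT and dst_port == BGP_PORT:
            return "deny"
    # permit ip any any
    return "permit"


if __name__ == "__main__":
    # bgpd opening a connection to the Cisco (ephemeral port -> 179)
    print(acl_action("tcp", 14670, 179))
    # bgpd answering a connection the Cisco opened (179 -> ephemeral port)
    print(acl_action("tcp", 179, 14670))
    # anything that is not TCP falls through to the last entry
    print(acl_action("icmp"))
```

Under these assumptions the first call is denied and the other two are permitted, which is the "bgpd as slave" situation the filter is meant to force.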

I save my config and, to be ultra sure nothing else interferes, I simply reload. There is no need for that and it is silly anyway, but I do it just to be paranoid.

After I can ping the Cisco for a few seconds, I start bgpd on both versions of OpenBSD and wait until I see the OpenSent state on the Cisco router. Even with this filter it should establish a session with bgpd as slave, but it doesn't. Why, I wish I knew. Then, while it is in OpenSent, I remove the filter from the interface completely to be sure nothing is in the way. Also, remember that no pf is running, and the two servers are fresh installs with nothing on them other than the install itself and the bgpd configuration. That's it.


So, when I see:

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
66.63.12.106    4 65001       0       1        0    0    0 never    OpenSent
66.63.12.108    4 65001       0       1        0    0    0 never    OpenSent

I do

no ip access-group bgpd-slave in

on my Fast Ethernet interface and then sit back. Nothing ever happens here. No session ever comes up. Never! It cycles through close -> idle -> active -> OpenSent, stays there for a few minutes, and then cycles again to the same point, over and over.

What I see on OpenBSD 3.7 is

# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 0.0.0.0
  BGP state = Active
  Last read Never, holdtime 240s, keepalive interval 80s

  Message statistics:
                  Sent       Received
  Opens                    1          0
  Notifications            0          0
  Updates                  0          0
  Keepalives               0          0
  Route Refresh            0          0
  Total                    1          0

  Local host:          66.63.12.106, Local port:    179
  Remote host:         66.63.12.107, Remote port: 14670

==========================

and at each cycle of close -> idle -> active -> OpenSent, the port above changes. In current, after the first cycle, it shows

Last error: unknown error code

instead, with no port information, and error logs like this:

Oct  7 05:44:42 dev2 bgpd[21803]: startup
Oct  7 05:44:42 dev2 bgpd[14625]: route decision engine ready
Oct  7 05:44:42 dev2 bgpd[16756]: listening on 66.63.12.106
Oct  7 05:44:42 dev2 bgpd[16756]: session engine ready
Oct  7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change None -> Idle, reason: None
Oct  7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct  7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> OpenSent, reason: Connection opened
Oct  7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct  7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct  7 05:44:49 dev2 ntpd[24590]: adjusting local clock by -170.192293s
Oct  7 05:45:12 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct  7 05:46:26 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): socket error: No route to host
Oct  7 05:46:26 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
Oct  7 05:48:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Active -> OpenSent, reason: Connection opened
Oct  7 05:48:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct  7 05:48:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct  7 05:48:34 dev2 ntpd[24590]: adjusting local clock by -169.939425s
Oct  7 05:49:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct  7 05:49:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): socket error: Connection refused
Oct  7 05:49:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
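
For anyone who wants to follow the state machine without reading the raw log, here is a rough Python sketch that pulls the state transitions out of a bgpd log like the one above. It assumes only the message layout visible in this excerpt ("neighbor X (descr): state change A -> B, reason: R"), and it also copes with mail clients that wrap or glue several syslog entries onto one line. The names split_entries and state_changes are made up for this example.

```python
import re

# Start of a syslog entry: "Mon DD HH:MM:SS host prog[pid]: "
ENTRY_START = re.compile(
    r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ \w+\[\d+\]: "
)
STATE_CHANGE = re.compile(
    r"neighbor (\S+) .*state change (\w+) -> (\w+), reason: (.+)$"
)


def split_entries(text):
    """Undo mail wrapping and return one syslog entry per list item."""
    joined = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if ENTRY_START.match(line):
            joined += " " + line
        else:
            # Wrapped tail of the previous line ("ed", "ailed"): glue it back.
            joined += line
    starts = [m.start() for m in ENTRY_START.finditer(joined)]
    starts.append(len(joined))
    return [joined[a:b].strip() for a, b in zip(starts, starts[1:])]


def state_changes(entries):
    """Return (old_state, new_state, reason) for every state change entry."""
    changes = []
    for entry in entries:
        m = STATE_CHANGE.search(entry)
        if m:
            changes.append((m.group(2), m.group(3), m.group(4)))
    return changes


if __name__ == "__main__":
    import sys

    for old, new, reason in state_changes(split_entries(sys.stdin.read())):
        print(f"{old:>8} -> {new:<8} ({reason})")
```

Feeding it the log on stdin prints one transition per line, which makes the repeating OpenSent -> Idle "Fatal error" step right after each "write error: Invalid argument" easier to spot.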


-------------------

Current is no better, but as noted above, the port information is replaced after the first cycle with:

Last error: unknown error code


# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 0.0.0.0
  BGP state = Active
  Last read Never, holdtime 240s, keepalive interval 80s

  Message statistics:
                  Sent       Received
  Opens                    2          0
  Notifications            0          0
  Updates                  0          0
  Keepalives               0          0
  Route Refresh            0          0
  Total                    2          0

  Local host:          66.63.12.108, Local port:    179
  Remote host:         66.63.12.107, Remote port: 13386

With error log:

Oct  7 05:41:55 dev1 bgpd[15395]: startup
Oct  7 05:41:55 dev1 bgpd[16398]: route decision engine ready
Oct  7 05:41:55 dev1 bgpd[10475]: listening on 66.63.12.108
Oct  7 05:41:55 dev1 bgpd[10475]: session engine ready
Oct  7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change None -> Idle, reason: None
Oct  7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct  7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> OpenSent, reason: Connection opened
Oct  7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct  7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct  7 05:42:25 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct  7 05:43:40 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): socket error: No route to host
Oct  7 05:43:40 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
Oct  7 05:45:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Active -> OpenSent, reason: Connection opened
Oct  7 05:45:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct  7 05:45:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct  7 05:46:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct  7 05:46:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): socket error: Connection refused
Oct  7 05:46:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed


=====================
Second example.

Now I make sure no filters are present on the Cisco, reload it, then kill bgpd on both servers and restart them. What happens then is that I can establish a session, where it looks like bgpd is always the master. After it is established, if I reset from the Cisco side, it never comes back to life. It gets stuck in the OpenSent state again here too.

I set up two boxes, one with 3.7 and one with current (Oct 6), to see any difference for this specific event. Same results so far when MD5 is configured. Same results with Cisco 5350 and 7206. Same thing with IOS 12.3(9), 12.3(16) and 12.4(3) as well. Obviously I didn't try every version under the sun, but the idea is there anyway.

I establish a session with MD5 where bgpd initiates the session to the Cisco box. "bgpctl show neighbor 66.63.12.107" clearly shows that bgpd connects to the remote on port 179. After the session is up, if I do "clear ip bgp 66.63.12.106" or "clear ip bgp 66.63.12.108", both get stuck forever until I manually clear the session from the bgpd side as well. So if ONLY the Cisco side initiates a session clear, the session is gone until a manual clear is also done on the bgpd side. I do see the session on the Cisco go through close, idle, active, OpenSent and then get stuck there. It really looks like the bgpd side simply is not listening anymore.

The only difference is that on current the port information appears to be cleared and you get an error message that 3.7 doesn't provide.

current:
# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 66.63.12.107
  BGP state = Idle, down for 00:16:06
  Last read 00:16:14, holdtime 240s, keepalive interval 80s

  Message statistics:
                  Sent       Received
  Opens                   17          4
  Notifications            2          0
  Updates                  4          4
  Keepalives              34         41
  Route Refresh            0          0
  Total                   57         49

  Last error: unknown error code

--------------
as opposed to 3.7, where you get this:

# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 66.63.12.107
  BGP state = Active, down for 00:16:17
  Last read 00:16:18, holdtime 240s, keepalive interval 80s

  Message statistics:
                  Sent       Received
  Opens                    8          4
  Notifications            2          0
  Updates                  4          4
  Keepalives              34         42
  Route Refresh            0          0
  Total                   48         50

  Local host:          66.63.12.108, Local port:  14223
  Remote host:         66.63.12.107, Remote port:   179

------------------------
and from the router side:

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
66.63.12.106    4 65001      44      73        0    0    0 00:16:23 OpenSent
66.63.12.108    4 65001      44      74        0    0    0 00:16:22 OpenSent

====================

No matter how long I wait, it's stuck there forever.

Now, as far as netstat -sptcp is concerned, here are the results from current.

# netstat -sptcp
tcp:
        6392 packets sent
                3446 data packets (410744 bytes)
                75 data packets (14780 bytes) retransmitted
                0 fast retransmitted packets
                2774 ack-only packets (3298 delayed)
                0 URG only packets
                0 window probe packets
                7 window update packets
                90 control packets
                0 packets hardware-checksummed
        7564 packets received
                2808 acks (for 391996 bytes)
                783 duplicate acks
                0 acks for unsent data
                0 acks for old data
                4699 packets (347442 bytes) received in-sequence
                435 completely duplicate packets (18997 bytes)
                0 old duplicate packets
                0 packets with some duplicate data (0 bytes duplicated)
                47 out-of-order packets (1404 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                8 window update packets
                37 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                0 discarded for missing IPsec protection
                0 discarded due to memory shortage
                7480 packets hardware-checksummed
                2 bad/missing md5 checksums
                800 good md5 checksums
        29 connection requests
        62 connection accepts
        76 connections established (including accepts)
        112 connections closed (including 3 drops)
        0 connections drained
        7 embryonic connections dropped
        1700 segments updated rtt (of 1664 attempts)
        101 retransmit timeouts
                3 connections dropped by rexmit timeout
        0 persist timeouts
        4 keepalive timeouts
                0 keepalive probes sent
                3 connections dropped by keepalive
        1 correct ACK header prediction
        2301 correct data packet header predictions
        1618 PCB cache misses
        0 ECN connections accepted
                0 ECE packets received
                0 CWR packets received
                0 CE packets received
                0 ECT packets sent
                0 ECE packets sent
                0 CWR packets sent
                        cwr by fastrecovery: 51
                        cwr by timeout: 101
                        cwr by ecn: 0
        1065 bad connection attempts
        245 SYN cache entries added
                0 hash collisions
                62 completed
                0 aborted (no space to build PCB)
                172 timed out
                0 dropped due to overflow
                0 dropped due to bucket overflow
                4 dropped due to RST
                0 dropped due to ICMP unreachable
        725 SYN,ACKs retransmitted
        182 duplicate SYNs received for entries already in the cache
        0 SYNs dropped (no route or no space)
        51 SACK recovery episodes
                229 segment rexmits in SACK recovery episodes
                18668 byte rexmits in SACK recovery episodes
        449 SACK options received
        26 SACK options sent
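
Since that is a lot of numbers to stare at, here is a rough Python sketch that turns netstat -s output of this shape into a dictionary so the MD5 and SYN cache counters can be picked out or compared between two runs. It assumes only the "<count> <label>" layout shown above. The name parse_counters is made up for this example, and labels that appear twice (such as "window update packets") simply keep the last value seen.

```python
import re

# One counter per line: leading integer, then the label text.
COUNTER = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

# Labels copied from the output above that look relevant to this problem.
INTERESTING = [
    "bad/missing md5 checksums",
    "good md5 checksums",
    "SYN cache entries added",
    "completed",
    "timed out",
    "SYN,ACKs retransmitted",
]


def parse_counters(text):
    """Map each label to its count; lines without a leading number are skipped."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


if __name__ == "__main__":
    import sys

    stats = parse_counters(sys.stdin.read())
    for label in INTERESTING:
        print(f"{stats.get(label, 0):>8} {label}")
```

Something like "netstat -sptcp | python3 counters.py" (the script name is arbitrary) would list just those lines. In the output above, 172 of the 245 SYN cache entries timed out and only 62 completed, with 725 SYN,ACKs retransmitted. A SYN cache entry times out when the handshake for an incoming connection never completes, so this may be where the passive side fails, but that is only a guess from the counters.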

I hope this helps a bit more. In any case, it has now been more than 30 minutes and neither 3.7 nor current has recovered or established a session. From this state, the only way to establish a session is to clear from the Cisco side and, while the session is in Active and before it gets to OpenSent, clear the bgpd side. The session then comes up right away, but only if done in that order.

Now I need to get some sleep...

Daniel
