Strange OpenBGPD Problem after MAC address change
Hi,

I have a non-urgent problem with OpenBGPD and would like to know if anybody has a suggestion on what went wrong or what I did wrong.

Situation: I replaced an OpenBGPD-based router (R1) with new hardware. Of course, the MAC addresses of the interfaces changed. After the swap, the BGP session with another OpenBGPD router (R20) did not come up. Sessions between R1 and other machines came up without any problems. Both routers run OpenBSD 4.1-stable.

bgpctl output on R1, the router with the new hardware:

Neighbor         AS   MsgRcvd  MsgSent  OutQ  Up/Down
IBGP with R20    XYZ        0        0     0  Never     Active

bgpctl output on R20:

Neighbor         AS   MsgRcvd  MsgSent  OutQ  Up/Down
IBGP with R1     XYZ   100216    10455     0  00:27:52  Active

R1 is a poor guy: it regularly tries to open a connection to port 179 on R20, but the SYN packets are simply ignored. On the other hand, tcpdump shows that R20 does not try to open a TCP connection to R1.

This is the content of /var/log/messages on R20 around the time when the old R1 router was shut down (R20's local address, X.X.96.20, is on vlan201):

Oct 19 08:18:41 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1): received notification: Cease, none
Oct 19 08:19:11 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1): connect: Operation not permitted
Oct 19 08:21:12 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1): connect: Operation not permitted
Oct 19 08:22:55 R20 /bsd: arp info overwritten for X.X.96.1 by 00:00:24:c8:d9:f8 on vlan201

I did not restart bgpd on R20 (which would certainly help), as I would like to track down the problem further.

- Christian
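When going through a longer log, it can help to pull out only the bgpd neighbor messages and count the connect errors per neighbor. The following is a minimal sketch, not part of the original post. It assumes the log lines have exactly the shape shown in the excerpt above, and the function names neighbor_events and connect_failures are made up for this illustration.

```python
import re

# Matches bgpd neighbor messages in the shape shown in the log excerpt.
# Assumption: "bgpd[PID]: neighbor ADDR (DESCRIPTION): MESSAGE".
LINE_RE = re.compile(
    r"bgpd\[\d+\]: neighbor (?P<addr>\S+) \((?P<descr>[^)]*)\): (?P<msg>.*)$"
)


def neighbor_events(lines):
    """Group bgpd neighbor messages by neighbor address."""
    events = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            events.setdefault(match.group("addr"), []).append(match.group("msg"))
    return events


def connect_failures(lines):
    """Count messages starting with 'connect:' for each neighbor."""
    return {
        addr: sum(1 for msg in msgs if msg.startswith("connect:"))
        for addr, msgs in neighbor_events(lines).items()
    }


# The four log lines quoted in the post.
SAMPLE = [
    "Oct 19 08:18:41 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1): received notification: Cease, none",
    "Oct 19 08:19:11 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1): connect: Operation not permitted",
    "Oct 19 08:21:12 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1): connect: Operation not permitted",
    "Oct 19 08:22:55 R20 /bsd: arp info overwritten for X.X.96.1 by 00:00:24:c8:d9:f8 on vlan201",
]

if __name__ == "__main__":
    for addr, count in connect_failures(SAMPLE).items():
        print(addr, count)
```

The kernel ARP message is skipped on purpose, since it does not come from bgpd.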
Re: vic(4) on ESX 3.0.2
Sometimes it is very annoying that the settings in your .vmx file are not respected or are changed back by the VI client. A very slow but bullet-proof method is the following:

1.) Connect directly to the ESX host with the VI client (I do not have VirtualCenter).

2.) Stop the VM and remove it from the inventory (right-click on the VM in the left panel, then "Remove from the Inventory").

3.) Change the .vmx file, i.e., append something like the following (if you want to use the em driver and have 3 interfaces):

    ethernet0.virtualDev = "e1000"
    ethernet1.virtualDev = "e1000"
    ethernet2.virtualDev = "e1000"

    or (if you want to use the vic driver):

    ethernet0.virtualDev = "vmxnet"
    ethernet1.virtualDev = "vmxnet"
    ethernet2.virtualDev = "vmxnet"

4.) Add the VM to the inventory again. (With the VI client, go to the global Configuration tab, click on "Storage (SCSI, SAN and NFS)", then right-click on the storage (typically "storage1") and choose "Browse Datastore...". Search for the .vmx file and then right-click and choose "Add to Inventory".)
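The edit in step 3 can also be scripted. The sketch below is only an illustration and assumes the .vmx file is plain text with one key = "value" pair per line. The helper name set_virtual_dev is hypothetical, and the VM still has to be stopped and removed from the inventory first, as described above.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "vlance"
VIRTUAL_DEV_RE = re.compile(r"^\s*ethernet(\d+)\.virtualDev\s*=", re.IGNORECASE)


def set_virtual_dev(vmx_text, n_ifaces, dev):
    """Return vmx_text with ethernetN.virtualDev set to dev for N < n_ifaces.

    Existing virtualDev lines for those interfaces are dropped and fresh
    ones are appended at the end, as in step 3 of the procedure.
    """
    kept = []
    for line in vmx_text.splitlines():
        match = VIRTUAL_DEV_RE.match(line)
        if match and int(match.group(1)) < n_ifaces:
            continue  # replaced by the appended lines below
        kept.append(line)
    kept.extend(f'ethernet{i}.virtualDev = "{dev}"' for i in range(n_ifaces))
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical minimal .vmx content, for illustration only.
    sample = 'displayName = "obsd41"\nethernet0.virtualDev = "vlance"\n'
    print(set_virtual_dev(sample, 3, "e1000"), end="")
```

Working on the text and writing the result back by hand keeps the change easy to review before the VM is added to the inventory again.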
Re: OpenBSD on ESX - Networking experiences
Just for the record: I upgraded to ESX 3.0.2 and...

1.) So far, I have not observed any stalls on the emulated e1000 (em) interfaces. Currently I am playing with the vmxnet driver as well.

2.) VGT mode seems to work correctly: very short Ethernet frames (e.g., ICMP ping packets produced by Windows XP machines and routed over a virtual OpenBSD 4.1 firewall with interfaces in VGT mode) are no longer dropped.

Thanks for all the feedback.
Re: vr driver trouble on Soekris 5501
>For what it's worth, I experienced the same problem caused by
>attaching and detaching a (short) crossover cable multiple times
>on a vr interface in

The cable used in the situation when things went wrong was also short, < 1m.

>soekris net5501 running 4.1-stable. As it was on a production
>firewall I didn't troubleshoot much, tcpdump didn't show any incoming
>traffic on that

OK, same phenomenon.

>interface - then I went for a quick reboot that obviously fixed
>things. Let me see if I can replicate it in lab.

That would be very nice - unfortunately, I don't have any spare Soekris 5501 boxes.
Re: vr driver trouble on Soekris 5501
> Are these interfaces configured in "autoselect" mode? What happens
> when you configure them in fixed mode, e.g. 100Mb/full duplex?

Sounds like a good idea. In the future, I will configure them in fixed mode; it won't hurt.

Even though it may have something to do with the autoselect mode, the whole story still has a bad smell =) I mean, the problem persisted even though I disconnected the cable several times, changed the cable, and hooked the Soekris to a Dell 5324 switch (to a port in autoselect mode). Nevertheless, the interface did not get out of the stalled state until I did the ifconfig down/up sequence.
Re: vr driver trouble on Soekris 5501
> Not sure if related, but something similar has been fixed in
> 4.2-current already.

This was also the first thing that came to my mind; however, I don't think it is related. VR_STICKHW is only written erroneously during attach, and since my machine has now been running for several weeks without any problem, I doubt that the observed stall has anything to do with this.

Opinions from the vr maintainers? Is there anything I can do to debug the problem the next time it occurs?
Re: OpenBSD on ESX - Networking experiences
> I've personally not had any issues with the vlance driver. Have two
> 4.1 guests on ESX3.0.1, been running since around July without issues

well, ...

> considering I haven't had any issues to date. I will admit they don't
> get a lot of heavy use.

In my case the pattern produced by executing "find /" in an ssh session was enough to crash the VM when using the vlance driver and ESX 3.0.1. Modifying (lowering) the PCN_NTXSEGS value helped (see http://archives.neohapsis.com/archives/openbsd/2006-12/1655.html).
OpenBSD on ESX - Networking experiences
Dear all,

I am looking for people who run OpenBSD 4.1 on ESX servers and want to share their experiences =)

To make sure that I won't provoke replies like 'idiot, virtualization subverts the safety of OpenBSD', I hereby declare that I do not want to use this for production systems (...not). Furthermore, I know that OpenBSD is not officially supported on ESX.

When it comes to discussing OpenBSD on ESX, people often write stuff like "it runs just fine when I use the e1000 network interface emulation". In my experience, things are not so clear.

OK, using the e1000 is a must (the vlance driver does not work properly with the emulation done by ESX, probably an issue with the PCN_NTXSEGS value (16) in if_pcn.c). However, using the e1000 emulation is not trouble-free either. I played around with various VMs, and sometimes the em driver suddenly stopped receiving packets (and the packets it sent went to nirvana). Link status etc. were all OK. I observed this on an unpatched ESX 3.0.1. People tend to overreact in such cases (they reboot VMs), but in at least one case I was able to intervene and do an "ifconfig em0 down && ifconfig em0 up", which helped.

Another issue is VGT mode (a must if you want to handle more than 4 VLANs in a VM). Sometimes, incoming short Ethernet frames get lost (I know, this is very vague).

Any other experiences/observations/ideas?

- Christian
Re: OpenOSPFd and kernel routing table (new variant)
I applied the diff manually to -stable (watch out for path_updateall/prefix_updateall), and now it works perfectly.

Thanks, Claudio!

> And here is a preliminary diff for all the curious ones. bgpd needs
> to track changes of routes with F_NEXTHOP checked and report them to
> the RDE. The RDE will then update all active routes that use this
> nexthop. Seems to work for me.
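For readers who want a picture of what "track nexthop changes and update all routes that use this nexthop" means, here is a toy model. It is not bgpd code and does not reflect the actual diff. The class ToyRib and its methods are invented for this illustration, and treating R1's loopback 10.74.97.1 as the BGP nexthop is an assumption that the thread does not confirm.

```python
class ToyRib:
    """Toy model: BGP routes point at a nexthop, and the nexthop is
    resolved to whatever gateway the kernel currently uses to reach it."""

    def __init__(self):
        self.nexthop_via = {}  # BGP nexthop -> gateway currently used
        self.routes = {}       # prefix -> BGP nexthop

    def add_route(self, prefix, nexthop):
        self.routes[prefix] = nexthop

    def nexthop_update(self, nexthop, via):
        """Record that the route covering `nexthop` changed.

        Returns the prefixes that depend on this nexthop and therefore
        have to be re-installed with the new gateway.
        """
        self.nexthop_via[nexthop] = via
        return sorted(p for p, nh in self.routes.items() if nh == nexthop)

    def fib(self):
        """Resolved view: prefix -> gateway (None if unresolved)."""
        return {p: self.nexthop_via.get(nh) for p, nh in self.routes.items()}


if __name__ == "__main__":
    rib = ToyRib()
    rib.add_route("62.2.0.0/16", "10.74.97.1")    # assumed BGP nexthop
    rib.nexthop_update("10.74.97.1", "10.74.96.1")
    # The upper link fails and OSPF now reaches the nexthop over the lower link.
    print(rib.nexthop_update("10.74.97.1", "10.74.96.33"))
    print(rib.fib())
```

The point of the model is only that the update is driven by the nexthop, so every dependent prefix follows a change in the underlying route.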
OpenOSPFd and kernel routing table (new variant)
Hi,

I am testing OpenBGPD and OpenOSPFD on a couple of Soekris boxes. Even though I am using the latest code (-stable with ospfd kroute.c revision 1.48), I am having problems with the kernel routing table when OSPFD has to react to changes in the topology. I verified the problem on a virtual setup (a couple of OpenBSD machines on an ESX server), with the same result.

The problem can be summarized as follows: when I take down an interface on one machine manually (e.g., ifconfig em1 down), OpenOSPFD on another machine has no problem detecting this, and routes to subnets in the same AS are adapted. However, the kernel continues to route packets to destinations outside of the AS over the dead link.

Fix: when I restart ospfd, the kernel routing table is OK again.

Here is an example with 3 routers that I have put together using ESX/VMware:

              /em1-(.1) --- 10.74.96.0/27 --- (.2)--em0\
+-- (.22)-em0-[R1]                                      [R2]
|             \em2-(.33) -- 10.74.96.32/27 -- (.34)--em1/
10.0.0.0/24
|
+--- (.1)-em1-[R0]-em0 -- (62.2.0.0/16)

Router R0: AS65002, announces 62.2.0.0/16 to R1
Router R1: AS65001, announces 10.74.96.0/21 to R0
Router R2: AS65001, has an IBGP session with R1
Loopback (lo1) addresses: R1=10.74.97.1, R2=10.74.97.2

This setting works fine: I can ping from R2 to machines in 62.2.0.0/16. Traffic between R1 and R2 flows over the upper link.

However, let's assume that one of the links between R1 and R2 fails.

[R1] # ifconfig em1 down

(so eventually R2 will find out that it does not receive any OSPF packets on em0 anymore). It takes a while, but then ospfd on R2 has calculated the new topology:

[R2] # ospfctl show rib
Destination      Nexthop       Path Type    Type     Cost
0.0.0.1          10.74.96.33   Intra-Area   Router   11
10.74.96.0/27    10.74.96.33   Intra-Area   Network  21
10.74.96.32/27   10.74.96.34   Intra-Area   Network  11
10.74.97.1/32    10.74.96.33   Intra-Area   Network  21
10.0.0.0/24      10.74.96.33   Type 1 ext   Network  111

(uptime column deleted, to comply with the 72 char restriction of the mailing list).
[R2] # ospfctl show fib
flags: * = valid, O = OSPF, C = Connected, S = Static
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.1
*C     10.74.96.0/27    link#1
*C     10.74.96.32/27   link#2
*O     10.74.97.1/32    10.74.96.33
*      10.74.97.2/32    10.74.97.2
*      62.2.0.0/16      10.74.96.1
*S     127.0.0.0/8      127.0.0.1
*C     127.0.0.1/8      link#0
*      127.0.0.1/32     127.0.0.1
*S     224.0.0.0/4      127.0.0.1

This is not good, as the route to 62.2.0.0/16 (learned via IBGP) still points to 10.74.96.1, which is not directly reachable anymore.

Now let's kill and restart ospfd on R2, then check again:

# ospfctl show fib
flags: * = valid, O = OSPF, C = Connected, S = Static
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.33
*C     10.74.96.0/27    link#1
*C     10.74.96.32/27   link#2
*O     10.74.97.1/32    10.74.96.33
*      10.74.97.2/32    10.74.97.2
*      62.2.0.0/16      10.74.96.33
*S     127.0.0.0/8      127.0.0.1
*C     127.0.0.1/8      link#0
*      127.0.0.1/32     127.0.0.1
*S     224.0.0.0/4      127.0.0.1

Voilà, now it looks OK =)

This is the ospfd.conf of R2:

password="gurke"
router-id 0.0.0.2
redistribute connected
redistribute static
area 0.0.0.0 {
	interface lo1
	interface em0 {
		metric 10
		auth-type simple
		auth-key $password
	}
	interface em1 {
		metric 11
		auth-type simple
		auth-key $password
	}
}

Any suggestions? Am I making a substantial error? I did not want to make this posting too long, so if somebody is interested in the detailed config files, I can make them available.

Thanks,
- Christian
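To spot the stale entries without comparing the two listings by eye, the FIB output before and after the ospfd restart can be diffed. This is a rough sketch that assumes the three whitespace-separated columns (flags, destination, nexthop) shown in the post. The function names are made up for this illustration, and only a few rows of each listing are included as sample data.

```python
def parse_fib(text):
    """Parse 'ospfctl show fib'-style rows into {destination: nexthop}.

    Only rows whose first field starts with '*' (valid routes) are used,
    so the legend and header lines are ignored.
    """
    routes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("*"):
            _flags, destination, nexthop = fields
            routes[destination] = nexthop
    return routes


def changed_nexthops(before, after):
    """Return {destination: (old_nexthop, new_nexthop)} for changed routes."""
    old, new = parse_fib(before), parse_fib(after)
    return {
        dest: (old[dest], new[dest])
        for dest in old
        if dest in new and old[dest] != new[dest]
    }


# A subset of the rows from the two listings in the post.
BEFORE = """\
flags: * = valid, O = OSPF, C = Connected, S = Static
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.1
*C     10.74.96.0/27    link#1
*      62.2.0.0/16      10.74.96.1
"""

AFTER = """\
flags: * = valid, O = OSPF, C = Connected, S = Static
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.33
*C     10.74.96.0/27    link#1
*      62.2.0.0/16      10.74.96.33
"""

if __name__ == "__main__":
    for dest, (old, new) in changed_nexthops(BEFORE, AFTER).items():
        print(f"{dest}: {old} -> {new}")
```

With the sample rows, the two routes that differ are exactly the ones without the O, C or S flag, i.e. the routes that were not installed by ospfd itself.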