Strange OpenBGPD Problem after MAC address change

2007-10-19 Thread Christian Plattner

Hi,

I have a non-urgent problem with OpenBGPD and would like to know
if anybody has a suggestion as to what went wrong or what I did wrong.

Situation: I replaced an OpenBGPD-based router (R1) with new hardware.
Of course, the MAC addresses of the interfaces changed. After
the swap, the BGP session with another OpenBGPD router (R20) did
not come up. Sessions between R1 and other machines came
up without any problems. I run OpenBSD 4.1-stable on both routers.

bgpctl output on R1, the router with the new hardware:

Neighbor        AS   MsgRcvd  MsgSent  OutQ  Up/Down
IBGP with R20   XYZ        0        0     0  Never     Active

bgpctl output on R20:

Neighbor        AS   MsgRcvd  MsgSent  OutQ  Up/Down
IBGP with R1    XYZ   100216    10455     0  00:27:52  Active

R1 is a poor guy: it regularly tries to open a connection to
port 179 on R20, but the SYN packets are simply ignored. On the
other hand, tcpdump shows that R20 does not try to open a TCP
connection to R1.

This is the content of /var/log/messages on R20 around the time when
the old R1 router was shut down (R20's local address X.X.96.20 is
on vlan201).

Oct 19 08:18:41 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1):
received notification: Cease, none
Oct 19 08:19:11 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1):
connect: Operation not permitted
Oct 19 08:21:12 R20 bgpd[21642]: neighbor X.X.96.1 (IBGP with R1):
connect: Operation not permitted
Oct 19 08:22:55 R20 /bsd: arp info overwritten for X.X.96.1 by
00:00:24:c8:d9:f8 on vlan201

I did not restart the bgpd on R20 (which would certainly help),
as I would like to further track down the problem.
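
To tell the cases above apart (SYN silently ignored versus connect()
failing locally with "Operation not permitted"), a small probe can
report how a single TCP connect attempt ends. This is only a minimal
sketch in Python; the function name probe_tcp and the 3 second timeout
are my own assumptions and have nothing to do with bgpd itself.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connect and describe the outcome as a short string."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        # No SYN/ACK and no RST arrived within the timeout.
        return "timeout (no answer to SYN)"
    except OSError as exc:
        # For example EPERM ("Operation not permitted") or ECONNREFUSED.
        name = errno.errorcode.get(exc.errno, "unknown")
        return "error %s: %s" % (name, exc.strerror)
    finally:
        sock.close()
```

Run on R20 against R1's address and port 179, it should show whether
the failure is a local error or simply a missing answer.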

- Christian



Re: vic(4) on ESX 3.0.2

2007-10-15 Thread Christian Plattner
Sometimes it is very annoying that your settings in the .vmx are not
respected or get changed back by the VI client.


A very slow, but bullet-proof method is the following:

1.) Connect directly with the VI client to the ESX
(I do not have virtual center)

2.) Stop the VM and remove it from the inventory.
(right click on the VM in the left panel,
then "Remove from the Inventory")

3.) Change the .vmx file, i.e., append something like

(if you want to use the em driver and have 3 interfaces)

ethernet0.virtualDev = "e1000"
ethernet1.virtualDev = "e1000"
ethernet2.virtualDev = "e1000"

or (if you want to use the vic driver)

ethernet0.virtualDev = "vmxnet"
ethernet1.virtualDev = "vmxnet"
ethernet2.virtualDev = "vmxnet"

4.) Add the VM to the inventory again.

(With the VI client, go to the global Configuration Tab,
click on "Storage (SCSI, SAN and NFS)", then right-click
on the storage (i.e., typically "storage1") and choose "Browse
Datastore...". Search for the .vmx file and then via
right-click "Add to Inventory".)
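
Step 3 can also be scripted once the VM is out of the inventory. The
following is only a sketch in Python under my own assumptions: the
helper name set_virtual_dev is made up, it works on the text of the
.vmx file, and it removes every existing ethernetN.virtualDev line
before appending the new ones.

```python
import re


def set_virtual_dev(vmx_text, device, count):
    """Return vmx_text with ethernetN.virtualDev set for N in 0..count-1."""
    pattern = re.compile(r'^\s*ethernet\d+\.virtualDev\s*=')
    # Drop all existing virtualDev lines so the result has no duplicates.
    kept = [line for line in vmx_text.splitlines()
            if not pattern.match(line)]
    for n in range(count):
        kept.append('ethernet%d.virtualDev = "%s"' % (n, device))
    return "\n".join(kept) + "\n"
```

Calling set_virtual_dev(text, "e1000", 3) gives the three e1000 lines
shown above; passing "vmxnet" gives the vic variant.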



Re: OpenBSD on ESX - Networking experiences

2007-10-14 Thread Christian Plattner

Just for the record:

I upgraded to ESX 3.0.2 and...

1.) So far, I did not observe any stalls on the emulated e1000 (em)
interfaces. Currently I am playing with the vmxnet driver as well.

2.) VGT mode seems to work correctly: very short Ethernet frames (e.g.,
ICMP ping packets produced by Windows XP machines and routed over a
virtual OpenBSD 4.1 firewall with interfaces in VGT mode) are no longer
dropped.

Thanks for all the feedback.



Re: vr driver trouble on Soekris 5501

2007-10-12 Thread Christian Plattner

>For what it's worth, I experienced the same problem caused by
>attaching and detaching a (short) crossover cable multiple times
>on a vr interface in

The cable used in the situation when things went wrong
was also short, < 1m.

>soekris net5501 running 4.1-stable. As it was on a production
>firewall I didn't troubleshoot much, tcpdump didn't show any incoming
>traffic on that

OK, same phenomenon.

>interface - then I went for a quick reboot that obviously fixed
>things. Let me see if I can replicate it in lab.

That would be very nice - unfortunately, I don't have any spare
Soekris 5501 boxes.



Re: vr driver trouble on Soekris 5501

2007-10-12 Thread Christian Plattner
>Are these interfaces configured in "autoselect" mode? What happens when
>you configure them in fixed mode, e.g. 100Mb/full duplex?


Sounds like a good idea. In the future, I will configure them in fixed
mode; it won't hurt.

Even though it may have something to do with the autoselect mode,
the whole story still has a bad smell =) The problem persisted
even though I disconnected the cable several times, changed the cable,
and hooked the Soekris up to a Dell 5324 switch (to a port in autoselect
mode). The interface did not leave the stalled state until I did the
ifconfig down/up sequence.



Re: vr driver trouble on Soekris 5501

2007-10-12 Thread Christian Plattner

>Not sure if related, but something similar has been fixed in
>4.2-current already.


This was also the first thing that came to my mind; however, I don't
think it is related. VR_STICKHW is only written erroneously during
attach, and since my machine has now been running for several weeks
without any problem, I doubt that the observed stall has anything to
do with this.

Opinions from the vr maintainers? Anything I can do to debug the
problem the next time it occurs?




Re: OpenBSD on ESX - Networking experiences

2007-10-02 Thread Christian Plattner

> I've personally not had any issues with the vlance driver.  Have two
> 4.1 guests on ESX3.0.1, been running since around July without issues
> well,
...
> considering I haven't had any issues to date.  I will admit they don't
> get a lot of heavy use.

In my case the pattern produced by executing "find /" in an ssh session
was enough to crash the VM when using the vlance driver and ESX 3.0.1.

Modifying (lowering) the PCN_NTXSEGS value helped.
(see http://archives.neohapsis.com/archives/openbsd/2006-12/1655.html)



OpenBSD on ESX - Networking experiences

2007-10-02 Thread Christian Plattner

Dear all,

I am looking for people who run OpenBSD 4.1 on ESX servers and want to
share their experiences =)


To make sure that I won't provoke replies like 'idiot, virtualization
subverts the safety of OpenBSD' I hereby declare that I do not want to
use this for production systems (...not). Furthermore, I know that
OpenBSD is not officially supported on ESX.


When it comes to discussing OpenBSD on ESX, people often write stuff 
like "it runs just fine when I use the e1000 network if emulation".


In my experience, things are not so clear. OK, using the e1000 is a must
(the vlance driver does not work properly with the emulation done by
ESX, probably an issue with the PCN_NTXSEGS value (16) in if_pcn.c).
However, using the e1000 emulation is not trouble-free either.


I played around with various VMs and sometimes the em driver suddenly
stopped receiving packets (and the packets it sent went to
nirvana). Link status etc. all OK. I observed this on an unpatched ESX
3.0.1. People tend to overreact in such cases (they reboot VMs), but in
at least one case I was able to intervene and run "ifconfig em0 down
&& ifconfig em0 up" - which helped.
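
For anyone who wants to script that workaround instead of rebooting,
here is a minimal sketch in Python. The function names and the
injectable "run" parameter are my own choices for illustration; like
the "&&" in the shell version, "up" is only attempted if "down"
succeeded, because subprocess.check_call raises on a non-zero exit.

```python
import subprocess


def bounce_commands(ifname):
    """Return the two ifconfig invocations that bounce an interface."""
    return [["ifconfig", ifname, "down"], ["ifconfig", ifname, "up"]]


def bounce(ifname, run=subprocess.check_call):
    """Run 'down' then 'up'; an exception from run() stops the sequence."""
    for argv in bounce_commands(ifname):
        run(argv)
```

bounce("em0") needs root on the guest; passing a different "run"
callable makes it possible to log the commands without executing them.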


Another issue is VGT mode (a must if you want to handle more than 4
VLANs in a VM). Sometimes, short incoming Ethernet frames get lost (I
know, this is very vague).


Any other experiences/observations/ideas?

- Christian



Re: OpenOSPFd and kernel routing table (new variant)

2007-06-01 Thread Christian Plattner

I applied the diff manually to -stable (watch out for
path_updateall/prefix_updateall), and now it works perfectly.

Thanks, Claudio!


> And here is a preliminary diff for all the curious ones. bgpd needs to
> track changes of routes with F_NEXTHOP checked and report them to the
> RDE. The RDE will then update all active routes that use this nexthop.
> Seems to work for me.




OpenOSPFd and kernel routing table (new variant)

2007-05-30 Thread Christian Plattner

Hi,

I am testing OpenBGPD and OpenOSPFD on a couple of Soekris boxes.
Even though I am using the latest code (-stable with ospfd kroute.c
revision 1.48), I am having problems with the kernel routing table
when OSPFD has to react to changes in the topology. I verified the
problem on a virtual setup (a couple of OpenBSD machines on an ESX
server), same result.

The problem can be summarized as follows: When I take down an interface
on one machine manually (e.g., ifconfig em1 down), OpenOSPFD on another
machine detects this without problems, and routes to subnets in
the same AS are adapted. However, the kernel continues to route
packets to destinations outside of the AS over the dead link.

Fix: When I restart ospfd, the kernel routing table is OK again.

Here is an example with 3 routers that I have put together using
ESX/VMware:

                      /em1-(.1) --- 10.74.96.0/27  --- (.2)--em0\
   +--  (.22)-em0-[R1]                                           [R2]
   |                  \em2-(.33) -- 10.74.96.32/27 -- (.34)--em1/
10.0.0.0/24
   |
   +--- (.1)-em1-[R0]-em0 -- (62.2.0.0/16)

Router R0: AS65002 announces 62.2.0.0/16 to R1
Router R1: AS65001 announces 10.74.96.0/21 to R0
Router R2: AS65001 has an IBGP session with R1
Loopback (lo1) addresses: R1=10.74.97.1, R2=10.74.97.2

This setting works fine, I can ping from R2 to machines in 62.2.0.0/16.
Traffic between R1 and R2 flows over the upper link.

However, let's assume that one of the links between R1 and R2 fails.

[R1] # ifconfig em1 down (so eventually R2 will find out that it does
not receive any OSPF packets on em0 anymore).

It takes a while, but then ospfd on R2 has calculated the new topology:

[R2] # ospfctl show rib
Destination      Nexthop       Path Type    Type     Cost
0.0.0.1          10.74.96.33   Intra-Area   Router   11
10.74.96.0/27    10.74.96.33   Intra-Area   Network  21
10.74.96.32/27   10.74.96.34   Intra-Area   Network  11
10.74.97.1/32    10.74.96.33   Intra-Area   Network  21
10.0.0.0/24      10.74.96.33   Type 1 ext   Network  111
(uptime column deleted, to comply with the 72 char restriction
of the mailing list).

[R2] # ospfctl show fib
flags: * = valid, O = OSPF, C = Connected, S = Static
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.1
*C     10.74.96.0/27    link#1
*C     10.74.96.32/27   link#2
*O     10.74.97.1/32    10.74.96.33
*      10.74.97.2/32    10.74.97.2
*      62.2.0.0/16      10.74.96.1
*S     127.0.0.0/8      127.0.0.1
*C     127.0.0.1/8      link#0
*      127.0.0.1/32     127.0.0.1
*S     224.0.0.0/4      127.0.0.1

This is not good, as the route to 62.2.0.0/16 (learned via IBGP) still
points to 10.74.96.1 (which is not directly reachable anymore).

Now let's kill and restart ospfd on R2, then check again:

[R2] # ospfctl show fib
flags: * = valid, O = OSPF, C = Connected, S = Static
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.33
*C     10.74.96.0/27    link#1
*C     10.74.96.32/27   link#2
*O     10.74.97.1/32    10.74.96.33
*      10.74.97.2/32    10.74.97.2
*      62.2.0.0/16      10.74.96.33
*S     127.0.0.0/8      127.0.0.1
*C     127.0.0.1/8      link#0
*      127.0.0.1/32     127.0.0.1
*S     224.0.0.0/4      127.0.0.1

Voilà, now it looks OK =)
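
To make the difference between the two "ospfctl show fib" listings
easier to spot, the outputs can be compared mechanically. This is just
a sketch in Python: the function names are made up, the sample data is
an abbreviated copy of the two listings, and the parser assumes that
the Flags, Destination and Nexthop columns are separated by whitespace.

```python
def parse_fib(text):
    """Map destination -> nexthop for every valid ('*') FIB line."""
    routes = {}
    for line in text.splitlines():
        fields = line.split()
        # Valid entries start with a flags field such as "*", "*O", "*C".
        if len(fields) == 3 and fields[0].startswith("*"):
            routes[fields[1]] = fields[2]
    return routes


def changed_nexthops(before, after):
    """Return {destination: (old, new)} where the nexthop differs."""
    old, new = parse_fib(before), parse_fib(after)
    return {dest: (old[dest], new[dest])
            for dest in old
            if dest in new and old[dest] != new[dest]}


# Abbreviated sample: R2 before and after restarting ospfd.
BEFORE = """\
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.1
*C     10.74.96.0/27    link#1
*      62.2.0.0/16      10.74.96.1
"""
AFTER = """\
Flags  Destination      Nexthop
*O     10.0.0.0/24      10.74.96.33
*      10.74.96.0/21    10.74.96.33
*C     10.74.96.0/27    link#1
*      62.2.0.0/16      10.74.96.33
"""

for dest, (old_hop, new_hop) in sorted(changed_nexthops(BEFORE, AFTER).items()):
    print("%s: %s -> %s" % (dest, old_hop, new_hop))
```

With the sample above, only 10.74.96.0/21 and 62.2.0.0/16 are
reported, i.e. exactly the two routes that kept pointing to 10.74.96.1
until ospfd was restarted.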

This is the ospfd.conf of R2:

password="gurke"
router-id 0.0.0.2
redistribute connected
redistribute static

area 0.0.0.0 {

    interface lo1

    interface em0 {
        metric 10
        auth-type simple
        auth-key $password
    }
    interface em1 {
        metric 11
        auth-type simple
        auth-key $password
    }
}

Any suggestions? Am I making a substantial error?

I did not want to make this posting too long, so if somebody is
interested in the detailed config files then I can make them
available.

Thanks,
- Christian