CARP + OpenBGPd, fail-over

2006-06-14 Thread Thomas Bader
Hi all

I got a setup with two OpenBSD boxes which both do a
BGP-peering to our upstream internet provider and both
provide redundancy to our internal LANs with CARP and
pfsync.

The setup looks like the following:

      | $ext_if            | $ext_if
      | (with BGP)         | (with BGP)
  ____|____            ____|____
 |   r0a   |          |   r0b   |
 |_________|----------|_________|
   |     |  $pfsync_if  |     |
   |a    |b             |a    |b
 
 (a and b are connected to our two LAN segments and are
  CARPed. They're later on called $dmz_if and $lan_if.)

So far this setup works fine. r0a is master and usually all
traffic is routed over this machine. In case r0a goes down,
r0b takes over. The BGP-Peering to our upstream on r0a has a
"depend on carp0" and "depend on carp1" in its
configuration. Further, both machines have a BGP-peering
together over $pfsync_if.

I verified my setup by:

 a.) unplugging $dmz_if, $lan_if or both on r0a makes r0b
 take over - everything works fine. Traffic is then routed
 from the LAN to r0b and on to the internet.
 b.) switching off r0a makes r0b take over - everything
 works fine. Traffic is then routed from the LAN to r0b
 and on to the internet.

In one case the fail-over does not work well: if the
BGP peering on r0a to the upstream goes down, all traffic
is routed from r0a via $pfsync_if to r0b and on to the
upstream from there. SSH and browsing web pages over HTTP
work that way, but downloads over HTTP or FTP do not.

As long as traffic is routed from the LAN via r0a to r0b,
every large download stalls after a few kbytes. With
tcpdump I found that the first few kbytes make it through
and afterwards ICMP host-unreachable messages are generated.

If I do a "pfctl -F rules" on both machines, everything
works fine. At first I suspected a blocking rule in PF, so
I added the "log" option to every "block" rule on both
machines, made sure pflogd was running, attached tcpdump to
pflog0 and repeated my fail-over tests. Apparently, no
packets get blocked on either machine.

So I wonder why all downloads stall in this specific
fail-over situation. Smaller transmissions (browsing web
pages, using SSH) work fine, and I can't see a reason why
downloads should stall.

I attached my pf.conf below, with all IP addresses replaced
by (empty). I added the altq section recently; it wasn't
there during the fail-over tests mentioned above, so I guess
that altq has no impact on my problem. There are only a few
rules in the pf.conf because the setup is not yet in
production and only a few machines use it so far.

Further, I attached a dmesg of r0a. Both my OpenBSD boxes
are the same machine type, so one dmesg should be enough.

I would appreciate some hints on how to debug this problem
further. I have already spent some hours on it and so far
haven't been able to get any closer to solving it.

With best regards,
Thomas.

./.

###
## macros

# interfaces
ext_if="fxp0"
pfsync_if="fxp1"
lan_if="em0"
lan_carp_if="carp0"
dmz_if="fxp2"
dmz_carp_if="carp1"

# ranges
UNIVERSE="0.0.0.0/0"
EXT="(empty)"  # transit range
EXT_CLR="(empty)"  # upstream router
PFSYNC="(empty)"   # pfsync range
LAN="(empty)"
DMZ="(empty)"
VPN="(empty)"
HOUSING="(empty)"

# machines
R1="(empty)"  # router 1
R1_HOU="(empty)"
TEMP_HOU="(empty)"

# housings
MATTENWEG="(empty)"
CRAWFISH="(empty)"
AWAY="(empty)"

# services
sVPN="{ 1194 }"
sVNC="{ 5900, 5800  }"
sRDP="{ 3389 }"
sRETRO="{ 497 }"
sFR="{ 8088 }"

###
## options

set skip on lo0 # skip loopback
set skip on $pfsync_if  # skip pfsync
set block-policy return # always return on block

###
## normalization
scrub in all 
scrub out random-id

###
## Traffic Shaping
altq on $ext_if bandwidth 100Mb cbq queue { lan_dparent, dmz_parent, prem_services }
 queue lan_dparent on $ext_if bandwidth 5Mb cbq { lan_out, lan_aout, voip_out }
  queue lan_out  on $ext_if bandwidth 2.5Mb cbq
  queue lan_aout on $ext_if bandwidth 500Kb priority 6 cbq
  queue voip_out on $ext_if bandwidth 2Mb priority 7 cbq
 queue dmz_parent on $ext_if bandwidth 10Mb cbq {dmz_out, dmz_ack, hou_out}
  queue dmz_out  on $ext_if bandwidth 7.5Mb cbq(default, borrow)
  queue dmz_ack  on $ext_if bandwidth 500Kb priority 6 cbq
  queue hou_out  on $ext_if bandwidth 2Mb cbq
 queue prem_services on $ext_if bandwidth 20Mb cbq { pp_out, tt_out }
  queue pp_out   on $ext_if bandwidth 10Mb cbq(borrow)
  queue tt_out   on $ext_if bandwidth 10Mb cbq

altq on $lan_if bandwidth 100Mb cbq queue { lan_parent }
 queue lan_parent on $lan_if bandwidth 5Mb cbq { lan_std, lan_ack, voip_out }
  queue lan_std  on $lan_if bandwidth 2.5Mb cbq(default, borrow)
  queue lan_ack  on $lan_if bandwidth 500Kb priority 6 cbq
  queue voip_out on $lan_if bandwidth 2Mb priority 7 cbq

altq on $dmz_if bandw

Re: CARP + OpenBGPd, fail-over

2006-06-14 Thread Stuart Henderson
On 2006/06/14 08:53, Thomas Bader wrote:
> In one case the fail-over does not work well: If the
> BGP-peering on r0a to the upstream goes down all traffic
> will be routed from r0a via $pfsync_if to r0b and to the
> upstream from there on. SSH and browsing through web pages
> with HTTP works that way. But downloads with HTTP or FTP do
> not work.
>
> As long as traffic gets routed from LAN via r0a to r0b every
> large download just stalls after a few kbytes. With tcpdump
> I found out that the first few kbytes make it through and
> afterwards ICMP host-unreachable messages will be generated.

This feels like a path-mtu problem, is em0 using jumbo frames?
If that's the problem, scrub max-mss should help.
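For reference, the arithmetic behind a max-mss value: with no IP or TCP options, every segment carries a 20-byte IPv4 header and a 20-byte TCP header, so the largest payload that fits a 1500-byte MTU is 1460 bytes. The C sketch below computes that and models what clamping does to the MSS option of a SYN. It assumes IPv4 without options, and `mss_for_mtu` and `clamp_mss` are hypothetical helper names, not pf code.

```c
#include <assert.h>
#include <stdint.h>

/* Option-less header sizes in bytes (assumption: IPv4, no options). */
#define IPV4_HDR_LEN 20
#define TCP_HDR_LEN  20

/* Largest TCP payload per segment for a given path MTU. */
static uint16_t
mss_for_mtu(uint16_t path_mtu)
{
	assert(path_mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
	return (uint16_t)(path_mtu - IPV4_HDR_LEN - TCP_HDR_LEN);
}

/*
 * Model of MSS clamping on a SYN: an advertised MSS above max_mss
 * is lowered to max_mss, smaller values pass unchanged.
 */
static uint16_t
clamp_mss(uint16_t advertised, uint16_t max_mss)
{
	return advertised > max_mss ? max_mss : advertised;
}
```

For example, an interface with jumbo frames (MTU 9000) would let hosts advertise an MSS of 8960, which is why jumbo frames are the first suspect; clamping to 1460 keeps segments within a 1500-byte path. If every link is already at 1500, clamping to 1460 changes nothing.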

> So I guess that altq has no impact on my problem. In the
> pf.conf there are actually only a few rules because the
> mentioned setup is not yet in productive business and there
> are only a few machines using it yet.

Just as long as beer. is safe, that's ok (-:



Re: CARP + OpenBGPd, fail-over

2006-06-14 Thread Thomas Bader
* Stuart Henderson <[EMAIL PROTECTED]> [060614 11:34]:
> On 2006/06/14 08:53, Thomas Bader wrote:
> > As long as traffic gets routed from LAN via r0a to r0b every
> > large download just stalls after a few kbytes. With tcpdump
> > I found out that the first few kbytes make it through and
> > afterwards ICMP host-unreachable messages will be generated.
> 
> This feels like a path-mtu problem, is em0 using jumbo frames?
> If that's the problem, scrub max-mss should help.

According to ifconfig, all my interfaces have an MTU of 1500,
so jumbo frames are not used at all. My em0 doesn't even
have a 1 Gbit connection; everything runs at 100 Mbit.

As far as I understand, flushing the PF rules shouldn't make
any difference if the problem were something like path MTU,
yet after flushing the PF rules on both boxes everything
works well. I also disabled the two "scrub" rules before
loading my pf.conf; this did not help either.

Do you have any suggestions about trying "scrub" with
different options (for example, max-mss)?

The key is somewhere in the difference between having rules
loaded and clearing them with "pfctl -F rules" (which does
not disable PF, it only flushes the filter rules). I haven't
yet found the difference which leads to my problem. Any help
is greatly appreciated.

I also thought about the states: maybe the connection stalls
because PF loses the state table entry. I haven't yet found
a way to debug further in that direction. pfsync seems to
work well (for example, when the fail-over occurs my open
SSH sessions aren't killed; they remain usable). Any
suggestions here?

Regards,
Thomas.



Re: CARP + OpenBGPd, fail-over

2006-06-14 Thread Henning Brauer
* Thomas Bader <[EMAIL PROTECTED]> [2006-06-14 09:02]:
> In one case the fail-over does not work well: If the
> BGP-peering on r0a to the upstream goes down all traffic
> will be routed from r0a via $pfsync_if to r0b

this case requires bgpd to actively influence the carp state.

now, lucky you, I have a diff for current doing exactly that :)
you need -current from after the hackathon, as this needs the carp 
group demotion stuff. you then basically add "demote carp" to the 
session you care about. when that session goes down, bgpd increases the 
demotion counter for said group (really only makes sense for carp 
groups).
yes, manpages missing so far...
hardcore testing welcome.
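A minimal C model of the counter behaviour described above, assuming one counter per carp interface group that is raised by one when a "demote carp" session goes down and lowered by one when it comes back up (the diff passes a signed `level` for this). The table, `group_lookup`, `demote_adjust`, `session_down` and `session_up` are hypothetical stand-ins; the real diff talks to the kernel through `carp_demote_set()` in its new carp.c, which is cut off in this archive. Clamping at zero is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_GROUPS   8
#define GROUP_NAMSIZ 16		/* matches IFNAMSIZ on OpenBSD */

/* Hypothetical in-memory stand-in for the kernel's per-group counters. */
struct demote_group {
	char name[GROUP_NAMSIZ];	/* "" marks an unused slot */
	int  counter;
};

static struct demote_group groups[MAX_GROUPS];

/* Find a group by name, creating it in a free slot if needed. */
static struct demote_group *
group_lookup(const char *name)
{
	struct demote_group *free_slot = NULL;

	if (name[0] == '\0' || strlen(name) >= GROUP_NAMSIZ)
		return NULL;		/* empty or too long for the table */
	for (int i = 0; i < MAX_GROUPS; i++) {
		if (groups[i].name[0] == '\0') {
			if (free_slot == NULL)
				free_slot = &groups[i];
		} else if (strcmp(groups[i].name, name) == 0)
			return &groups[i];
	}
	if (free_slot != NULL) {
		strcpy(free_slot->name, name);	/* length checked above */
		free_slot->counter = 0;
	}
	return free_slot;
}

/* Add level (+1 or -1) to the group's counter, clamped at zero. */
static int
demote_adjust(const char *name, int level)
{
	struct demote_group *g;

	assert(level == 1 || level == -1);
	if ((g = group_lookup(name)) == NULL)
		return -1;		/* bad name or table full */
	g->counter += level;
	if (g->counter < 0)
		g->counter = 0;
	return g->counter;
}

/* What a session configured with "demote carp" would trigger. */
static int
session_down(const char *group)
{
	return demote_adjust(group, 1);
}

static int
session_up(const char *group)
{
	return demote_adjust(group, -1);
}
```

In this model, two sessions in the same group that both go down leave the counter at 2, and the box stays demoted until both are up again. As far as I know, carp makes a box with a higher demotion counter less preferred; the actual election (advbase, advskew, the net.inet.carp.preempt sysctl) is not modelled here.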

also, as for everybody successfully using openbgpd, we welcome 
testimonials for http://www.openbgpd.org/users.html :)

Index: Makefile
===
RCS file: /cvs/src/usr.sbin/bgpd/Makefile,v
retrieving revision 1.24
diff -u -p -r1.24 Makefile
--- Makefile  3 Jan 2006 22:19:59 -   1.24
+++ Makefile  14 Jun 2006 16:19:03 -
@@ -6,7 +6,7 @@ PROG=   bgpd
 SRCS=  bgpd.c buffer.c session.c log.c parse.y config.c imsg.c \
rde.c rde_rib.c rde_decide.c rde_prefix.c mrt.c kroute.c \
control.c pfkey.c rde_update.c rde_attr.c printconf.c \
-   rde_filter.c pftable.c name2id.c util.c
+   rde_filter.c pftable.c name2id.c util.c carp.c
 CFLAGS+= -Wall -I${.CURDIR}
 CFLAGS+= -Wstrict-prototypes -Wmissing-prototypes
 CFLAGS+= -Wmissing-declarations
Index: bgpd.c
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.c,v
retrieving revision 1.137
diff -u -p -r1.137 bgpd.c
--- bgpd.c  27 May 2006 21:24:36 -  1.137
+++ bgpd.c  14 Jun 2006 16:19:04 -
@@ -132,8 +132,11 @@ main(int argc, char *argv[])
peer_l = NULL;
conf.csock = SOCKET_NAME;
 
-   while ((ch = getopt(argc, argv, "dD:f:nr:s:v")) != -1) {
+   while ((ch = getopt(argc, argv, "cdD:f:nr:s:v")) != -1) {
switch (ch) {
+   case 'c':
+   conf.opts |= BGPD_OPT_FORCE_DEMOTE;
+   break;
case 'd':
debug = 1;
break;
@@ -645,6 +648,19 @@ dispatch_imsg(struct imsgbuf *ibuf, int 
log_warnx("IFINFO request with wrong len");
else
kr_ifinfo(imsg.data);
+   break;
+   case IMSG_DEMOTE:
+   if (idx != PFD_PIPE_SESSION)
+   log_warnx("demote request not from SE");
+   else if (imsg.hdr.len != IMSG_HEADER_SIZE +
+   sizeof(struct demote_msg))
+   log_warnx("DEMOTE request with wrong len");
+   else {
+   struct demote_msg   *msg;
+
+   msg = (struct demote_msg *)imsg.data;
+   carp_demote_set(msg->demote_group, msg->level);
+   }
break;
default:
break;
Index: bgpd.h
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.h,v
retrieving revision 1.201
diff -u -p -r1.201 bgpd.h
--- bgpd.h  27 May 2006 21:24:36 -  1.201
+++ bgpd.h  14 Jun 2006 16:19:04 -
@@ -49,6 +49,7 @@
 #define BGPD_OPT_VERBOSE        0x0001
 #define BGPD_OPT_VERBOSE2       0x0002
 #define BGPD_OPT_NOACTION       0x0004
+#define BGPD_OPT_FORCE_DEMOTE   0x0008
 
 #define BGPD_FLAG_NO_FIB_UPDATE 0x0001
 #define BGPD_FLAG_NO_EVALUATE   0x0002
@@ -220,6 +221,7 @@ struct peer_config {
char group[PEER_DESCR_LEN];
char descr[PEER_DESCR_LEN];
char if_depend[IFNAMSIZ];
+   char demote_group[IFNAMSIZ];
 u_int32_t id;
 u_int32_t groupid;
 u_int32_t max_prefix;
@@ -327,7 +329,8 @@ enum imsg_type {
IMSG_CTL_SHOW_RIB_MEM,
IMSG_CTL_SHOW_TERSE,
IMSG_REFRESH,
-   IMSG_IFINFO
+   IMSG_IFINFO,
+   IMSG_DEMOTE
 };
 
 struct imsg_hdr {
@@ -340,6 +343,11 @@ struct imsg_hdr {
 struct imsg {
struct imsg_hdr  hdr;
 void            *data;
+};
+
+struct demote_msg {
+   char demote_group[IFNAMSIZ];
+   int  level;
 };
 
 enum ctl_results {
Index: carp.c
===
RCS file: carp.c
diff -N carp.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ carp.c  14 Jun 2006 16:19:04 -
@@ -0,0 +1,160 @@
+/* $OpenBSD$ */
+
+/*
+ * Copyright (c) 2006 Henning Braue
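The IMSG_DEMOTE case added to dispatch_imsg() in the diff above accepts a request only if it came from the session engine and its length is exactly the imsg header plus sizeof(struct demote_msg). A standalone C model of those two checks follows; the type names and the header size are made up for illustration, and copying into a local struct instead of casting the pointer is a choice of this sketch, not of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ_MODEL	16	/* stands in for IFNAMSIZ */
#define HDR_SIZE_MODEL	16u	/* hypothetical header size, not bgpd's */

/* Hypothetical stand-ins for the diff's pipe index and message. */
enum pipe_src { PIPE_SESSION_MODEL, PIPE_RDE_MODEL };

struct demote_msg_model {
	char	demote_group[IFNAMSIZ_MODEL];
	int	level;
};

/*
 * Same two checks as the IMSG_DEMOTE case: the sender must be the
 * session engine and the length must be header + payload exactly.
 */
static bool
demote_request_ok(enum pipe_src src, size_t len, const void *data,
    struct demote_msg_model *out)
{
	assert(data != NULL && out != NULL);
	if (src != PIPE_SESSION_MODEL)
		return false;	/* "demote request not from SE" */
	if (len != HDR_SIZE_MODEL + sizeof(struct demote_msg_model))
		return false;	/* "DEMOTE request with wrong len" */
	memcpy(out, data, sizeof(*out));
	return true;
}
```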

Re: CARP + OpenBGPd, fail-over

2006-06-14 Thread Henning Brauer
* Thomas Bader <[EMAIL PROTECTED]> [2006-06-14 09:02]:
> In one case the fail-over does not work well: If the
> BGP-peering on r0a to the upstream goes down all traffic
> will be routed from r0a via $pfsync_if to r0b and to the
> upstream from there on. SSH and browsing through web pages
> with HTTP works that way. But downloads with HTTP or FTP do
> not work.
> 
> As long as traffic gets routed from LAN via r0a to r0b every
> large download just stalls after a few kbytes. With tcpdump
> I found out that the first few kbytes make it through and
> afterwards ICMP host-unreachable messages will be generated.

this, btw, is likely because of tcp window scaling, and one of the
machines not seeing all packets for that tcp connection, thus not
scaling the window, thus dropping packets because of sequence numbers
seemingly out of the window. pfsync cannot keep up fast enough - it's
not made for that (it is "best effort" anyway), and I doubt it can be
made to deal with a situation like this properly without significant
drawbacks.
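A simplified C model of the check described above: a stateful filter treats a segment as in-window only if it starts at or after the receiver's ACK and ends at or before the ACK plus the advertised window, and with window scaling that window is the 16-bit field shifted left by the scale factor negotiated in the SYNs. A box that never learned the scale factor uses the unshifted value and rejects segments the receiver would actually accept, which would be consistent with large downloads stalling. This is not pf's code (pf's sequence tracking keeps more state than this); `struct tcp_peer_view` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What a filter might remember about the receiving side (simplified). */
struct tcp_peer_view {
	uint32_t ack;		/* highest ACK number seen from the receiver */
	uint16_t win;		/* raw 16-bit window field it advertised */
	uint8_t  wscale;	/* shift from its SYN; 0 if the SYN was never seen */
};

/* Right edge of the receive window: ack + (win << wscale), mod 2^32. */
static uint32_t
window_right_edge(const struct tcp_peer_view *v)
{
	assert(v->wscale <= 14);	/* RFC 1323/7323 limit on the shift */
	return v->ack + ((uint32_t)v->win << v->wscale);
}

/*
 * A segment [seq, seq + len) is accepted only if it starts at or after
 * ack and ends at or before the right edge. The signed casts give a
 * sequence-number comparison that survives 32-bit wraparound.
 */
static bool
segment_in_window(const struct tcp_peer_view *v, uint32_t seq, uint32_t len)
{
	uint32_t end = seq + len;

	return (int32_t)(seq - v->ack) >= 0 &&
	    (int32_t)(window_right_edge(v) - end) >= 0;
}
```

With a window field of 1024 and a scale of 7, the receiver really offers 131072 bytes, so a 1460-byte segment starting 100000 bytes past the ACK fits; a box that missed the SYN computes a 1024-byte window and would drop the same segment.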

-- 
BS Web Services, http://www.bsws.de/
OpenBSD-based Webhosting, Mail Services, Managed Servers, ...
Unix is very simple, but it takes a genius to understand the simplicity.
(Dennis Ritchie)



Re: CARP + OpenBGPd, fail-over

2006-06-27 Thread Thomas Bader
Henning Brauer wrote:
> * Thomas Bader <[EMAIL PROTECTED]> [2006-06-14 09:02]:
>> In one case the fail-over does not work well: If the
>> BGP-peering on r0a to the upstream goes down all traffic
>> will be routed from r0a via $pfsync_if to r0b
> 
> this case requires bgpd to actively influence the carp state.
> 
> now, lucky you, I have a diff for current doing exactly that :)
> you need -current from after the hackathon, as this needs the carp 
> group demotion stuff.

Oh, that sounds fine, thank you. I will surely test that out in my
testing environment.

Can you estimate when this patch will be integrated into -stable?

> also, as for everybody successfully using openbgpd, we welcome 
> testimonials for http://www.openbgpd.org/users.html :)

OK, I'll see what I can do about that :)

> this, btw, is likely because of tcp window scaling, and one of the
> machines not seeing all packets for that tcp connection, thus not
> scaling the window, thus dropping packets because of sequence numbers
> seemingly out of the window. pfsync cannot keep up fast enough -
> it's not made for that (it is "best effort" anyway), and I doubt it
> can be made to deal with a situation like this properly without
> significant drawbacks.

So, apparently, the main difference I was looking for between having
the PF rules loaded and flushed is state tracking.

Regards, Thomas.



Re: CARP + OpenBGPd, fail-over

2006-06-27 Thread Joachim Schipper
On Tue, Jun 27, 2006 at 10:44:20AM +0200, Thomas Bader wrote:
> Henning Brauer schrieb:
> > * Thomas Bader <[EMAIL PROTECTED]> [2006-06-14 09:02]:
> >> In one case the fail-over does not work well: If the
> >> BGP-peering on r0a to the upstream goes down all traffic
> >> will be routed from r0a via $pfsync_if to r0b
> > 
> > this case requires bgpd to actively influence the carp state.
> > 
> > now, lucky you, I have a diff for current doing exactly that :)
> > you need -current from after the hackathon, as this needs the carp 
> > group demotion stuff.
> 
> Oh, that sounds fine, thank you. I will surely test that out in my
> testing environment.
> 
> Can you estimate when this patch will be integrated into -stable?

Almost certainly never; -stable doesn't get new features. Run -current,
or wait for 4.0.

Joachim