Re: pf/carp for redundant production use

2005-09-26 Thread Jason Dixon

On Sep 26, 2005, at 11:07 AM, Chad M Stewart wrote:
> On Sep 25, 2005, at 9:39 PM, Jason Dixon wrote:
>> On Sep 25, 2005, at 8:30 AM, Neil wrote:
>>> Yep, the same behavior when the master dies. The solution that
>>> the person in #pf told me is use routing but I don't know how to
>>> implement. He told me that it's an issue in pf's NAT.
>>
>> 2) This is not tested, but I suspect that you should be able to
>> use the new interface grouping features in 3.8 to simply assign
>> multiple physical interfaces to the same group.  Even if one
>> fails, the other *should* maintain the MASTER state and avoid any
>> partial failure consequences.  I'd love to hear from other users
>> or developers that have tried the grouping feature in this sort of
>> scenario.
>
> Can you share where one might read more about the interface
> grouping features of 3.8?

Sorry, I meant to refer to the new trunking features (man 4 trunk).

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net




Re: pf/carp for redundant production use

2005-09-26 Thread Chad M Stewart

On Sep 25, 2005, at 9:39 PM, Jason Dixon wrote:
> On Sep 25, 2005, at 8:30 AM, Neil wrote:
>> Yep, the same behavior when the master dies. The solution that the
>> person in #pf told me is use routing but I don't know how to
>> implement. He told me that it's an issue in pf's NAT.
>
> Bullshit.
>
> Ok, here is the layman's description of the problem and the
> practical solution(s) to it.  I'd love to be able to explain why
> interfaces recovering from INIT don't reclaim MASTER faster than
> they do (approx 30 seconds in my tests), but I don't understand the
> code-level logistics of everything.  Hint:  This is only a problem
> using single CARP hosts with preemption.
>
> PROBLEM:
>
> With a simple CARP design using a single CARP host on each segment
> and preemption enabled, failover occurs as expected in the case of
> any system offline condition (server crashes, admin reboots, etc).
> If a single interface goes from MASTER to INIT state (cable gets
> pulled, cable goes bad, card goes bad, etc), the 2nd interface on
> that system will go into BACKUP mode as expected.  Traffic will
> route across the new MASTER, and will continue to do so while the
> failed system is in an INIT/BACKUP state.
>
> However, if the failed interface returns from INIT to an available
> mode (we plug the cable in), we notice that the 2nd interface
> reclaims MASTER almost immediately, but the restored interface does
> not.  It becomes a BACKUP host, which leaves us with a routing
> impossibility:


I agree, that is a routing impossibility.  Last week I built a lab to
test and build a new HA firewall.  In my testing I did not see the
30-second delay people are reporting.  Both carp interfaces on the
primary would take over as MASTER within seconds of bringing the
'failed' physical interface back online.

I started a large file download over HTTP with everything running
through the primary firewall.  I then pulled a cable and watched the
download; it slowed slightly, but went right back to its previous
speed. (Like your scp demo at NYCBSDCON.)  I actually disconnected and
reconnected the cable a bunch of times and the download never stopped.

I did notice one strange thing.  I have 3 physical interfaces and two
carp interfaces on each firewall.  If I was pinging the external/carp0
address and failed things over, say by doing 'ifconfig rl0 down', the
ping would continue with zero packet loss.  If I do the same thing on
the internal/carp1, I see a small amount of packet loss.  I don't
really care about that since most clients/people are not going to
notice.  I've already tested and know that downloads and other such
things continue to work without a problem.  I found it strange that
carp0 would not have packet loss while carp1 would.  I did not
investigate the packet loss further, so I don't know whether it was
caused by the hub/switch combo I'm using on the inside vs. the
external side.






>  BACKUP       MASTER
>  carp0        carp0
>    |            |
>  host1        host2
>    |            |
>  carp1        carp1
>  MASTER       BACKUP
>
> Any internal clients will attempt to send traffic through the "new
> gateway" (host1), although neither system has any way of routing
> the traffic properly (not without some hokey static routes
> bypassing the CARP hosts).  NOTE:  I have found that the original
> MASTER does indeed return to the correct state, approximately 30
> seconds later.  This is reproducible, but YMMV.
>
> SOLUTION:
>
> 1) If you really are concerned about a partial system failure
> (unplugged cable, bad card, etc), then scrap the single CARP
> host/segment design and use arpbalance with multiple CARP hosts.  The
> same partial-failure test using 2 CARP hosts on each segment with
> arpbalance resulted in a perfect failover and recovery with no
> packet loss.
>
> 2) This is not tested, but I suspect that you should be able to use
> the new interface grouping features in 3.8 to simply assign
> multiple physical interfaces to the same group.  Even if one fails,
> the other *should* maintain the MASTER state and avoid any partial
> failure consequences.  I'd love to hear from other users or
> developers that have tried the grouping feature in this sort of
> scenario.


Can you share where one might read more about the interface grouping  
features of 3.8?


I'm using a snapshot from September 10th in my lab.

-Chad



Re: pf/carp for redundant production use

2005-09-26 Thread Neil
Hi Jason, 

I would like to try your #1 suggestion but unfortunately, I don't know where
to start. Which programs do I need? What configuration? Is there an
existing sample configuration somewhere that I can follow?

Thanks for explaining this in such detail.

Neil 

 





Re: pf/carp for redundant production use

2005-09-26 Thread Jason Dixon

On Sep 25, 2005, at 8:30 AM, Neil wrote:

Yep, the same behavior when the master dies. The solution that the  
person in #pf told me is use routing but I don't know how to  
implement. He told me that it's an issue in pf's NAT.


Bullshit.

Ok, here is the layman's description of the problem and the practical  
solution(s) to it.  I'd love to be able to explain why interfaces  
recovering from INIT don't reclaim MASTER faster than they do (approx  
30 seconds in my tests), but I don't understand the code-level  
logistics of everything.  Hint:  This is only a problem using single  
CARP hosts with preemption.


PROBLEM:

With a simple CARP design using a single CARP host on each segment  
and preemption enabled, failover occurs as expected in the case of  
any system offline condition (server crashes, admin reboots, etc).   
If a single interface goes from MASTER to INIT state (cable gets  
pulled, cable goes bad, card goes bad, etc), the 2nd interface on  
that system will go into BACKUP mode as expected.  Traffic will route  
across the new MASTER, and will continue to do so while the failed  
system is in an INIT/BACKUP state.


However, if the failed interface returns from INIT to an available  
mode (we plug the cable in), we notice that the 2nd interface  
reclaims MASTER almost immediately, but the restored interface does  
not.  It becomes a BACKUP host, which leaves us with a routing  
impossibility:


 BACKUP       MASTER
 carp0        carp0
   |            |
 host1        host2
   |            |
 carp1        carp1
 MASTER       BACKUP

Any internal clients will attempt to send traffic through the "new  
gateway" (host1), although neither system has any way of routing the  
traffic properly (not without some hokey static routes bypassing the  
CARP hosts).  NOTE:  I have found that the original MASTER does  
indeed return to the correct state, approximately 30 seconds later.   
This is reproducible, but YMMV.
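
A quick way to spot this split state is to compare the CARP states that ifconfig reports on a single host. What follows is a minimal sketch, assuming ifconfig prints an indented "carp: <STATE> carpdev ..." line under each carp interface; the function name carp_check is made up for illustration and is not part of OpenBSD.

```sh
#!/bin/sh
# carp_check: hypothetical helper, not part of OpenBSD.
# Reads ifconfig-style text on stdin, prints "<interface> <state>" for each
# carp interface, then a verdict line: CONSISTENT, SPLIT, or NONE.
carp_check() {
    awk '
        # A line starting in column 0 opens a new interface stanza.
        /^[^ \t]/ { ifname = $1; sub(/:$/, "", ifname) }
        # Indented "carp: STATE ..." line inside a carpN stanza.
        /^[ \t]+carp: / && ifname ~ /^carp[0-9]+$/ {
            print ifname, $2
            if (!($2 in seen)) { seen[$2] = 1; n++ }
        }
        END {
            if (n == 0)      print "NONE"
            else if (n == 1) print "CONSISTENT"
            else             print "SPLIT"
        }
    '
}

# Example use on a firewall:
#   ifconfig | carp_check
```

A SPLIT verdict on host1 corresponds to the diagram above (carp0 BACKUP, carp1 MASTER). The check only makes sense for the single-CARP-host-per-segment design; with arpbalance a host is expected to hold a mix of MASTER and BACKUP states.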


SOLUTION:

1) If you really are concerned about a partial system failure  
(unplugged cable, bad card, etc), then scrap the single CARP host/ 
segment design and use arpbalance with multiple CARP hosts.  The same  
partial-failure test using 2 CARP hosts on each segment with  
arpbalance resulted in a perfect failover and recovery with no packet  
loss.


2) This is not tested, but I suspect that you should be able to use  
the new interface grouping features in 3.8 to simply assign multiple  
physical interfaces to the same group.  Even if one fails, the other  
*should* maintain the MASTER state and avoid any partial failure  
consequences.  I'd love to hear from other users or developers that  
have tried the grouping feature in this sort of scenario.



--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net




Re: pf/carp for redundant production use

2005-09-26 Thread Jason Dixon

On Sep 26, 2005, at 1:31 AM, Neil wrote:
> Hi Jason,
> I would like to try your #1 suggestion but unfortunately, I don't
> know where to start. What are the programs I need? What
> configuration? Is there any existing sample configuration on a link
> that I can follow?
>
> Thanks for explaining this in very detail.


Please stop top-posting.

Always start at the man pages; there is an example given (man 4  
carp).  There is a similar configuration in my NYC BSD Con slides  
(http://www.dixongroup.net/NYCBSDCON/); see the "Advanced Example".
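
For reference, the load-balancing example in carp(4) boils down to two virtual hosts (vhids) sharing one address, with the advskew swapped between the two firewalls, plus the arpbalance sysctl. The sketch below only prints the commands for review rather than running them. The function name, the fw1/fw2 labels, and the reuse of this thread's LAN address 172.16.0.100 and password are assumptions for illustration; check carp(4) on your release before applying anything.

```sh
#!/bin/sh
# carp_arpbalance_cmds: hypothetical helper that PRINTS the commands for one
# firewall; it does not change any system state.
carp_arpbalance_cmds() {
    case "$1" in
        fw1) skew1=0;   skew2=100 ;;   # fw1 prefers vhid 1
        fw2) skew1=100; skew2=0   ;;   # fw2 prefers vhid 2
        *)   echo "usage: carp_arpbalance_cmds fw1|fw2" >&2; return 1 ;;
    esac
    addr="172.16.0.100 netmask 255.255.0.0"    # shared LAN address (assumed)
    echo "sysctl net.inet.carp.preempt=1"
    echo "sysctl net.inet.carp.arpbalance=1"
    echo "ifconfig carp0 create"
    echo "ifconfig carp0 vhid 1 advskew $skew1 pass lanpasswd $addr"
    echo "ifconfig carp1 create"
    echo "ifconfig carp1 vhid 2 advskew $skew2 pass lanpasswd $addr"
}

# Example: review the commands for the second firewall.
#   carp_arpbalance_cmds fw2
```

This covers a single segment only. The other segment would presumably need its own pair of carp interfaces with different vhids and its own shared address.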


--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net




Re: pf/carp for redundant production use

2005-09-26 Thread j knight

Neil wrote:
> Hi everyone,
> Just chat with someone in #pf and found out that pf at the moment cannot
> maintain state on TCP connections from internal machine to external
> machine when network cable on master firewall's external interface is
> removed.
> Anyways, most connections are coming from outside to inside and that is
> working well. :)

This person is talking about state being kept on the backup firewall
(which gets promoted to master when the master's cable is unplugged)? If
so, that doesn't make any sense whatsoever.




.joel


Re: pf/carp for redundant production use

2005-09-25 Thread Michiel van Baak
On 07:30, Sun 25 Sep 05, Neil wrote:
> Yep, the same behavior when the master dies. The solution that the person 
> in #pf told me is use routing but I don't know how to implement. He told me 
> that it's an issue in pf's NAT. 

Does this mean you cannot fail over an office NAT firewall?
Pretty useless then, if you ask me.
-- 
Michiel van Baak
http://michiel.vanbaak.info
GnuPG key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x7E0B9A2D

"Why is it drug addicts and computer afficionados are both called users?"


Re: pf/carp for redundant production use

2005-09-25 Thread Michiel van Baak
On 00:21, Sun 25 Sep 05, Neil wrote:
> Hi everyone, 
> 
> Just chat with someone in #pf and found out that pf at the moment cannot 
> maintain state on TCP connections from internal machine to external machine 
> when network cable on master firewall's external interface is removed. 
> 
> Anyways, most connections are coming from outside to inside and that is 
> working well. :) 
> 

Is the same true when the master dies ??

-- 
Michiel van Baak
http://michiel.vanbaak.info
GnuPG key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x7E0B9A2D

"Why is it drug addicts and computer afficionados are both called users?"


Re: pf/carp for redundant production use

2005-09-25 Thread Neil
Hi everyone, 

I just chatted with someone in #pf and found out that pf at the moment
cannot maintain state on TCP connections from an internal machine to an
external machine when the network cable on the master firewall's external
interface is removed.

Anyway, most connections come from outside to inside, and that is
working well. :)









Re: pf/carp for redundant production use

2005-09-23 Thread Neil
Hi Joel, 

I just created a new email post. :) 

Thanks, 

neil 





Re: pf/carp for redundant production use

2005-09-23 Thread j knight

Neil wrote:
> Yup that did the fix for the inbound. Now, I tried connecting to an ssh
> server from the internal machine to the external machine running openssh
> and i disconnected the cable, however, the ssh session was not able to
> recover. What should I change in my pf.conf configuration.
>
> Thanks for the first one. It's awesome! :D
> j knight writes:

Hard to say. What does your troubleshooting tell you? What does pflog
tell you? What does the state table look like on the new master?




.joel


Re: pf/carp for redundant production use

2005-09-22 Thread Neil
Yup, that fixed the inbound case. Next, I connected from an internal
machine to an ssh server on an external machine running OpenSSH and
disconnected the cable; however, the ssh session was not able to recover.
What should I change in my pf.conf configuration?

Thanks for the first one. It's awesome! :D 





Re: pf/carp for redundant production use

2005-09-22 Thread j knight

Neil wrote:
> Ok guys. I will do it tonight once I reach home. I will also send my
> pf.conf file.
>
> Also, does it matter since I have different interfaces on FW1 and FW2?
> FW1: xl0, fxp0 and fxp1
> FW2: rl0, fxp0 and ne3

You're using 'set state-policy if-bound' so yes, that does matter.
Remove that set option.
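
The reason is that with if-bound, each state is tied to the interface it was created on, so a state synced from FW1 (created on xl0, say) cannot match traffic arriving on FW2's rl0. A minimal sketch for checking a ruleset for the global option follows; the function name is made up for illustration, and it does not catch per-rule "(if-bound)" state options.

```sh
#!/bin/sh
# has_if_bound: hypothetical helper. Reads a pf.conf on stdin and succeeds
# (exit status 0) if the global "set state-policy if-bound" option is present.
has_if_bound() {
    # Drop comments first so a commented-out option is not reported.
    sed 's/#.*//' |
        grep -Eq '^[[:space:]]*set[[:space:]]+state-policy[[:space:]]+if-bound([[:space:]]|$)'
}

# Example use on a firewall:
#   has_if_bound < /etc/pf.conf && echo "remove 'set state-policy if-bound'"
```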




.joel


Re: pf/carp for redundant production use

2005-09-21 Thread Neil
Hi everyone, 


Firewall 1 troubleshooting info can be found at
http://restricted.dyndns.org/pffw1.txt 

Firewall 2 @ http://restricted.dyndns.org/pffw2.txt 


The links include:
1. ifconfig output pre/post cable removal
2. pfctl -s state output pre/post cable removal
3. pf.conf configs of both firewalls

Please let me know what you find. 

Thanks in advance, 

Neil 










Re: pf/carp for redundant production use

2005-09-21 Thread Neil
Ok guys. I will do it tonight once I reach home. I will also send my pf.conf
file.

Also, does it matter that I have different interfaces on FW1 and FW2?

FW1: xl0, fxp0 and fxp1
FW2: rl0, fxp0 and ne3

Thanks guys! ;) 

Neil 








Re: pf/carp for redundant production use

2005-09-21 Thread Matt Rowley

>> I got pf and carp working together. However, I have noticed that TCP
>> oriented application doesn't get recover well when I disconnect a
>> cable.  I setup a netcat listener on a machine inside the network.
>> Then I ran netcat from another machine outside the network. I was
>> able to connect and was able to send some characters. However, when I
>> disconnected the primary firewall's external interface, netcat won't
>> work anymore until I execute netcat again that connects to the shared
>> external ip address. Am I missing any configuration? Looks like it's
>> related to pf state tables not being sent to the backup firewall.
>
> Show your entire pf.conf.
> Let's see some troubleshooting commands. Run ifconfig before and after
> pulling the cable, etc.

pfctl -s state on the carp slave would also be helpful, to see if pfsync
is getting through.





Re: pf/carp for redundant production use

2005-09-21 Thread j knight

Neil wrote:
> Hi guys,
> I got pf and carp working together. However, I have noticed that TCP
> oriented application doesn't get recover well when I disconnect a cable.
> I setup a netcat listener on a machine inside the network. Then I ran
> netcat from another machine outside the network. I was able to connect
> and was able to send some characters. However, when I disconnected the
> primary firewall's external interface, netcat won't work anymore until I
> execute netcat again that connects to the shared external ip address.
> Am I missing any configuration? Looks like it's related to pf state
> tables not being sent to the backup firewall.

Show your entire pf.conf.
Let's see some troubleshooting commands. Run ifconfig before and after
pulling the cable, etc.




.joel


Re: pf/carp for redundant production use

2005-09-21 Thread Neil
Hi guys, 

I got pf and carp working together. However, I have noticed that
TCP-oriented applications don't recover well when I disconnect a cable. I
set up a netcat listener on a machine inside the network. Then I ran netcat
from another machine outside the network. I was able to connect and send
some characters. However, when I disconnected the primary firewall's
external interface, netcat stopped working until I ran it again and
reconnected to the shared external IP address.

Am I missing any configuration? It looks like it's related to pf state
tables not being sent to the backup firewall.

Please help. 

Thanks, 

Neil 

Neil writes: 

Hi guys,  

I'm very new to carp. I last used OpenBSD and pf about 2 years ago, so I
have forgotten most of it. Anyway, I just finished building 2 machines
with 3 NICs each. I got CARP working as well but have some questions.

Here is my configuration:  


/***
/* FW1:
/***
external interface: fxp1 => 192.168.1.1/24
internal interface: xl0  => 172.16.0.1/16
pfsync interface:   fxp0 => 10.10.10.1/24

carp0: inet 172.16.0.100 255.255.0.0 172.16.255.255 carpdev xl0 vhid 1 pass lanpasswd
carp1: inet 192.168.1.100 255.255.255.0 192.168.1.255 carpdev fxp1 vhid 2 pass netpasswd
pfsync0: up syncif fxp0


/***
/* FW2:
/***
external interface: ne3  => 192.168.1.2/24
internal interface: rl0  => 172.16.0.2/16
pfsync interface:   fxp0 => 10.10.10.2/24

carp0: inet 172.16.0.100 255.255.0.0 172.16.255.255 carpdev rl0 vhid 1 pass lanpasswd advskew 128
carp1: inet 192.168.1.100 255.255.255.0 192.168.1.255 carpdev ne3 vhid 2 pass netpasswd advskew 128
pfsync0: up syncif fxp0


LAN shared IP:          172.16.0.100
WAN/Internet shared IP: 192.168.1.100


DIAGRAM:
                    EXTERNAL
    ---+--------| 192.168.1.x  |--------+---
       |                                |
   fxp1|                                |ne3
    +-----+                          +-----+
    | fw1 |-fxp0---10.10.10.x---fxp0-| fw2 |
    +-----+                          +-----+
    xl0|                                |rl0
       |                                |
    ---+--------| 172.16.x.x   |--------+---
                    INTERNAL



1. Let's say we want to do some NAT using this CARP/PF setup:

web server public:  192.168.1.10
web server NAT:     172.16.1.10 (real IP)

mail server public: 192.168.1.11
mail server NAT:    172.16.1.11 (real IP)

a. How do I configure CARP?
b. How do I configure pf.conf on both firewalls? An example would
really help me a lot.
c. Do I also have to create an alias on each machine's external
interface?

2. Can someone please send me a pf.conf that can be used in a production
environment?

3. Am I correct that my internal mail server's and web server's gateway
should point to 172.16.0.100?

4. What happens if the interface where pfsync is configured goes bad or
its cable gets disconnected?

5. Other than this setup, is there anything I can add to make it
much more reliable?

Thanks in advance!  


Neil