Re: [pfSense Support] Inbound Loadbalancing problem - SOLVED
Bill Marquette wrote:
> On 4/24/07, Gary Buckmaster [EMAIL PROTECTED] wrote:
>> This issue turned out to be primarily a configuration problem, although it serves as a good lesson for others, so I'll post the reply for the sake of posterity.
>>
>> Background: We currently have 16 web servers in production handling requests. They sit behind Cisco LocalDirectors. Because of how the LocalDirectors are configured, it's not a simple plug-and-play scenario to substitute in the pfSense boxes. To make the transition smoother, a number of machines were multi-homed so they could sit behind both the LocalDirectors and the new pfSense network.
>>
>> The astute reader will quickly surmise what happened. Although the web servers were located on both networks, their default route was inadvertently left alone. Thus traffic coming from the pfSense boxes was replied to using the wrong network card, causing the timeout issues. This turned out to be a blessing in disguise, because it demonstrated a gentler way to transition to the new machines without dramatically interrupting service as DNS propagated to the new cluster.
>
> I'm not following what the gentle way of transitioning to the new machines is. Care to elaborate a little? Did you change the default route on part of the farm and disable the interfaces on the machines that should still be going through the LocalDirector?
>
> --Bill
>
> PS. I'm very happy to see pfSense replace a LocalDirector - I honestly didn't expect to see anyone using the load balancing code when I wrote it, except for the one person that requested it.

Bill,

We ended up putting the pfSense cluster into the LocalDirector pool, then slowly transitioned more servers behind the pfSense cluster and away from the LocalDirector pool as traffic stopped hitting the LocalDirectors and started hitting the pfSense cluster directly.

Due to the odd DNS caching employed by some ISPs, I suspect it'll be about a week before we can pull the LocalDirectors completely out of the mix, but we're already seeing a significant amount of traffic going straight to the pfSense boxes. I'm happy to report that, other than the backup pfSense box randomly promoting itself to master, the load balancing is working flawlessly.

Mind you, this is being done on hardware that is nowhere near top of the line: the pfSense boxes are 1.2GHz Celerons with 1GB of DDR266 memory and IDE hard drives. Considering that they'll ultimately be handling 65 million-plus web requests daily, that's pretty cost effective, especially compared with commercial load balancing products.

At some point it might be good to talk about adding some additional reporting functionality to the load balancer, but as you pointed out, it's probably under-utilized by the general pfSense community. We're planning to put it to extensive use for our customers, so this was a very good trial by fire for the functionality. The fact that we felt comfortable setting this up with a beta speaks volumes for the quality of the pfSense product.

-Gary

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
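On the backup box randomly promoting itself to master: CARP only fails over when the backup stops hearing the master's multicast advertisements, or when the backup advertises more aggressively (a lower advskew), so this symptom usually points at skew settings or multicast delivery on the shared segment rather than at the load balancer itself. A minimal sanity check from a shell, assuming a FreeBSD-based pfSense box with a carp0 interface and fxp0 as the shared segment (both names are placeholders for this particular setup):

```shell
# Show CARP state and skew on each box; the intended master should have
# the lower advskew (e.g. 0) and the intended backup a higher one (e.g. 100).
ifconfig carp0

# Let the higher-priority box reclaim master status when it comes back.
sysctl net.inet.carp.preempt=1

# Verify CARP advertisements (IP protocol 112) are actually arriving
# on the shared segment; if the switch drops this multicast, the backup
# will keep promoting itself.
tcpdump -ni fxp0 ip proto 112
```

If the advertisements flow and the skews are ordered correctly, the flapping generally stops.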
Re: [pfSense Support] Inbound Loadbalancing problem - SOLVED
Bill - FYI, I find the load balancing part of pfSense to be invaluable! I'd bet there are a good number of others out there who feel the same. I'd love to see more development in that area of pfSense - for example, handling a Squid proxy on the box, etc.

On 4/24/07, Bill Marquette [EMAIL PROTECTED] wrote:
> PS. I'm very happy to see pfSense replace a LocalDirector - I honestly didn't expect to see anyone using the load balancing code when I wrote it, except for the one person that requested it.

--
Thanks!
Michael Oh
[pfSense Support] Inbound Loadbalancing problem
Prior to trying to install this into production, I had this entire scenario working perfectly in a test environment. Something, it seems, has changed between testing and production.

I have a cluster of 15 web servers which I intend to load balance with a CARP'd cluster. I've created a CARP VIP address which will be the virtual server address, and another one on the LAN to serve as the gateway for the server pool. CARP failover has been configured and appears to work properly, although the secondary load balancer, for some odd reason, is always the master.

The problem comes when I try to test web connectivity to the balanced servers. Traffic hits the virtual server address, hits the load balanced pool of servers, and that appears to be where things stop. A tcpdump shows traffic coming from both pfSense boxes, which seems contrary to the way the load balancer should be working:

10:10:56.089142 IP 192.168.100.3.62747 > 192.168.100.161.http: S 2531494251:2531494251(0) win 65228 <mss 1460,nop,wscale 0,nop,nop,timestamp 7490736 0,sackOK,eol>
10:10:56.089220 IP 192.168.100.161.http > 192.168.100.3.62747: S 1542065227:1542065227(0) ack 2531494252 win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp 6878409 7490736,nop,nop,sackOK>
10:10:56.089780 IP 192.168.100.3.62747 > 192.168.100.161.http: . ack 1 win 65535 <nop,nop,timestamp 7490737 6878409>
10:10:56.090036 IP 192.168.100.3.62747 > 192.168.100.161.http: F 1:1(0) ack 1 win 65535 <nop,nop,timestamp 7490737 6878409>
10:10:56.090081 IP 192.168.100.161.http > 192.168.100.3.62747: . ack 2 win 33304 <nop,nop,timestamp 6878409 7490737>
10:10:56.090129 IP 192.168.100.161.http > 192.168.100.3.62747: F 1:1(0) ack 2 win 33304 <nop,nop,timestamp 6878409 7490737>
10:10:56.090800 IP 192.168.100.3.62747 > 192.168.100.161.http: . ack 2 win 1071 <nop,nop,timestamp 7490738 6878409>
10:10:57.186346 IP 192.168.100.2.60821 > 192.168.100.161.http: S 4259965474:4259965474(0) win 65228 <mss 1460,nop,wscale 0,nop,nop,timestamp 5838503 0,sackOK,eol>
10:10:57.186401 IP 192.168.100.161.http > 192.168.100.2.60821: S 1151731680:1151731680(0) ack 4259965475 win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp 6878519 5838503,nop,nop,sackOK>
10:10:57.186673 IP 192.168.100.2.60821 > 192.168.100.161.http: . ack 1 win 65535 <nop,nop,timestamp 5838504 6878519>
10:10:57.186941 IP 192.168.100.2.60821 > 192.168.100.161.http: F 1:1(0) ack 1 win 65535 <nop,nop,timestamp 5838504 6878519>
10:10:57.186984 IP 192.168.100.161.http > 192.168.100.2.60821: . ack 2 win 33304 <nop,nop,timestamp 6878519 5838504>
10:10:57.187037 IP 192.168.100.161.http > 192.168.100.2.60821: F 1:1(0) ack 2 win 33304 <nop,nop,timestamp 6878519 5838504>
10:10:57.187747 IP 192.168.100.2.60821 > 192.168.100.161.http: . ack 2 win 1071 <nop,nop,timestamp 5838505 6878519>

I'm at a loss trying to figure out what the issue is.

-Gary
Re: [pfSense Support] Inbound Loadbalancing problem
Both boxes are likely polling the web servers in question, hence the traffic from both machines. You might confirm that you have rules loaded to allow this traffic.

--Bill

On 4/24/07, Gary Buckmaster [EMAIL PROTECTED] wrote:
> Prior to trying to install this into production, I had this entire scenario working perfectly in a test environment. Something, it seems, has changed between testing and production.
>
> I have a cluster of 15 web servers which I intend to load balance with a CARP'd cluster. I've created a CARP VIP address which will be the virtual server address, and another one on the LAN to serve as the gateway for the server pool. CARP failover has been configured and appears to work properly, although the secondary load balancer, for some odd reason, is always the master.
>
> The problem comes when I try to test web connectivity to the balanced servers. Traffic hits the virtual server address, hits the load balanced pool of servers, and that appears to be where things stop. A tcpdump shows traffic coming from both pfSense boxes, which seems contrary to the way the load balancer should be working:
>
> [tcpdump output snipped]
>
> I'm at a loss trying to figure out what the issue is.
>
> -Gary
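A quick way to confirm Bill's explanation is to filter the capture down to traffic sourced from the balancers' own LAN addresses (192.168.100.2 and .3 in the dump above). Steady, paired open/close cycles from both boxes with no payload are the pool health checks; real client sessions arrive via the CARP VIP instead. A sketch, assuming the web-server-facing interface is em0 (the interface name is a placeholder):

```shell
# Show only traffic between the pool and the two pfSense LAN addresses
# on port 80 - this isolates the health-check polls from client traffic.
tcpdump -ni em0 'tcp port 80 and (host 192.168.100.2 or host 192.168.100.3)'
```

Seeing the same connect/close pattern from both addresses, as in the capture above, is expected behavior, not duplicated balancing.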
Re: [pfSense Support] Inbound Loadbalancing problem - SOLVED
This issue turned out to be primarily a configuration problem, although it serves as a good lesson for others, so I'll post the reply for the sake of posterity.

Background: We currently have 16 web servers in production handling requests. They sit behind Cisco LocalDirectors. Because of how the LocalDirectors are configured, it's not a simple plug-and-play scenario to substitute in the pfSense boxes. To make the transition smoother, a number of machines were multi-homed so they could sit behind both the LocalDirectors and the new pfSense network.

The astute reader will quickly surmise what happened. Although the web servers were located on both networks, their default route was inadvertently left alone. Thus traffic coming from the pfSense boxes was replied to using the wrong network card, causing the timeout issues. This turned out to be a blessing in disguise, because it demonstrated a gentler way to transition to the new machines without dramatically interrupting service as DNS propagated to the new cluster.

Thanks to the pfSense team for such a great product and their help in figuring out the issue.

Bill Marquette wrote:
> Both boxes are likely polling the web servers in question, hence the traffic from both machines. You might confirm that you have rules loaded to allow this traffic.
>
> --Bill
>
> On 4/24/07, Gary Buckmaster [EMAIL PROTECTED] wrote:
>> Prior to trying to install this into production, I had this entire scenario working perfectly in a test environment. Something, it seems, has changed between testing and production.
>>
>> [remainder of the original message and tcpdump output snipped]
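For anyone who hits the same multi-homed pitfall: the fix is to make each server's default route point back through the pfSense side, so replies leave on the same network the request arrived on. A rough sketch on a FreeBSD-style web server, where 192.168.100.1 stands in for the pfSense LAN CARP VIP (the actual gateway address will differ):

```shell
# Check which gateway (and therefore which NIC) replies currently use.
netstat -rn | grep '^default'

# Repoint the default route at the pfSense LAN CARP VIP.
route delete default
route add default 192.168.100.1
```

On a Linux pool member the equivalent would be `ip route replace default via 192.168.100.1`. Changing this host by host is what allows the gradual migration described above: machines keep answering LocalDirector traffic on the old interface while newly repointed ones reply correctly through pfSense.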
Re: [pfSense Support] Inbound Loadbalancing problem - SOLVED
On 4/24/07, Gary Buckmaster [EMAIL PROTECTED] wrote:
> This issue turned out to be primarily a configuration problem, although it serves as a good lesson for others, so I'll post the reply for the sake of posterity.
>
> Background: We currently have 16 web servers in production handling requests. They sit behind Cisco LocalDirectors. Because of how the LocalDirectors are configured, it's not a simple plug-and-play scenario to substitute in the pfSense boxes. To make the transition smoother, a number of machines were multi-homed so they could sit behind both the LocalDirectors and the new pfSense network.
>
> The astute reader will quickly surmise what happened. Although the web servers were located on both networks, their default route was inadvertently left alone. Thus traffic coming from the pfSense boxes was replied to using the wrong network card, causing the timeout issues. This turned out to be a blessing in disguise, because it demonstrated a gentler way to transition to the new machines without dramatically interrupting service as DNS propagated to the new cluster.

I'm not following what the gentle way of transitioning to the new machines is. Care to elaborate a little? Did you change the default route on part of the farm and disable the interfaces on the machines that should still be going through the LocalDirector?

--Bill

PS. I'm very happy to see pfSense replace a LocalDirector - I honestly didn't expect to see anyone using the load balancing code when I wrote it, except for the one person that requested it.