Hi,

We recently migrated our game servers from Linux to FreeBSD. We have 8 web servers running in jails, with HAProxy as the load balancer. We also have CARP configured for network failover.

CARP runs as master on the first server (webm01) and as backup on the second server (webm02). HAProxy is running on both servers, though only one handles traffic at a time: whichever server currently holds the CARP master role. Both servers run pf as well. We are running FreeBSD 8.2-RELEASE, haproxy-1.4.15 and apache-2.2.19, and the game is written in PHP.

Our network architecture is as follows. There is also a backend database running in a jail on a different server, which I excluded from the diagram (I hope the ASCII diagram displays well in the mail):

                                              +----- wj01
                                              |
                (webm01)                      |------ wj02
    user -------- carp -------- haproxy ------+
                   |                          |------ wj03
                   |                          |
                   |                          +----- wj04
                   |
                   |                          +----- wj05
                   |                          |
                   |                          |----- wj06
                  carp -------- haproxy ------+
                (webm02)                      |----- wj07
                                              |
                                              +----- wj08

Our main problem at the moment is that a lot of users (more than a hundred) have complained that they are getting a "504 Gateway Timeout" error. This normally happens at night (CEST), when most players start playing the game. However, the load on our servers is consistently low at all times. Beyond that, we have not found an obvious pattern as to when the error occurs.
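To look for a pattern, a small script along these lines could tally the 504 responses per day, hour and backend server from the HAProxy log. This is only a sketch: it assumes the log lines are in HAProxy's HTTP log format (as produced by "option httplog") and that syslog is actually collecting them. The function name count_504 and the regular expression are illustrative, not something we run today.

```python
import re
from collections import Counter

# Picks the accept date, backend/server and status code out of an
# HAProxy HTTP-format log line (assumed format), for example:
#   ... haproxy[123]: 1.2.3.4:5678 [06/Feb/2009:21:14:14.655] fe be/srv 10/0/30/-1/50040 504 194 ...
LINE_RE = re.compile(
    r"\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<hour>\d{2}):\d{2}:\d{2}\.\d{3}\]"  # accept date
    r" \S+ (?P<backend>[^/\s]+)/(?P<server>\S+)"                          # frontend, backend/server
    r" [\d/+-]+ (?P<status>-?\d+) "                                       # timers, status code
)


def count_504(lines):
    """Tally 504 responses by (day, hour, server).

    Lines that do not look like HTTP-format log entries are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("status") == "504":
            counts[(m.group("day"), m.group("hour"), m.group("server"))] += 1
    return counts
```

It could be fed an open log file, for example count_504(open(path)), where path is wherever syslog writes the local0 facility on the host (an assumption; we have not shown our syslog setup here). If the 504s cluster on one server or one hour, that would narrow things down.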
Here is our haproxy.conf:

    global
        log /var/run/log local0 notice
        maxconn 4096
        daemon
        chroot /var/run/haproxy
        user haproxy
        group haproxy
        stats socket /var/run/haproxy/haproxy.sock uid 1005 gid 1005

    defaults
        log global
        mode http
        option httpclose
        option forwardfor
        option httplog
        option tcplog
        option dontlognull
        option tcpka
        retries 3
        option redispatch
        maxconn 2000
        timeout connect 5000
        timeout client 50000
        timeout server 50000

    listen webjailfarm 78.xx.xx.xx:80
        mode http
        cookie SERVERID insert nocache indirect
        balance roundrobin
        option httpclose
        option forwardfor
        option httpchk HEAD / HTTP/1.0
        stats uri /haproxy-status
        stats enable
        stats auth admin:password
        server wj01 192.168.30.10:80 cookie A weight 10 check inter 2000 rise 2 fall 2
        server wj02 192.168.30.20:80 cookie B weight 10 check inter 2000 rise 2 fall 2
        server wj03 192.168.30.30:80 cookie C weight 10 check inter 2000 rise 2 fall 2
        server wj04 192.168.30.40:80 cookie D weight 10 check inter 2000 rise 2 fall 2
        server wj05 192.168.30.50:80 cookie E weight 10 check inter 2000 rise 2 fall 2
        server wj06 192.168.30.60:80 cookie F weight 10 check inter 2000 rise 2 fall 2
        server wj07 192.168.30.70:80 cookie G weight 10 check inter 2000 rise 2 fall 2
        server wj08 192.168.30.80:80 cookie H weight 10 check inter 2000 rise 2 fall 2

##################################################################

And here is our pf.conf (the same pf.conf is running on webm02, with only the IPs changed accordingly):

    ### macros
    webm01 = 78.xx.xx.xx
    db = 10.10.10.101
    carp_dev = "carp0"
    ext_if = "igb0"
    jail_if = "igb0:0"
    trusted = "{ 192.168.30.0/24, 10.10.10.0/24, 78.xx.xx.xx/xx, 85.xx.xx.xx/xx }"
    tcp_services = "{ xxxxx, 4949 }"
    ssh_ports = "{ xxxxx, xxxxx, xxxxx, xxxxx }"
    icmp_types = "{ echoreq, unreach }"

    # jails
    wj01 = 192.168.30.10
    wj02 = 192.168.30.20
    wj03 = 192.168.30.30
    wj04 = 192.168.30.40
    jails = "{" $wj01 $wj02 $wj03 $wj04 "}"

    ### normalization
    scrub in all

    ### translation
    nat on $ext_if from $jails to !10.10.10.0/24 -> ($jail_if)
    rdr pass on $ext_if inet proto tcp from any to $webm01 port xxxxx -> $wj01 ### ssh redirect
    rdr pass on $ext_if inet proto tcp from any to $webm01 port xxxxx -> $wj02
    rdr pass on $ext_if inet proto tcp from any to $webm01 port xxxxx -> $wj03
    rdr pass on $ext_if inet proto tcp from any to $webm01 port xxxxx -> $wj04
    rdr pass on $ext_if inet proto tcp from any to ($carp_dev) port 80 -> $webm01

    ### filtering - drop incoming everything
    block in all
    block return

    ### keep state of outgoing connections
    pass out keep state

    ### skip loopback interface
    set skip on { lo0 }

    ### spoofing protection for all interfaces
    block in quick from urpf-failed
    antispoof log for $ext_if

    ### allow outgoing
    pass out on $ext_if proto tcp to any port $tcp_services
    pass out quick on $ext_if proto udp from $webm01 to any port = 123 keep state
    pass quick on $ext_if proto carp keep state (no-sync)
    pass out on $carp_dev proto tcp to any port 80

    ### allow incoming services from within internal network to ssh ports
    pass in on $ext_if proto tcp from $trusted to $wj01 port xxxxx flags S/SA synproxy state
    pass in on $ext_if proto tcp from $trusted to $wj02 port xxxxx flags S/SA synproxy state
    pass in on $ext_if proto tcp from $trusted to $wj03 port xxxxx flags S/SA synproxy state
    pass in on $ext_if proto tcp from $trusted to $wj04 port xxxxx flags S/SA synproxy state

    ### allow incoming services
    pass in on $ext_if proto tcp from any to $jails port 80 flags S/SA synproxy state
    pass in on $ext_if proto tcp from any to $webm01 port $tcp_services flags S/SA synproxy state
    pass inet proto icmp all icmp-type $icmp_types keep state

    ### for munin
    pass in on $ext_if proto tcp from $trusted to $jails port 4949 flags S/SA synproxy state

If more information is needed, please let me know. I would appreciate any advice.

Thanks.