I don't think I'm going to be much help there. I'm connected to the Internet through a wireless broadband connection right now, and I connect to my network through a VPN, so the server just sees me as a local address.
I had it working for a short time, and I saved what showed up in syslog from the time it started to work to the moment it stopped. The last entry is at 16:17:01, when it was still working. It stopped working at about 16:30, but nothing showed up in any of the logs, except, again, for the entry in the Apache error log. I'm attaching the syslog copy here, showing just the period I talked about. I don't know if that helps, but please know that I really appreciate all the help.

Thanks,
Tom

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Friday, May 15, 2009 3:35 PM
To: Tom Potwin
Cc: haproxy@formilux.org
Subject: Re: New HAProxy user keeps loosing connectionted

On Fri, May 15, 2009 at 03:04:24PM -0400, Tom Potwin wrote:
> Willy,
>
> I don't want to post the entire syslog here - waste of space. Is there
> something I could look for?

Yes, grep for your browser's IP address. This will give you the logs of
your connections, in which we will find what is happening.

> When I restart Heartbeat, I see many heartbeat, IPfail, IPaddr, and
> ResourceManager entries right after starting. Nothing shows up in the
> Apache log on the web server until I lose my connection to HAProxy.
> Then I see "File does not exist: /var/www/apache2-default/haproxy" in
> the Apache error log. I hope I made it clear that the Apache server is
> on a different virtual node than HAProxy is.

Yes, that's what I understood, but it clearly matches the stats+keepalive
case.

Willy
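Willy's suggestion boils down to keeping only the log lines that mention the browser's address. Below is a minimal Python sketch of that filter, assuming plain-text logs with one entry per line. The helper name lines_for_client, the log path and the address in the example comment are all hypothetical placeholders. Unlike a bare grep for the address, it rejects partial hits, so one address is not matched inside a longer one.

```python
import re
from typing import Iterable, Iterator


def lines_for_client(lines: Iterable[str], client_ip: str) -> Iterator[str]:
    """Yield the log lines that mention client_ip as a complete address.

    lines_for_client is a hypothetical helper, not part of HAProxy.
    The lookarounds reject partial hits, so "192.168.31.5" does not
    match inside "192.168.31.57" or "10.192.168.31.5".
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(client_ip) + r"(?!\d)")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Example use (log path and address are placeholders, adjust to your setup):
# with open("/var/log/syslog") as log:
#     for entry in lines_for_client(log, "192.168.31.5"):
#         print(entry)
```

A trailing colon and port (as in "address:port") still counts as a match, because only a following digit is rejected.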
May 15 16:10:39 lb1 heartbeat: [21994]: info: Core process 22006 exited. 1 remaining
May 15 16:10:39 lb1 heartbeat: [21994]: info: lb1.tlthost.net Heartbeat shutdown complete.
May 15 16:10:41 lb1 heartbeat: [22781]: WARN: WARNING: directive 'udp' replaced by 'bcast'
May 15 16:10:41 lb1 heartbeat: [22781]: WARN: Core dumps could be lost if multiple dumps occur.
May 15 16:10:41 lb1 heartbeat: [22781]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
May 15 16:10:41 lb1 heartbeat: [22781]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
May 15 16:10:41 lb1 heartbeat: [22781]: info: Version 2 support: false
May 15 16:10:41 lb1 heartbeat: [22781]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
May 15 16:10:41 lb1 heartbeat: [22781]: info: **************************
May 15 16:10:41 lb1 heartbeat: [22781]: info: Configuration validated. Starting heartbeat 2.1.3
May 15 16:10:41 lb1 heartbeat: [22782]: info: heartbeat: version 2.1.3
May 15 16:10:41 lb1 heartbeat: [22782]: info: Heartbeat generation: 1240088878
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: ucast: bound send socket to device: eth0
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: ucast: bound receive socket to device: eth0
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: ucast: started on port 694 interface eth0 to 192.168.31.211
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
May 15 16:10:41 lb1 heartbeat: [22782]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
May 15 16:10:41 lb1 heartbeat: [22782]: info: G_main_add_TriggerHandler: Added signal manual handler
May 15 16:10:41 lb1 heartbeat: [22782]: info: G_main_add_TriggerHandler: Added signal manual handler
May 15 16:10:41 lb1 heartbeat: [22782]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 15 16:10:41 lb1 heartbeat: [22782]: info: Local status now set to: 'up'
May 15 16:10:42 lb1 heartbeat: [22782]: info: Link lb1.tlthost.net:eth0 up.
May 15 16:10:43 lb1 heartbeat: [22782]: info: Link lb2.tlthost.net:eth0 up.
May 15 16:10:43 lb1 heartbeat: [22782]: info: Status update for node lb2.tlthost.net: status active
May 15 16:10:43 lb1 heartbeat: [22796]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May 15 16:10:43 lb1 harc[22796]: info: Running /etc/ha.d/rc.d/status status
May 15 16:10:43 lb1 heartbeat: [22782]: info: Comm_now_up(): updating status to active
May 15 16:10:43 lb1 heartbeat: [22782]: info: Local status now set to: 'active'
May 15 16:10:43 lb1 heartbeat: [22782]: info: Starting child client "/usr/lib/heartbeat/ipfail" (108,112)
May 15 16:10:43 lb1 heartbeat: [22811]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 108 gid 112 (pid 22811)
May 15 16:10:43 lb1 ipfail: [22811]: debug: PID=22811
May 15 16:10:43 lb1 ipfail: [22811]: debug: Signing in with heartbeat
May 15 16:10:44 lb1 heartbeat: [22782]: info: remote resource transition completed.
May 15 16:10:44 lb1 heartbeat: [22782]: info: remote resource transition completed.
May 15 16:10:44 lb1 heartbeat: [22782]: info: Local Resource acquisition completed. (none)
May 15 16:10:44 lb1 ipfail: [22811]: debug: [We are lb1.tlthost.net]
May 15 16:10:44 lb1 heartbeat: [22782]: info: lb2.tlthost.net wants to go standby [foreign]
May 15 16:10:44 lb1 ipfail: [22811]: debug: auto_failback -> 1 (on)
May 15 16:10:45 lb1 heartbeat: [22782]: info: standby: acquire [foreign] resources from lb2.tlthost.net
May 15 16:10:45 lb1 heartbeat: [22815]: info: acquire local HA resources (standby).
May 15 16:10:45 lb1 ipfail: [22811]: debug: Setting message filter mode
May 15 16:10:45 lb1 ResourceManager[22829]: info: Acquiring resource group: lb1.tlthost.net 192.168.31.100
May 15 16:10:45 lb1 IPaddr[22855]: INFO: Resource is stopped
May 15 16:10:45 lb1 ResourceManager[22829]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.31.100 start
May 15 16:10:45 lb1 ResourceManager[22829]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.31.100 start
May 15 16:10:45 lb1 IPaddr[22926]: INFO: Using calculated nic for 192.168.31.100: eth0
May 15 16:10:45 lb1 IPaddr[22926]: INFO: Using calculated netmask for 192.168.31.100: 255.255.255.0
May 15 16:10:45 lb1 IPaddr[22926]: DEBUG: Using calculated broadcast for 192.168.31.100: 192.168.31.255
May 15 16:10:45 lb1 IPaddr[22926]: INFO: eval ifconfig eth0:0 192.168.31.100 netmask 255.255.255.0 broadcast 192.168.31.255
May 15 16:10:45 lb1 IPaddr[22926]: DEBUG: Sending Gratuitous Arp for 192.168.31.100 on eth0:0 [eth0]
May 15 16:10:45 lb1 IPaddr[22911]: INFO: Success
May 15 16:10:45 lb1 ResourceManager[22829]: debug: /etc/ha.d/resource.d/IPaddr 192.168.31.100 start done. RC=0
May 15 16:10:45 lb1 heartbeat: [22815]: info: local HA resource acquisition completed (standby).
May 15 16:10:45 lb1 heartbeat: [22782]: info: Standby resource acquisition done [foreign].
May 15 16:10:45 lb1 heartbeat: [22782]: info: Initial resource acquisition complete (auto_failback)
May 15 16:10:45 lb1 ipfail: [22811]: debug: Starting node walk
May 15 16:10:45 lb1 heartbeat: [22782]: info: remote resource transition completed.
May 15 16:10:45 lb1 ipfail: [22811]: debug: Cluster node: lb2.tlthost.net: status: active
May 15 16:10:46 lb1 ipfail: [22811]: debug: [They are lb2.tlthost.net]
May 15 16:10:46 lb1 ipfail: [22811]: debug: Cluster node: lb1.tlthost.net: status: active
May 15 16:10:47 lb1 ipfail: [22811]: debug: Setting message signal
May 15 16:10:47 lb1 ipfail: [22811]: debug: Waiting for messages...
May 15 16:10:48 lb1 ipfail: [22811]: debug: Other side is now stable.
May 15 16:10:48 lb1 ipfail: [22811]: debug: Other side is now stable.
May 15 16:10:50 lb1 ipfail: [22811]: debug: Other side is unstable.
May 15 16:10:50 lb1 ipfail: [22811]: debug: Got asked for num_ping.
May 15 16:10:52 lb1 ipfail: [22811]: info: Ping node count is balanced.
May 15 16:10:52 lb1 ipfail: [22811]: debug: Abort message sent.
May 15 16:10:52 lb1 ipfail: [22811]: info: Giving up foreign resources (auto_failback).
May 15 16:10:52 lb1 ipfail: [22811]: info: Delayed giveup in 4 seconds.
May 15 16:10:53 lb1 ipfail: [22811]: debug: Other side is now stable.
May 15 16:10:53 lb1 ipfail: [22811]: debug: Other side is now stable.
May 15 16:10:56 lb1 ipfail: [22811]: info: giveup() called (timeout worked)
May 15 16:10:57 lb1 ipfail: [22811]: debug: Message [ask_resources] sent.
May 15 16:10:57 lb1 ipfail: [22811]: debug: giveup timeout has been destroyed.
May 15 16:10:57 lb1 heartbeat: [22782]: info: lb1.tlthost.net wants to go standby [foreign]
May 15 16:10:58 lb1 heartbeat: [22782]: info: standby: lb2.tlthost.net can take our foreign resources
May 15 16:10:58 lb1 heartbeat: [23026]: info: give up foreign HA resources (standby).
May 15 16:10:58 lb1 heartbeat: [23026]: info: foreign HA resource release completed (standby).
May 15 16:10:58 lb1 heartbeat: [22782]: info: Local standby process completed [foreign].
May 15 16:10:58 lb1 heartbeat: [22782]: WARN: 1 lost packet(s) for [lb2.tlthost.net] [46834:46836]
May 15 16:10:58 lb1 heartbeat: [22782]: info: remote resource transition completed.
May 15 16:10:58 lb1 heartbeat: [22782]: info: No pkts missing from lb2.tlthost.net!
May 15 16:10:58 lb1 heartbeat: [22782]: info: Other node completed standby takeover of foreign resources.
May 15 16:10:58 lb1 ipfail: [22811]: debug: Other side is now stable.
May 15 16:10:59 lb1 ipfail: [22811]: debug: Other side is now stable.
May 15 16:17:01 lb1 /USR/SBIN/CRON[23043]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
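Sifting an excerpt like this one is easier when each entry is split into its fields. The following Python sketch assumes the classic "Mmm dd hh:mm:ss host program[pid]: message" syslog layout seen above, including the "program: [pid]:" variant that heartbeat and ipfail use. The names parse, by_program and Entry are hypothetical helpers, not part of Heartbeat or HAProxy.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# "Mmm dd hh:mm:ss host prog[pid]: msg"; heartbeat/ipfail write
# "prog: [pid]: msg", so the colon and space before "[pid]" are optional.
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<prog>[^\s:\[]+)(?::? ?\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)


class Entry(NamedTuple):
    stamp: str
    host: str
    prog: str
    pid: Optional[int]
    msg: str


def parse(line: str) -> Optional[Entry]:
    """Split one syslog line into fields, or return None if it doesn't fit."""
    m = SYSLOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    pid = int(m["pid"]) if m["pid"] else None
    return Entry(m["stamp"], m["host"], m["prog"], pid, m["msg"])


def by_program(lines: Iterable[str], *programs: str) -> Iterator[Entry]:
    """Keep only the entries written by the named programs."""
    wanted = set(programs)
    for line in lines:
        entry = parse(line)
        if entry is not None and entry.prog in wanted:
            yield entry
```

Run over the excerpt above with "IPaddr" and "ResourceManager", by_program would keep only the entries in which lb1 brings up 192.168.31.100.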