[Linux-ha-dev] Problem Heartbeat Changing Time Server
Hello! Does anyone know whether heartbeat can change the time on a server? When uptime gets close to 30 days, the services simply migrate from server1 to server2. Reviewing the logs, you can see that the time is wrong. The time had been checked before the problem occurred and was correct. I think a time shift caused by heartbeat generated this error in the log:

Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 2998210 ms ( 10010 ms) before being called (GSource: 0x1f11370)
Oct 27 10:22:09 inga heartbeat: [2313]: info: Gmain_timeout_dispatch: started at 1965715369 should have started at 1965415548
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Late heartbeat: Node inga: interval 3018270 ms
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 60 ms ( 50 ms) (GSource: 0x1f11370)
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: node pitanga: is dead
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: No STONITH device configured.
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Shared disks are not protected.
Oct 27 10:22:09 inga heartbeat: [2313]: info: Resources being acquired from pitanga.
Oct 27 10:22:09 inga heartbeat: [2313]: info: Link pitanga:eth2 dead.
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 2995370 ms ( 10010 ms) before being called (GSource: 0x1f115b0)
Oct 27 10:22:09 inga heartbeat: [2313]: info: Gmain_timeout_dispatch: started at 1965715375 should have started at 1965415838
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 2999800 ms ( 5 ms) before being called (GSource: 0x1f116a0)
Oct 27 10:22:09 inga heartbeat: [2313]: info: Gmain_timeout_dispatch: started at 1965715375 should have started at 1965415395
Oct 27 10:22:09 inga heartbeat: [2313]: WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 2992840 ms ( 5000 ms) before being called (GSource: 0x1f114f0)
Oct 27 10:22:09 inga heartbeat: [2313]: info: Gmain_timeout_dispatch: started at 1965715375 should have started at 1965416091
Oct 27 10:22:09 inga heartbeat: [2313]: info: Link pitanga:eth2 up.
harc[11433]: 2011/10/27_10:22:09 info: Running /etc/ha.d//rc.d/status status
mach_down[11475]: 2011/10/27_10:22:10 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[11475]: 2011/10/27_10:22:10 info: mach_down takeover complete for node pitanga.
Oct 27 10:22:10 inga heartbeat: [2313]: info: mach_down takeover complete.
Oct 27 10:22:12 inga heartbeat: [11434]: info: Local Resource acquisition completed.
harc[11534]: 2011/10/27_10:22:12 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-r

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
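A note on the numbers in the log above: each "started at / should have started at" pair differs by exactly one tenth of the reported delay in milliseconds. That suggests the counters are 10 ms clock ticks and that every timer fired roughly 50 minutes late, far beyond the deadtime values of 20 to 30 seconds mentioned in the other threads. The sketch below re-derives this from two of the quoted line pairs. It assumes the message format shown above; the names `DELAYED`, `STARTED` and `dispatch_delays` are made up for illustration and are not part of heartbeat.

```python
import re

# Two WARN/info pairs copied from the log above (timestamps and PIDs trimmed).
LOG = """\
WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 2998210 ms ( 10010 ms) before being called (GSource: 0x1f11370)
info: Gmain_timeout_dispatch: started at 1965715369 should have started at 1965415548
WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 2995370 ms ( 10010 ms) before being called (GSource: 0x1f115b0)
info: Gmain_timeout_dispatch: started at 1965715375 should have started at 1965415838
"""

DELAYED = re.compile(r"Dispatch function for (.+?) was delayed (\d+) ms")
STARTED = re.compile(r"started at (\d+) should have started at (\d+)")


def dispatch_delays(text):
    """Return (function name, delay in ms, tick difference) per WARN/info pair."""
    results = []
    pending = None  # last "was delayed" match still waiting for its info line
    for line in text.splitlines():
        delayed = DELAYED.search(line)
        if delayed:
            pending = (delayed.group(1), int(delayed.group(2)))
            continue
        started = STARTED.search(line)
        if started and pending:
            ticks = int(started.group(1)) - int(started.group(2))
            results.append((pending[0], pending[1], ticks))
            pending = None
    return results


for name, delay_ms, ticks in dispatch_delays(LOG):
    # delay_ms / ticks is the tick length; delay_ms / 60000 is the delay in minutes.
    print(f"{name}: {delay_ms / ticks:.0f} ms per tick, {delay_ms / 60000:.1f} min late")
```

All timers reporting nearly the same delay at the same instant would be consistent with the whole process (or the clock it reads) stalling or jumping at once, rather than with a slow network link, though the log alone does not say which.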
[Linux-ha-dev] Heartbeat 3.0.3 Message Warning
I keep getting the heartbeat warning messages shown below. They worry me, because a few months after these messages first appeared, the services migrated from this server to server2. Could there be a bug when using Xen + DRBD + Heartbeat? DRBD is running on eth2 and Heartbeat on eth3. I changed the values in the heartbeat configuration, and the warnings still continue. I found reports of this message on Google, but no solution. The server has spare CPU cores and memory, and the interface is 1G.

root@inga:/home/vm# tail -f /var/log/ha-log
May 25 08:23:28 inga heartbeat: [26789]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 310 ms ( 50 ms) (GSource: 0x1dae330)
May 25 08:28:26 inga heartbeat: [26789]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 90 ms ( 50 ms) (GSource: 0x1dae330)
May 28 11:40:56 inga heartbeat: [26789]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms ( 50 ms) (GSource: 0x1dabae0)

/etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 15
bcast eth2 # Linux
auto_failback on
node pitanga
node inga

/etc/ha.d/haresources (xendomains is a script)
inga xendomains::lobeira.cfg xendomains::jequitiba.cfg xendomains::munguba.cfg xendomains::buriti.cfg xendomains::jatoba.cfg xendomains::mangaba.cfg xendomains::cagaita.cfg xendomains::agroval.cfg xendomains::tigui.cfg

/etc/ha.d/authkeys
auth 1
1 crc

Thanks.
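To judge whether warnings like the ones above are getting worse over time, it can help to summarize them instead of reading them one by one. The following is a minimal sketch that pulls the measured and expected durations out of "took too long to execute" lines and keeps the worst case per dispatch function. The regular expression assumes the message format quoted above (with an optional ">" before the limit), and `overruns` is a hypothetical helper name, not a heartbeat tool.

```python
import re

# Warning lines as quoted in the message above.
LOG = """\
May 25 08:23:28 inga heartbeat: [26789]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 310 ms ( 50 ms) (GSource: 0x1dae330)
May 25 08:28:26 inga heartbeat: [26789]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 90 ms ( 50 ms) (GSource: 0x1dae330)
May 28 11:40:56 inga heartbeat: [26789]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms ( 50 ms) (GSource: 0x1dabae0)
"""

PATTERN = re.compile(
    r"Dispatch function for (.+?) took too long to execute: "
    r"(\d+) ms \(>?\s*(\d+) ms\)"
)


def overruns(text):
    """Map each dispatch function name to its worst (took_ms, limit_ms) pair."""
    worst = {}
    for match in PATTERN.finditer(text):
        name = match.group(1)
        took_ms, limit_ms = int(match.group(2)), int(match.group(3))
        if name not in worst or took_ms > worst[name][0]:
            worst[name] = (took_ms, limit_ms)
    return worst


for name, (took_ms, limit_ms) in overruns(LOG).items():
    print(f"{name}: worst case {took_ms} ms against an expected {limit_ms} ms")
```

For the three lines quoted here, the worst case is 310 ms for "send local status", which is still well under one second and small next to the deadtime of 30 in the ha.cf above.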
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
Ok, thank you. I'm trying to isolate the problem as far as I can so that I can diagnose it. I've used tools like sar and iostat to check the system, and so far they show no problems.

That's probably OK. If you're really having a problem, it should ordinarily show up before it causes a false failover. Then you can figure out if you want to raise your timeout or figure out what's causing the slow processing.

On 05/14/2011 09:08 AM, gilmarli...@agrovale.com.br wrote:

Thanks again. Are deadtime 30 and warntime 15 good?

BUT also either make warntime smaller or deadtime larger...

On 5/13/2011 7:48 PM, gilmarli...@agrovale.com.br wrote:

Thank you for your attention. I will follow your recommendation and wait. If the logs only keep showing the warning and the services do not migrate to the other server, I will just keep watching the logs.

I typically make deadtime something like 3 times warntime. That way you'll get data before you get into trouble. When your heartbeats exceed warntime, you get information on how late it is. I would typically make deadtime AT LEAST twice the latest time you've ever seen with warntime.

If the worst case you ever saw was this 60ms instead of 50ms, I'd look somewhere else for the problem. However, it is possible that you have hardware trouble, or a kernel bug. Possible, but unlikely.

More logs are always good when looking at a problem like this. hb_report will get you lots of logs and so on for the next time it happens.

On 05/13/2011 11:44 AM, gilmarli...@agrovale.com.br wrote:

Thanks for the help. About 30 days ago I had a problem that began with this message: two days after it appeared, heartbeat reported that server1 had gone down, and the services migrated to server2. Now, with the change to eth1 and eth2 for DRBD and heartbeat and with deadtime set to 20 and warntime to 15, I do not know whether this will happen again. Thanks

That's related to process dispatch time in the kernel. It might be the case that this expectation is a bit aggressive (mea culpa).

In the meantime, as long as those timings remain close to the expectations (60 vs 50ms) I'd ignore them.

Those messages are meant to debug real-time problems - which you don't appear to be having.

-- Alan Robertson al...@unix.sh

On 05/12/2011 12:54 PM, gilmarli...@agrovale.com.br wrote:

Hello! I'm using heartbeat version 3.0.3-2 on Debian squeeze with a dedicated gigabit Ethernet interface for the heartbeat. Even so, it generates the following message:

WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 60 ms ( 50 ms) (GSource: 0x101c350)

I'm using eth1 and eth2: DRBD synchronization on eth1 and heartbeat on eth2. I tried increasing the values to deadtime = 20 and warntime = 15. The interfaces are an Intel Corporation 82575GB Gigabit Ethernet controller in Serv.1 and a Broadcom Corporation NetXtreme II BCM5709 Ethernet controller in Serv.2. I also tested using two Broadcom cards for the heartbeat, also without success. Thanks

--
Alan Robertson
al...@unix.sh

Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce
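The rules of thumb in the replies above (deadtime around 3 times warntime, and at least twice the worst lateness ever logged) can be written down as a small check. This is only a sketch of that advice, not part of heartbeat: the function names, the choice to combine the two rules with `max`, and the idea of passing the worst observed lateness in seconds are assumptions made for illustration.

```python
def suggested_deadtime(warntime_s, worst_late_s=0.0):
    """Smallest deadtime (seconds) that satisfies both rules of thumb."""
    return max(3 * warntime_s, 2 * worst_late_s)


def check_timings(deadtime_s, warntime_s, worst_late_s=0.0):
    """Return a list of findings for a deadtime/warntime pair from ha.cf."""
    findings = []
    if deadtime_s < 3 * warntime_s:
        findings.append(
            f"deadtime {deadtime_s}s is below 3 x warntime ({3 * warntime_s}s): "
            "make warntime smaller or deadtime larger"
        )
    if deadtime_s < 2 * worst_late_s:
        findings.append(
            f"deadtime {deadtime_s}s is below twice the worst lateness seen "
            f"({2 * worst_late_s}s)"
        )
    return findings


# Values asked about in the thread: deadtime 30 and warntime 15.
for finding in check_timings(30, 15):
    print(finding)
print("suggested deadtime:", suggested_deadtime(15), "seconds")
```

With deadtime 30 and warntime 15 the ratio is only 2, which matches the reply "either make warntime smaller or deadtime larger"; keeping warntime at 15 would point to a deadtime of about 45 under this reading of the advice.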
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
Thanks again. Are deadtime 30 and warntime 15 good?

BUT also either make warntime smaller or deadtime larger...
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
Thanks for the help. About 30 days ago I had a problem that began with this message: two days after it appeared, heartbeat reported that server1 had gone down, and the services migrated to server2. Now, with the change to eth1 and eth2 for DRBD and heartbeat and with deadtime set to 20 and warntime to 15, I do not know whether this will happen again. Thanks

That's related to process dispatch time in the kernel. It might be the case that this expectation is a bit aggressive (mea culpa).

In the meantime, as long as those timings remain close to the expectations (60 vs 50ms) I'd ignore them.

Those messages are meant to debug real-time problems - which you don't appear to be having.

-- Alan Robertson al...@unix.sh
Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again
Thank you for your attention. I will follow your recommendation and wait. If the logs only keep showing the warning and the services do not migrate to the other server, I will just keep watching the logs.

I typically make deadtime something like 3 times warntime. That way you'll get data before you get into trouble. When your heartbeats exceed warntime, you get information on how late it is. I would typically make deadtime AT LEAST twice the latest time you've ever seen with warntime.

If the worst case you ever saw was this 60ms instead of 50ms, I'd look somewhere else for the problem. However, it is possible that you have hardware trouble, or a kernel bug. Possible, but unlikely.

More logs are always good when looking at a problem like this. hb_report will get you lots of logs and so on for the next time it happens.
Re: [Linux-ha-dev] WARN: Gmain_timeout_dispatch Log
Thanks for the help. What is more interesting is that DRBD does not log anything; everything looks normal there. The Broadcom Gigabit network cards are connected with a crossover cable. These log messages are generated only once in a while. I will try setting the network parameters below in sysctl.conf; what do you think?

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216

Hi,

On Wed, Apr 27, 2011 at 02:44:04PM -0300, gilmarli...@agrovale.com.br wrote:

Hello, I am using DRBD (two primaries) + heartbeat (auto_failback on). Server1, which has more hosts connected to it, is showing the log below. The heartbeat version is 3.0.3-2. I changed the values in /etc/ha.d/ha.cf as shown below, but the problem continues:

keepalive 4
deadtime 20
warntime 15

root@inga:~# tail -f /var/log/ha-log
Apr 27 07:37:55 inga heartbeat: [8495]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 100 ms ( 50 ms) (GSource: 0x74e350)
Apr 27 13:11:43 inga heartbeat: [8495]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 60 ms ( 50 ms) (GSource: 0x74e350)
Apr 27 13:12:02 inga heartbeat: [8495]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 70 ms ( 50 ms) (GSource: 0x74bac0)
Apr 27 13:12:03 inga heartbeat: [8495]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms ( 50 ms) (GSource: 0x74bac0)

This log worries me: a few days after it appeared, the server was eventually declared dead. Thanks

This should indicate that this node has a high load and couldn't keep up with the demand.

BTW, this kind of question belongs to the user mailing list.

Thanks,

Dejan

P.S. Looks like you are really short on newlines over there.
[Linux-ha-dev] WARN: Gmain_timeout_dispatch Log
Hello, I am using DRBD (two primaries) + heartbeat (auto_failback on). Server1, which has more hosts connected to it, is showing the log below. The heartbeat version is 3.0.3-2. I changed the values in /etc/ha.d/ha.cf as shown below, but the problem continues:

keepalive 4
deadtime 20
warntime 15

root@inga:~# tail -f /var/log/ha-log
Apr 27 07:37:55 inga heartbeat: [8495]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 100 ms ( 50 ms) (GSource: 0x74e350)
Apr 27 13:11:43 inga heartbeat: [8495]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 60 ms ( 50 ms) (GSource: 0x74e350)
Apr 27 13:12:02 inga heartbeat: [8495]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 70 ms ( 50 ms) (GSource: 0x74bac0)
Apr 27 13:12:03 inga heartbeat: [8495]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 60 ms ( 50 ms) (GSource: 0x74bac0)

This log worries me: a few days after it appeared, the server was eventually declared dead. Thanks
[Linux-ha-dev] DRBD after Primary Configuration
I have two servers configured as primary, but the second primary server crashed, so I set up another machine to put back into the cluster. However, this new machine does not connect to the primary server, and I cannot create the metadata because I cannot afford to lose the data. Does anyone have any idea how to make DRBD synchronize? Thanks