Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 17/05/2013, at 1:15 PM, Andrew Widdersheim awiddersh...@hotmail.com wrote:

> I'm attaching 3 patches I made fairly quickly to fix the installation issues and also an issue I noticed with the ping ocf from the latest pacemaker.
>
> One is for cluster-glue to prevent lrmd from building and later installing. May also want to modify this patch to take lrmd out of both spec files included when you download the source if you plan to build an rpm. I'm not sure if what I did here is the best way to approach this problem so if anyone has anything better please let me know.
>
> One is for pacemaker to create the lrmd symlink when building with heartbeat support.

I can't apply this one until the cluster-glue one is in common use. Otherwise rpm will refuse to install pacemaker because both it and cluster-glue contain the same file.

> Note the spec does not need anything changed here.
>
> Finally, saw the following errors in messages with the latest ping ocf and the attached patch seems to fix the issue.

This is a slightly better fix:

diff --git a/extra/resources/ping b/extra/resources/ping
index abb631e..b9a69b8 100755
--- a/extra/resources/ping
+++ b/extra/resources/ping
@@ -305,6 +305,7 @@ ping_update() {
 : ${OCF_RESKEY_attempts:=3}
 : ${OCF_RESKEY_multiplier:=1}
 : ${OCF_RESKEY_debug:=false}
+: ${OCF_RESKEY_failure_score:=0}
 : ${OCF_RESKEY_CRM_meta_timeout:=2}
 : ${OCF_RESKEY_CRM_meta_globally_unique:=true}

> May 16 01:10:13 node2 lrmd[16133]: notice: operation_finished: p_ping_monitor_5000:17758 [ /usr/lib/ocf/resource.d/pacemaker/ping: line 296: [: : integer expression expected ]
>
> cluster-glue-no-lrmd.patch
> pacemaker-lrmd-hb.patch
> pacemaker-ping-failure.patch

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
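The "integer expression expected" message quoted above is what `[` prints when one side of a numeric comparison expands to an empty string. The patch avoids that by giving `OCF_RESKEY_failure_score` a default with the `: ${VAR:=value}` idiom. Here is a minimal sketch of that idiom, not the agent's actual code: the helper name `score_is_failure` and the comparison it performs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the ping agent: prints "failed"
# when a score is at or below the configured failure threshold.
score_is_failure() {
    score=$1
    # Same idiom as the patch: assign 0 only if the variable is unset or empty.
    : ${OCF_RESKEY_failure_score:=0}
    if [ "$score" -le "$OCF_RESKEY_failure_score" ]; then
        echo failed
    else
        echo ok
    fi
}

# With the threshold unset, the default of 0 is applied before the
# comparison, so `[` always sees two integers.
unset OCF_RESKEY_failure_score
score_is_failure 0   # prints "failed"
score_is_failure 3   # prints "ok"
```

Without the default line, an empty threshold would make `[` print the error to stderr and return a non-zero status, so the check would quietly fall through to the else branch.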
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
I'm just wondering: why is lrm gone?

TIA!

Nikita Michalko

On Friday, 17 May 2013 05:15:10, Andrew Widdersheim wrote:

> I'm attaching 3 patches I made fairly quickly to fix the installation issues and also an issue I noticed with the ping ocf from the latest pacemaker.
>
> One is for cluster-glue to prevent lrmd from building and later installing. May also want to modify this patch to take lrmd out of both spec files included when you download the source if you plan to build an rpm. I'm not sure if what I did here is the best way to approach this problem so if anyone has anything better please let me know.
>
> One is for pacemaker to create the lrmd symlink when building with heartbeat support. Note the spec does not need anything changed here.
>
> Finally, saw the following errors in messages with the latest ping ocf and the attached patch seems to fix the issue.
>
> May 16 01:10:13 node2 lrmd[16133]: notice: operation_finished: p_ping_monitor_5000:17758 [ /usr/lib/ocf/resource.d/pacemaker/ping: line 296: [: : integer expression expected ]
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 2013-05-17T14:15:00, Nikita Michalko michalko.sys...@a-i-p.com wrote:

> I'm just wondering: why is lrm gone?

Rewritten by the pacemaker project upstream, which prefers to no longer build with cluster-glue at all.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
Just tried the patch you gave and it worked fine. Any plans on putting this patch in officially or was this a one-off?

Aside from this patch, I guess the only thing needed to get things working is to install things slightly differently and add a symlink from cluster-glue's lrmd to pacemaker's.

Subject: Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
From: and...@beekhof.net
Date: Thu, 16 May 2013 15:20:59 +1000
CC: pacemaker@oss.clusterlabs.org
To: awiddersh...@hotmail.com

On 16/05/2013, at 3:16 PM, Andrew Widdersheim awiddersh...@hotmail.com wrote:

> I'll look into moving over to the cman option since that is preferred for RHEL6.4 now if I'm not mistaken.

Correct

> I'll also try out the patch provided and see how that goes. So was LRMD not a part of pacemaker previously and later added? Was it originally a part of heartbeat/cluster-glue? I'm just trying to figure out all of the pieces so that I know how to fix if I choose to go down that road.

Originally everything was part of heartbeat.
Then what was then called the crm became pacemaker and the lrmd v1 became part of cluster-glue (because the theory was that someone might use it for a pacemaker alternative).
That never happened and we stopped using almost everything else from cluster-glue, so when lrmd v2 was written, it was done so as part of pacemaker.

or, tl;dr - yes and yes :)
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 17/05/2013, at 11:38 AM, Andrew Widdersheim awiddersh...@hotmail.com wrote:

> Just tried the patch you gave and it worked fine. Any plans on putting this patch in officially or was this a one-off?

It will be in 1.1.10-rc3 soon

> Aside from this patch, I guess the only thing needed to get things working is to install things slightly differently and add a symlink from cluster-glue's lrmd to pacemaker's.

Excellent
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
I'm attaching 3 patches I made fairly quickly to fix the installation issues and also an issue I noticed with the ping ocf from the latest pacemaker.

One is for cluster-glue to prevent lrmd from building and later installing. May also want to modify this patch to take lrmd out of both spec files included when you download the source if you plan to build an rpm. I'm not sure if what I did here is the best way to approach this problem so if anyone has anything better please let me know.

One is for pacemaker to create the lrmd symlink when building with heartbeat support. Note the spec does not need anything changed here.

Finally, saw the following errors in messages with the latest ping ocf and the attached patch seems to fix the issue.

May 16 01:10:13 node2 lrmd[16133]: notice: operation_finished: p_ping_monitor_5000:17758 [ /usr/lib/ocf/resource.d/pacemaker/ping: line 296: [: : integer expression expected ]

Attachments:
cluster-glue-no-lrmd.patch
pacemaker-lrmd-hb.patch
pacemaker-ping-failure.patch
[Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
I am running the following versions:

pacemaker-1.1.10-rc2
cluster-glue-1.0.11
heartbeat-3.0.5

I was running pacemaker-1.1.6 and things were working fine, but after updating to the latest I could not get pacemaker to start, with the following message repeated in the logs:

crmd[8456]: warning: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times

Here is strace output from the crmd process:

0.23 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
0.21 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
0.000574 socket(PF_FILE, SOCK_STREAM, 0) = 6
0.42 fcntl(6, F_GETFD) = 0
0.25 fcntl(6, F_SETFD, FD_CLOEXEC) = 0
0.21 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
0.55 connect(6, {sa_family=AF_FILE, path=@lrmd}, 110) = -1 ECONNREFUSED (Connection refused)
0.50 close(6) = 0
0.31 shutdown(4294967295, 2 /* send and receive */) = -1 EBADF (Bad file descriptor)
0.24 close(4294967295) = -1 EBADF (Bad file descriptor)
0.39 write(2, Could not establish lrmd connect..., 62) = 62
0.58 sendto(3, 28May 14 18:54:51 crmd[8456]: ..., 104, MSG_NOSIGNAL, NULL, 0) = 104
0.000327 times({tms_utime=0, tms_stime=1, tms_cutime=0, tms_cstime=0}) = 430616237
0.28 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
0.25 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
0.26 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
0.23 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
0.23 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
0.23 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)

I'm not quite sure what the issue is. At first I thought it might have been some type of permissions issue, but I'm not quite sure that is the case anymore. Any help would be appreciated. I can forward along any more details to help in troubleshooting.
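In the strace output above, `connect()` to the abstract socket `@lrmd` fails with ECONNREFUSED, which generally means nothing is accepting connections under that name. One rough way to check is to look for the name in the kernel's Unix socket table. The sketch below assumes the Linux `/proc/net/unix` layout, where the path is the eighth column and abstract names start with `@`. The function name `has_unix_socket` is made up for illustration, and a match only shows that a socket is bound to the name, not that the right daemon owns it.

```shell
#!/bin/sh
# has_unix_socket NAME [TABLE]
# Succeeds if TABLE (default /proc/net/unix) lists a socket whose path
# starts with NAME. Assumes the path is the 8th whitespace-separated column.
has_unix_socket() {
    name=$1
    table=${2:-/proc/net/unix}
    # Return 2 when the table cannot be read (for example on non-Linux systems).
    [ -r "$table" ] || return 2
    awk -v name="$name" '
        NR > 1 && index($8, name) == 1 { found = 1 }
        END { exit !found }
    ' "$table"
}

if has_unix_socket "@lrmd"; then
    echo "a socket is bound to @lrmd"
else
    echo "no @lrmd socket found (or the table could not be read)"
fi
```

The prefix match is deliberate: some kernels print padding after an abstract name, so an exact comparison could miss it.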
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
----- Original Message -----
> From: Andrew Widdersheim awiddersh...@hotmail.com
> To: pacemaker@oss.clusterlabs.org
> Sent: Wednesday, May 15, 2013 7:53:56 AM
> Subject: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
>
> I am running the following versions:
>
> pacemaker-1.1.10-rc2
> cluster-glue-1.0.11
> heartbeat-3.0.5

what libqb version do you have?

> I was running pacemaker-1.1.6 and things were working fine, but after updating to the latest I could not get pacemaker to start, with the following message repeated in the logs:
>
> crmd[8456]: warning: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
>
> Here is strace output from the crmd process:
>
> 0.23 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 0.21 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
> 0.000574 socket(PF_FILE, SOCK_STREAM, 0) = 6
> 0.42 fcntl(6, F_GETFD) = 0
> 0.25 fcntl(6, F_SETFD, FD_CLOEXEC) = 0
> 0.21 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> 0.55 connect(6, {sa_family=AF_FILE, path=@lrmd}, 110) = -1 ECONNREFUSED (Connection refused)
> 0.50 close(6) = 0
> 0.31 shutdown(4294967295, 2 /* send and receive */) = -1 EBADF (Bad file descriptor)
> 0.24 close(4294967295) = -1 EBADF (Bad file descriptor)
> 0.39 write(2, Could not establish lrmd connect..., 62) = 62
> 0.58 sendto(3, 28May 14 18:54:51 crmd[8456]: ..., 104, MSG_NOSIGNAL, NULL, 0) = 104
> 0.000327 times({tms_utime=0, tms_stime=1, tms_cutime=0, tms_cstime=0}) = 430616237
> 0.28 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 0.25 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
> 0.26 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 0.23 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
> 0.23 recvfrom(5, 0xc513f9, 2487, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 0.23 poll([{fd=5, events=0}], 1, 0) = 0 (Timeout)
>
> I'm not quite sure what the issue is. At first I thought it might have been some type of permissions issue, but I'm not quite sure that is the case anymore. Any help would be appreciated. I can forward along any more details to help in troubleshooting.

Is there anything in the logs that indicates a problem with the lrmd component? Do you see lrmd listed in 'ps -axf' output?

-- Vossel
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
These are the libqb versions:

libqb-devel-0.14.2-3.el6.x86_64
libqb-0.14.2-3.el6.x86_64

Here is a process listing where lrmd is running:

[root@node1 ~]# ps auxwww | egrep "heartbeat|pacemaker"
root 9553 0.1 0.7 52420 7424 ? SLs May14 1:39 heartbeat: master control process
root 9556 0.0 0.7 52260 7264 ? SL May14 0:10 heartbeat: FIFO reader
root 9557 0.0 0.7 52256 7260 ? SL May14 1:01 heartbeat: write: mcast eth0
root 9558 0.0 0.7 52256 7260 ? SL May14 0:14 heartbeat: read: mcast eth0
root 9559 0.0 0.7 52256 7260 ? SL May14 0:23 heartbeat: write: bcast eth1
root 9560 0.0 0.7 52256 7260 ? SL May14 0:13 heartbeat: read: bcast eth1
498 9563 0.0 0.2 36908 2392 ? S May14 0:10 /usr/lib64/heartbeat/ccm
498 9564 0.0 1.0 85084 10704 ? S May14 0:25 /usr/lib64/heartbeat/cib
root 9565 0.0 0.1 44588 1896 ? S May14 0:04 /usr/lib64/heartbeat/lrmd -r
root 9566 0.0 0.3 83544 3988 ? S May14 0:10 /usr/lib64/heartbeat/stonithd
498 9567 0.0 0.3 78668 3248 ? S May14 0:10 /usr/lib64/heartbeat/attrd
498 26534 0.0 0.3 92364 3748 ? S 16:05 0:00 /usr/lib64/heartbeat/crmd
498 26535 0.0 0.2 72840 2708 ? S 16:05 0:00 /usr/libexec/pacemaker/pengine

Here are the logs at startup until the "Failed to sign on" message starts to repeat over and over:

May 15 16:07:06 node1 crmd[26621]: notice: main: CRM Git Version: b060cae
May 15 16:07:06 node1 attrd[26620]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
May 15 16:07:06 node1 attrd[26620]: notice: main: Starting mainloop...
May 15 16:07:06 node1 stonith-ng[26619]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
May 15 16:07:06 node1 cib[26617]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
May 15 16:07:06 node1 lrmd: [26618]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
May 15 16:07:06 node1 lrmd: [26618]: info: max-children set to 4 (1 processors online)
May 15 16:07:06 node1 lrmd: [26618]: info: enabling coredumps
May 15 16:07:06 node1 lrmd: [26618]: info: Started.
May 15 16:07:06 node1 cib[26617]: warning: ccm_connect: CCM Activation failed
May 15 16:07:06 node1 cib[26617]: warning: ccm_connect: CCM Connection failed 1 times (30 max)
May 15 16:07:06 node1 ccm: [26616]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
May 15 16:07:06 node1 ccm: [26616]: info: Hostname: node1
May 15 16:07:07 node1 crmd[26621]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
May 15 16:07:09 node1 cib[26617]: warning: ccm_connect: CCM Activation failed
May 15 16:07:09 node1 cib[26617]: warning: ccm_connect: CCM Connection failed 2 times (30 max)
May 15 16:07:10 node1 crmd[26621]: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
May 15 16:07:13 node1 crmd[26621]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
May 15 16:07:14 node1 cib[26617]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[1] - state is now member (was (null))
May 15 16:07:14 node1 cib[26617]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[0] - state is now member (was (null))
May 15 16:07:15 node1 crmd[26621]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times

Here are the repeating message pieces:

May 15 16:06:09 node1 crmd[26534]: error: do_lrm_control: Failed to sign on to the LRM 30 (max) times
May 15 16:06:09 node1 crmd[26534]: error: do_log: FSA: Input I_ERROR from do_lrm_control() received in state S_STARTING
May 15 16:06:09 node1 crmd[26534]: warning: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_lrm_control ]
May 15 16:06:09 node1 crmd[26534]: warning: do_recover: Fast-tracking shutdown in response to errors
May 15 16:06:09 node1 crmd[26534]: error: do_started: Start cancelled... S_RECOVERY
May 15 16:06:09 node1 crmd[26534]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
May 15 16:06:09 node1 crmd[26534]: notice: do_lrm_control: Disconnected from the LRM
May 15 16:06:09 node1 ccm: [9563]: info: client (pid=26534) removed from ccm
May 15 16:06:09 node1 crmd[26534]: error: do_exit: Could not recover from internal error
May 15 16:06:09 node1 crmd[26534]: error: crm_abort: crm_glib_handler: Forked child 26540 to record non-fatal assert at logging.c:63 : g_hash_table_size: assertion `hash_table != NULL' failed
May 15 16:06:09 node1 crmd[26534]: error: crm_abort: crm_glib_handler: Forked child 26541 to record non-fatal assert at logging.c:63 : g_hash_table_destroy: assertion `hash_table
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 16/05/2013, at 10:21 AM, Andrew Widdersheim awiddersh...@hotmail.com wrote:

> These are the libqb versions:
>
> libqb-devel-0.14.2-3.el6.x86_64
> libqb-0.14.2-3.el6.x86_64
>
> Here is a process listing where lrmd is running:
>
> [root@node1 ~]# ps auxwww | egrep "heartbeat|pacemaker"
> root 9553 0.1 0.7 52420 7424 ? SLs May14 1:39 heartbeat: master control process
> root 9556 0.0 0.7 52260 7264 ? SL May14 0:10 heartbeat: FIFO reader
> root 9557 0.0 0.7 52256 7260 ? SL May14 1:01 heartbeat: write: mcast eth0
> root 9558 0.0 0.7 52256 7260 ? SL May14 0:14 heartbeat: read: mcast eth0
> root 9559 0.0 0.7 52256 7260 ? SL May14 0:23 heartbeat: write: bcast eth1
> root 9560 0.0 0.7 52256 7260 ? SL May14 0:13 heartbeat: read: bcast eth1
> 498 9563 0.0 0.2 36908 2392 ? S May14 0:10 /usr/lib64/heartbeat/ccm
> 498 9564 0.0 1.0 85084 10704 ? S May14 0:25 /usr/lib64/heartbeat/cib
> root 9565 0.0 0.1 44588 1896 ? S May14 0:04 /usr/lib64/heartbeat/lrmd -r

Heartbeat is starting the wrong lrmd by the looks of it.
Is /usr/lib64/heartbeat/lrmd the same as /usr/libexec/pacemaker/lrmd?

> root 9566 0.0 0.3 83544 3988 ? S May14 0:10 /usr/lib64/heartbeat/stonithd
> 498 9567 0.0 0.3 78668 3248 ? S May14 0:10 /usr/lib64/heartbeat/attrd
> 498 26534 0.0 0.3 92364 3748 ? S 16:05 0:00 /usr/lib64/heartbeat/crmd
> 498 26535 0.0 0.2 72840 2708 ? S 16:05 0:00 /usr/libexec/pacemaker/pengine
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
There are quite a few symlinks from heartbeat pieces back to pacemaker pieces, crmd for example, but lrmd was not one of them:

[root@node1 ~]# ls -lha /usr/lib64/heartbeat/crmd
lrwxrwxrwx 1 root root 27 May 14 17:31 /usr/lib64/heartbeat/crmd -> /usr/libexec/pacemaker/crmd
[root@node1 ~]# ls -lha /usr/lib64/heartbeat/lrmd
-rwxr-xr-x 1 root root 85K May 14 17:19 /usr/lib64/heartbeat/lrmd

I just tried to symlink it back by hand, but when I started heartbeat the logs had nothing about lrmd starting or trying to start, nor did lrmd show in the process list anymore. Just more failure messages.

[root@node1 ~]# ls -lha /usr/lib64/heartbeat/lrmd
lrwxrwxrwx 1 root root 27 May 15 19:38 /usr/lib64/heartbeat/lrmd -> /usr/libexec/pacemaker/lrmd

I then started lrmd manually as root with the verbose option turned on, and it looks like things started to connect: the cluster on node1, where I started lrmd manually, began coming online and working a bit.

I noticed that pacemaker's lrmd no longer has a -r option, which, looking at my old ps output, was how it was getting started:

[root@node1 ~]# /usr/libexec/pacemaker/lrmd --help
lrmd - Pacemaker Remote daemon for extending pacemaker functionality to remote nodes.
Usage: lrmd [options]
Options:
 -?, --help             This text
 -$, --version          Version information
 -V, --verbose          Increase debug output
 -l, --logfile=value    Send logs to the additional named logfile

This is what heartbeat's lrmd looks like:

[root@node1 ~]# /usr/lib64/heartbeat/lrmd.bak --help
/usr/lib64/heartbeat/lrmd.bak: invalid option -- '-'
usage: lrmd [-srkhv]
 s: status
 r: restart
 k: kill
 m: register to apphbd
 i: the interval of apphb
 h: help
 v: debug

Previous ps output:

root 9565 0.0 0.1 44588 1896 ? S May14 0:04 /usr/lib64/heartbeat/lrmd -r

I'm not sure what initially tries to spawn lrmd, but it is likely that will need to change as well. Is all of this the result of a bad installation, did I need to compile things differently, or is pacemaker too new and heartbeat too old? Basically, what do I need to do to fix this?
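The question of whether heartbeat's lrmd path ends up at pacemaker's lrmd can be checked by resolving both paths and comparing the results. This is a rough sketch rather than anything from the thread's patches. It assumes a `readlink` that supports `-f` (GNU coreutils does), and the function name `same_target` is made up for illustration. The two paths are the ones shown in the listings above.

```shell
#!/bin/sh
# same_target A B
# Succeeds when A and B resolve to the same file after following symlinks.
same_target() {
    a=$(readlink -f -- "$1") || return 1
    b=$(readlink -f -- "$2") || return 1
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Paths taken from the thread; adjust for other layouts.
if same_target /usr/lib64/heartbeat/lrmd /usr/libexec/pacemaker/lrmd; then
    echo "heartbeat's lrmd path resolves to pacemaker's lrmd"
else
    echo "heartbeat's lrmd path does not resolve to pacemaker's lrmd"
fi
```

Even when the paths match, the thread suggests that is not the whole story: heartbeat was launching lrmd with -r, an option pacemaker's lrmd does not list in its help output.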
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 16/05/2013, at 2:03 PM, Andrew Widdersheim awiddersh...@hotmail.com wrote:

> I'm not sure what initially tries to spawn lrmd

In your case, Heartbeat.

> but it is likely that will need to change as well. Is all of this the result of a bad installation, did I need to compile things differently, or is pacemaker too new and heartbeat too old? Basically, what do I need to do to fix this?

Honestly, I'd probably recommend to just stop fighting the distro you're on :-)
Just follow http://clusterlabs.org/quickstart-redhat.html to get what comes with and was tested for RHEL 6.4
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 16/05/2013, at 2:52 PM, Andrew Beekhof and...@beekhof.net wrote:

> On 16/05/2013, at 2:03 PM, Andrew Widdersheim awiddersh...@hotmail.com wrote:
>
>> Is all of this the result of a bad installation, did I need to compile things differently, or is pacemaker too new and heartbeat too old? Basically, what do I need to do to fix this?
>
> Honestly, I'd probably recommend to just stop fighting the distro you're on :-)
> Just follow http://clusterlabs.org/quickstart-redhat.html to get what comes with and was tested for RHEL 6.4

Although building with this patch would probably help:

https://github.com/beekhof/pacemaker/commit/064b19e
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
I'll look into moving over to the cman option since that is preferred for RHEL6.4 now if I'm not mistaken. I'll also try out the patch provided and see how that goes.

So was LRMD not a part of pacemaker previously and later added? Was it originally a part of heartbeat/cluster-glue? I'm just trying to figure out all of the pieces so that I know how to fix if I choose to go down that road.
Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7
On 16/05/2013, at 3:16 PM, Andrew Widdersheim awiddersh...@hotmail.com wrote:

> I'll look into moving over to the cman option since that is preferred for RHEL6.4 now if I'm not mistaken.

Correct

> I'll also try out the patch provided and see how that goes.
>
> So was LRMD not a part of pacemaker previously and later added? Was it originally a part of heartbeat/cluster-glue? I'm just trying to figure out all of the pieces so that I know how to fix if I choose to go down that road.

Originally everything was part of heartbeat.
Then what was then called the crm became pacemaker and the lrmd v1 became part of cluster-glue (because the theory was that someone might use it for a pacemaker alternative).
That never happened and we stopped using almost everything else from cluster-glue, so when lrmd v2 was written, it was done so as part of pacemaker.

or, tl;dr - yes and yes :)