[Pacemaker] Master Slave
I have a quick question: is the Master/Slave setting in Pacemaker only allowed for a DRBD device, or can you use it to create other Master/Slave relationships? Do all resource agents potentially involved need to be aware of the Master/Slave relationship?

I am trying to set up a pair of MySQL servers where one replicates from the other (handled within MySQL's my.cnf). I basically want to fail over the VIP of the primary node to the secondary node (which also happens to be the MySQL slave) in the event that the primary has its MySQL server stopped. I am not using DRBD at all. My config looks like the following:

node $id="0cd2bb09-00b6-4ce4-bdd1-629767ae0739" sipl-mysql-109
node $id="119fc082-7046-4b8d-a9a3-7e777b9ddf60" sipl-mysql-209
primitive p_clusterip ocf:heartbeat:IPaddr2 \
        params ip="10.200.131.9" cidr_netmask="32" \
        op monitor interval="30s"
primitive p_mysql ocf:heartbeat:mysql \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="10" timeout="120" depth="0"
ms ms_mysql p_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
location l_master ms_mysql \
        rule $id="l_master-rule" $role="Master" 100: #uname eq sipl-mysql-109
colocation mysql_master_on_ip inf: p_clusterip ms_mysql:Master
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        start-failure-is-fatal="false" \
        expected-quorum-votes="2" \
        symmetric-cluster="false" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

What's happening is that mysql is never brought up, due to the following errors:

Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_color: Resource p_mysql:0 cannot run anywhere
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_color: Resource p_mysql:1 cannot run anywhere
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_merge_weights: ms_mysql: Rolling back scores from p_clusterip
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_color: Resource p_clusterip cannot run anywhere
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: notice: LogActions: Leave resource p_clusterip (Stopped)
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: notice: LogActions: Leave resource p_mysql:0 (Stopped)
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: notice: LogActions: Leave resource p_mysql:1 (Stopped)

I thought I might have overcome this with my location and colocation directives, but it failed. Could someone give me some feedback on what I am trying to do, my config and the resulting errors?

Thanks,
F.
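One thing worth noting about the config above: with symmetric-cluster="false" the cluster is opt-in, so no resource may run anywhere unless a location constraint gives it a positive score, and the only location rule present is scoped to $role="Master". That matches the "cannot run anywhere" messages for both clone instances and the VIP. A minimal sketch of the extra constraints that would let everything be placed, assuming the rest of the config stays as posted (the constraint names here are made up):

    # Opt-in cluster: every resource needs an explicit score on each
    # node where it is allowed to run.
    location l_mysql_109 ms_mysql 100: sipl-mysql-109
    location l_mysql_209 ms_mysql 50: sipl-mysql-209
    location l_ip_109 p_clusterip 100: sipl-mysql-109
    location l_ip_209 p_clusterip 50: sipl-mysql-209

Alternatively, setting symmetric-cluster="true" would let resources run anywhere by default, with the existing $role="Master" rule kept as a simple master preference.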
[Pacemaker] mysql RA constantly restarting db
Hello,

I am new to Pacemaker and struggling with the somewhat limited documentation. I looked through the archives and didn't find anything that matched my problem. I have a brand new Pacemaker setup running on CentOS 5.5. I am using the below config file to start up MySQL, which is also a brand new build. Right now the cluster is only running on one node while I try to isolate this problem. This is a brand new cib file as well. The cluster starts up, but then every 30 seconds or so I see it restart MySQL. If I stop heartbeat and bring up MySQL by itself, it starts up just fine. It's driving me batty, so I thought I would post it here and see if someone was able to help.

What I see in syslog from heartbeat is:

Jul 22 15:34:09 sipl-mysql-109 lrmd: [11182]: info: rsc:d_mysql:69: start
Jul 22 15:34:11 sipl-mysql-109 lrmd: [11182]: info: RA output: (ip_db:start:stderr) ARPING 10.200.131.9 from 10.200.131.9 eth0 Sent 5 probes (5 broadcast(s)) Received 0 response(s)
Jul 22 15:34:13 sipl-mysql-109 mysql[14915]: [15086]: INFO: MySQL started
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: process_lrm_event: LRM operation d_mysql_start_0 (call=69, rc=0, cib-update=105, confirmed=true) ok
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: match_graph_event: Action d_mysql_start_0 (6) confirmed on sipl-mysql-109 (rc=0)
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: te_rsc_command: Initiating action 1: monitor d_mysql_monitor_1 on sipl-mysql-109 (local)
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: do_lrm_rsc_op: Performing key=1:14:0:989206b7-461a-42db-a2a7-7b447bd6c5b3 op=d_mysql_monitor_1 )
Jul 22 15:34:13 sipl-mysql-109 lrmd: [11182]: info: rsc:d_mysql:70: monitor
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: te_rsc_command: Initiating action 8: start ip_db_start_0 on sipl-mysql-109 (local)
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: do_lrm_rsc_op: Performing key=8:14:0:989206b7-461a-42db-a2a7-7b447bd6c5b3 op=ip_db_start_0 )
Jul 22 15:34:13 sipl-mysql-109 lrmd: [11182]: info: rsc:ip_db:71: start
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: process_lrm_event: LRM operation d_mysql_monitor_1 (call=70, rc=7, cib-update=106, confirmed=false) not running
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: WARN: status_from_rc: Action 1 (d_mysql_monitor_1) on sipl-mysql-109 failed (target: 0 vs. rc: 7): Error
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: WARN: update_failcount: Updating failcount for d_mysql on sipl-mysql-109 after failed monitor.

The output of crm configure show is:

primitive d_mysql ocf:heartbeat:mysql \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="10" timeout="30" depth="0" \
        param binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/run/mysqld/mysql.pid" socket="/var/lib/mysql/mysql.sock"
primitive ip_db ocf:heartbeat:IPaddr2 \
        params ip="10.200.131.9" cidr_netmask="32" \
        op monitor interval="30s" nic="eth0"
group sv_db d_mysql ip_db
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        start-failure-is-fatal="false" \
        expected-quorum-votes="2" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat"
rsc_defaults $id="rsc_defaults-options" \
        migration-threshold="20" \
        failure-timeout="20"

My versions are as follows:

[r...@sipl-mysql-109 rc0.d]# rpm -qa | egrep "coro|pacemaker|heart"
corosynclib-1.2.5-1.3.el5
corosync-1.2.5-1.3.el5
corosync-1.2.5-1.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-1.0.9.1-1.11.el5
pacemaker-1.0.9.1-1.11.el5
corosynclib-1.2.5-1.3.el5
heartbeat-libs-3.0.3-2.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-libs-1.0.9.1-1.11.el5
heartbeat-libs-3.0.3-2.3.el5
pacemaker-libs-1.0.9.1-1.11.el5

[r...@sipl-mysql-109 rc0.d]# rpm -qa | grep resource
resource-agents-1.0.3-2.6.el5
[r...@sipl-mysql-109 rc0.d]# cat /etc/redhat-release
CentOS release 5.5 (Final)
[r...@sipl-mysql-109 rc0.d]# uname -r
2.6.18-194.8.1.el5
[r...@sipl-mysql-109 rc0.d]# mysql -V
mysql Ver 14.14 Distrib 5.1.48, for unknown-linux-gnu (x86_64) using readline 5.1

My ha.cf looks like:

autojoin none
mcast eth0 227.0.0.10 694 1 0
warntime 5
deadtime 15
initdead 60
keepalive 5
auto_failback off
node sipl-mysql-109
node sipl-mysql-209
crm on

MySQL shows the following in its error log:

100722 15:33:57 [Note] Plugin 'FEDERATED' is disabled.
100722 15:33:57 InnoDB: Started; log sequence number 0 44233
100722 15:33:57 [Note] Event Scheduler: Loaded 0 events
100722 15:33:57 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.48-community-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)
100722 15:34:01 [Note] /usr/sbin/mysqld: Normal shutdown
100722 15:34:01 [Note] Event Scheduler: Purging the queue. 0 events
100722 15:34:01 InnoDB: Starting shutdown...
100722 15:34:02 InnoDB: Shutdown completed; log sequence number 0 44233
100722 15:34:02 [Note] /usr/sbin/mysqld: Shutdown complete
100722 15:34
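As posted, the d_mysql primitive says "param" rather than "params", so those instance attributes may never reach the resource agent at all, leaving it to fall back on its built-in defaults; a monitor that returns rc=7 (not running) right after a successful start is also the usual symptom of the agent checking a pid file that mysqld does not actually write. A hedged sketch of the primitive with the keyword corrected, assuming the pid path really matches the pid-file setting in /etc/my.cnf:

    primitive d_mysql ocf:heartbeat:mysql \
            params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
                    datadir="/var/lib/mysql" user="mysql" \
                    pid="/var/run/mysqld/mysql.pid" \
                    socket="/var/lib/mysql/mysql.sock" \
            op start interval="0" timeout="120" \
            op stop interval="0" timeout="120" \
            op monitor interval="10" timeout="30" depth="0"

If the pid-file setting in my.cnf differs from the pid parameter here (or is absent), aligning the two values is the first thing to check.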
Re: [Pacemaker] FS mount error
On Thu, Jul 22, 2010 at 10:36 AM, Proskurin Kirill wrote:
> On 22/07/10 12:23, Michael Fung wrote:
>>
>> crm resource cleanup WebFS
>
> That did not help.
>
> node01:~# crm resource cleanup WebFS
> Cleaning up WebFS on mail02.fxclub.org
> Cleaning up WebFS on mail01.fxclub.org
>
> Jul 22 09:33:24 node01 crm_resource: [3442]: info: Invoked: crm_resource -C -r WebFS -H node01.domain.org
> Jul 22 09:33:25 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
> Jul 22 09:33:25 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 09:33:25 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry

Looks like a bad install

> Jul 22 09:33:25 node01 crmd: [1814]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 22 09:33:25 node01 crmd: [1814]: info: unpack_graph: Unpacked transition 647: 6 actions in 6 synapses
> Jul 22 09:33:25 node01 crmd: [1814]: info: do_te_invoke: Processing graph 647 (ref=pe_calc-dc-1279787604-2520) derived from /var/lib/pengine/pe-input-691.bz2
> Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating action 2: stop WebFS_stop_0 on node02.domain.org
> Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating action 6: probe_complete probe_complete on node02.domain.org - no waiting
>
> ...
>
> Jul 22 09:33:32 node01 crmd: [1814]: WARN: status_from_rc: Action 43 (WebFS_start_0) on node02.domain.org failed (target: 0 vs. rc: 1): Error
> Jul 22 09:33:32 node01 crmd: [1814]: WARN: update_failcount: Updating failcount for WebFS on node02.domain.org after failed start: rc=1 (update=INFINITY, time=1279787612)
> Jul 22 09:33:32 node01 crmd: [1814]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=WebFS_start_0, magic=0:1;43:647:0:882b3ca6-0496-4e26-9137-0a10d6ce57e4, cib=0.144.897) : Event failed
> Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort action done superceeded by restart
> Jul 22 09:33:32 node01 crmd: [1814]: info: match_graph_event: Action WebFS_start_0 (43) confirmed on node02.domain.org (rc=4)
> Jul 22 09:33:32 node01 crmd: [1814]: info: run_graph:
> Jul 22 09:33:32 node01 crmd: [1814]: notice: run_graph: Transition 647 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pengine/pe-input-691.bz2): Stopped
> Jul 22 09:33:32 node01 crmd: [1814]: info: te_graph_trigger: Transition 647 is now complete
>
> --
> Best regards,
> Proskurin Kirill
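The stonithd_signon errors above indicate that crmd cannot reach the fencing daemon at all, independent of the WebFS failure. A quick way to check the "bad install" theory on the Debian Lenny nodes described later in this thread (the package names here are assumptions):

    # Is stonithd actually running? It is normally spawned by the
    # cluster stack itself.
    ps -ef | grep '[s]tonithd'

    # Are the packages that ship the fencing daemon installed?
    dpkg -l | egrep 'pacemaker|heartbeat|cluster-glue|openais'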
Re: [Pacemaker] FS mount error
On 22/07/10 12:23, Michael Fung wrote:
> crm resource cleanup WebFS

That did not help.

node01:~# crm resource cleanup WebFS
Cleaning up WebFS on mail02.fxclub.org
Cleaning up WebFS on mail01.fxclub.org

Jul 22 09:33:24 node01 crm_resource: [3442]: info: Invoked: crm_resource -C -r WebFS -H node01.domain.org
Jul 22 09:33:25 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
Jul 22 09:33:25 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 09:33:25 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
Jul 22 09:33:25 node01 crmd: [1814]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 22 09:33:25 node01 crmd: [1814]: info: unpack_graph: Unpacked transition 647: 6 actions in 6 synapses
Jul 22 09:33:25 node01 crmd: [1814]: info: do_te_invoke: Processing graph 647 (ref=pe_calc-dc-1279787604-2520) derived from /var/lib/pengine/pe-input-691.bz2
Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating action 2: stop WebFS_stop_0 on node02.domain.org
Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating action 6: probe_complete probe_complete on node02.domain.org - no waiting

...

Jul 22 09:33:32 node01 crmd: [1814]: WARN: status_from_rc: Action 43 (WebFS_start_0) on node02.domain.org failed (target: 0 vs. rc: 1): Error
Jul 22 09:33:32 node01 crmd: [1814]: WARN: update_failcount: Updating failcount for WebFS on node02.domain.org after failed start: rc=1 (update=INFINITY, time=1279787612)
Jul 22 09:33:32 node01 crmd: [1814]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=WebFS_start_0, magic=0:1;43:647:0:882b3ca6-0496-4e26-9137-0a10d6ce57e4, cib=0.144.897) : Event failed
Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort priority upgraded from 0 to 1
Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort action done superceeded by restart
Jul 22 09:33:32 node01 crmd: [1814]: info: match_graph_event: Action WebFS_start_0 (43) confirmed on node02.domain.org (rc=4)
Jul 22 09:33:32 node01 crmd: [1814]: info: run_graph:
Jul 22 09:33:32 node01 crmd: [1814]: notice: run_graph: Transition 647 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pengine/pe-input-691.bz2): Stopped
Jul 22 09:33:32 node01 crmd: [1814]: info: te_graph_trigger: Transition 647 is now complete

--
Best regards,
Proskurin Kirill
Re: [Pacemaker] FS mount error
Please try:

# crm resource cleanup WebFS

This will fix it if the resource's fail-count has reached INFINITY.

Rgds,
Michael

On 2010/7/22 3:29 PM, Proskurin Kirill wrote:
> Hello all.
>
> I'm really new to Pacemaker and am trying to run some tests and learn how it all works. I use the Clusters From Scratch pdf from clusterlabs.org as a how-to.
>
> What we have:
> Debian Lenny 5.0.5 (with kernel 2.6.32-bpo.4-amd64 from backports)
> pacemaker 1.0.8+hg15494-4~bpo50+1
> openais 1.1.2-2~bpo50+1
>
> Problem:
> I try to add an fs mount resource but get an unknown error. If I mount it by hand, all is ok.
>
> crm_mon:
>
> Last updated: Thu Jul 22 08:22:20 2010
> Stack: openais
> Current DC: node01.domain.org - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
>
> Online: [ node02.domain.org node01.domain.org ]
>
> ClusterIP (ocf::heartbeat:IPaddr2): Started node02.domain.org
> Master/Slave Set: WebData
>     Masters: [ node02.domain.org ]
>     Slaves: [ node01.domain.org ]
> WebFS (ocf::heartbeat:Filesystem): Started node02.domain.org FAILED
>
> Failed actions:
>     WebFS_start_0 (node=node01.domain.org, call=18, rc=1, status=complete): unknown error
>     WebFS_start_0 (node=node02.domain.org, call=301, rc=1, status=complete): unknown error
>
> node01:~# crm_verify -VL
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing failed op WebFS_start_0 on node01.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing failed op WebFS_start_0 on node02.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness: Forcing WebFS away from node01.domain.org after 100 failures (max=100)
>
> node01:~# crm configure show
> node node01.domain.org
> node node02.domain.org
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.1.100" cidr_netmask="32" \
>         op monitor interval="30s"
> primitive WebFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/var/spool/dovecot" fstype="ext4" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s" \
>         meta target-role="Started"
> primitive WebSite ocf:heartbeat:apache \
>         params configfile="/etc/apache2/apache2.conf" \
>         op monitor interval="1min" \
>         op start interval="0" timeout="40s" \
>         op stop interval="0" timeout="60s" \
>         meta target-role="Started"
> primitive wwwdrbd ocf:linbit:drbd \
>         params drbd_resource="drbd0" \
>         op monitor interval="60s" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s"
> ms WebData wwwdrbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> colocation WebSite-with-WebFS inf: WebSite WebFS
> colocation fs_on_drbd inf: WebFS WebData:Master
> colocation website-with-ip inf: WebSite ClusterIP
> order WebFS-after-WebData inf: WebData:promote WebFS:start
> order WebSite-after-WebFS inf: WebFS WebSite
> order apache-after-ip inf: ClusterIP WebSite
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         last-lrm-refresh="1279717510"
>
> In logs:
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
> Jul 22 08:18:39 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
> Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith: Attempting connection to fencing daemon...
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
> Jul 22 08:18:40 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
> Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith: Attempting connection to fencing daemon...
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
> Jul 22 08:18:41 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
> Jul 22 08:18:41 node01 crmd: [1814]: info: te_connect_stonith: Attempting connection to fencing daemon...
> Jul 22 08:18:42 node01 cibadmin: [1199]: info: Invoked: cibadmin -Ql -o resources
> Jul 22 08:18:42 node01 cibadmin: [1200]: info: Invoked: cibadmin -p -R -o resources
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> Jul 22 08:18:42 nod
Re: [Pacemaker] Proposing patch for ld --as-needed
> # This patch makes the build process 'ld --as-needed' compliant.
> #
> # The first chunk corrects the linking order so that libpe_status is linked
> # after libpengine. This is needed because the linker evaluates statements
> # sequentially, starting from the innermost lib, and libpengine uses functions
> # that are defined in libpe_status.
> #
> # The second chunk explicitly adds the CURSESLIBS dependency to libpe_status.
> # This is requested from configure.ac upon ncurses detection, so we
> # need to provide 'printw' or any linking with libpe_status will fail.

This looks good to me.

> --- pengine/Makefile.am
> +++ pengine/Makefile.am
> @@ -58,6 +58,7 @@
>  # -L$(top_builddir)/lib/pils -lpils -export-dynamic -module -avoid-version
>  libpengine_la_SOURCES = pengine.c allocate.c utils.c constraints.c \
>                 native.c group.c clone.c master.c graph.c
> +libpengine_la_LIBADD = $(top_builddir)/lib/pengine/libpe_status.la
>
>  pengine_SOURCES = main.c
>  pengine_LDADD = $(COMMONLIBS) $(top_builddir)/lib/cib/libcib.la
> --- lib/pengine/Makefile.am
> +++ lib/pengine/Makefile.am
> @@ -34,7 +34,7 @@
>
>  libpe_status_la_LDFLAGS = -version-info 2:0:0
>  libpe_status_la_SOURCES = $(rule_files) $(status_files)
> -libpe_status_la_LIBADD = -llrm
> +libpe_status_la_LIBADD = -llrm @CURSESLIBS@
>
>  clean-generic:
>  	rm -f *.log *.debug *~
Re: [Pacemaker] Proposing patch for ld --as-needed
On 22/07/2010 01:56, Simon Horman wrote:

On Wed, Jul 21, 2010 at 09:40:10PM +0200, Ultrabug wrote:

On Wednesday 21 July 2010 03:49:42 Simon Horman wrote:

On Tue, Jul 20, 2010 at 05:35:01PM +0200, Ultrabug wrote:

- Original Message -
From: Simon Horman
To: The Pacemaker cluster resource manager
Sent: Tue, 20 Jul 2010 05:10:32 +0200 (CEST)
Subject: Re: [Pacemaker] Proposing patch for ld --as-needed

On Sat, Jul 17, 2010 at 01:12:20PM +0200, Ultrabug wrote:

Dear list,

I would like to ask you about a possible upstream modification regarding the --as-needed ld flag, for which we Gentoo users need to patch the pacemaker sources to get it to compile. I'm attaching the patch which, as you can see, is relatively small and simple (looks to me at least). The question is whether or not you think this could be done upstream?

Thank you for your interest in this and all your work,

Out of interest, could you explain why this is needed? Is it because gold is being used as the linker?

[ please don't top-post ]
[ noticed after sending, sorry ]

Thanks for the link. I guess what is happening without --as-needed is that the curses library is being dragged in somewhere, somehow - without your proposed change CURSESLIBS is used exactly nowhere.

Actually the two chunks of the patch have different purposes.

The first one is needed because the linking order has a meaning on an as-needed system, and libpengine uses functions that are defined in libpe_status. Here is an example of a failing build: http://paste.pocoo.org/show/239905/

If you try to compile a program that uses libpengine but doesn't directly need libpe_status, e.g.

gcc -Wl,--as-needed ptest.c -o ptest -lpe_status -lpengine

(a shortened version of the actual linking from pacemaker's build.log), linking will fail. The linker evaluates that statement sequentially, starting from the innermost lib:

1) Do I need libpe_status? No, forget about it.
2) Do I need libpengine? Yes, please.
3) Is everything all right? Oops, I don't know what `was_processing_warning' is, die...

The second one explicitly adds the CURSESLIBS dependency to libpe_status. If pacemaker detects ncurses, you get HAVE_NCURSES_H and e.g. status_print (used in lib/pengine/native.c etc.) becomes a wrapper around "printw" (see configure.ac). You need to provide `printw' or any linking with libpe_status will fail. Failing build example: http://paste.pocoo.org/show/239916/

So I think that your change is a step in the right direction, though for completeness I think that you also need to give the same treatment to libpe_rule, as common.c seems to make curses calls. Could you consider updating your patch to include my proposed additions below? And could you please include a description that describes what the patch does? Perhaps something like this:

Sure, if you agree with the explanations above, I'll summarize them and add them in the patch, which I'll resubmit to you for integration.

I think that you have a better handle on this problem than me. So yes, please summarise your explanation above and use it as a preamble to the patch.

[snip]

Sure mate, here it is attached. I hope it's explained well enough. Thanks for your help and interest.

Kind regards

# This patch makes the build process 'ld --as-needed' compliant.
#
# The first chunk corrects the linking order so that libpe_status is linked
# after libpengine. This is needed because the linker evaluates statements
# sequentially, starting from the innermost lib, and libpengine uses functions
# that are defined in libpe_status.
#
# The second chunk explicitly adds the CURSESLIBS dependency to libpe_status.
# This is requested from configure.ac upon ncurses detection, so we
# need to provide 'printw' or any linking with libpe_status will fail.
--- pengine/Makefile.am
+++ pengine/Makefile.am
@@ -58,6 +58,7 @@
 # -L$(top_builddir)/lib/pils -lpils -export-dynamic -module -avoid-version
 libpengine_la_SOURCES = pengine.c allocate.c utils.c constraints.c \
                native.c group.c clone.c master.c graph.c
+libpengine_la_LIBADD = $(top_builddir)/lib/pengine/libpe_status.la

 pengine_SOURCES = main.c
 pengine_LDADD = $(COMMONLIBS) $(top_builddir)/lib/cib/libcib.la
--- lib/pengine/Makefile.am
+++ lib/pengine/Makefile.am
@@ -34,7 +34,7 @@

 libpe_status_la_LDFLAGS = -version-info 2:0:0
 libpe_status_la_SOURCES = $(rule_files) $(status_files)
-libpe_status_la_LIBADD = -llrm
+libpe_status_la_LIBADD = -llrm @CURSESLIBS@

 clean-generic:
 	rm -f *.log *.debug *~
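To make the link-order argument concrete, here is a minimal, self-contained shell demonstration of the failure mode described in the patch preamble (all file and library names here are hypothetical, not part of the patch):

    # liblow defines a symbol; libhigh uses it, mirroring the
    # libpe_status / libpengine relationship.
    cat > low.c  <<'EOF'
    int answer(void) { return 42; }
    EOF
    cat > high.c <<'EOF'
    int answer(void);
    int high(void) { return answer(); }
    EOF
    cat > main.c <<'EOF'
    int high(void);
    int main(void) { return high(); }
    EOF
    gcc -shared -fPIC -o liblow.so low.c
    gcc -shared -fPIC -o libhigh.so high.c

    # Wrong order under --as-needed: nothing references liblow yet when it
    # is seen, so it is dropped before libhigh's need for it is discovered.
    gcc -Wl,--as-needed main.c -o demo -L. -llow -lhigh   # undefined 'answer'

    # Correct order: the providing library follows the consuming one.
    gcc -Wl,--as-needed main.c -o demo -L. -lhigh -llow   # links cleanly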
[Pacemaker] FS mount error
Hello all.

I'm really new to Pacemaker and am trying to run some tests and learn how it all works. I use the Clusters From Scratch pdf from clusterlabs.org as a how-to.

What we have:
Debian Lenny 5.0.5 (with kernel 2.6.32-bpo.4-amd64 from backports)
pacemaker 1.0.8+hg15494-4~bpo50+1
openais 1.1.2-2~bpo50+1

Problem:
I try to add an fs mount resource but get an unknown error. If I mount it by hand, all is ok.

crm_mon:

Last updated: Thu Jul 22 08:22:20 2010
Stack: openais
Current DC: node01.domain.org - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
4 Resources configured.

Online: [ node02.domain.org node01.domain.org ]

ClusterIP (ocf::heartbeat:IPaddr2): Started node02.domain.org
Master/Slave Set: WebData
    Masters: [ node02.domain.org ]
    Slaves: [ node01.domain.org ]
WebFS (ocf::heartbeat:Filesystem): Started node02.domain.org FAILED

Failed actions:
    WebFS_start_0 (node=node01.domain.org, call=18, rc=1, status=complete): unknown error
    WebFS_start_0 (node=node02.domain.org, call=301, rc=1, status=complete): unknown error

node01:~# crm_verify -VL
crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing failed op WebFS_start_0 on node01.domain.org: unknown error (1)
crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing failed op WebFS_start_0 on node02.domain.org: unknown error (1)
crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness: Forcing WebFS away from node01.domain.org after 100 failures (max=100)

node01:~# crm configure show
node node01.domain.org
node node02.domain.org
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="32" \
        op monitor interval="30s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/spool/dovecot" fstype="ext4" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s" \
        meta target-role="Started"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/apache2/apache2.conf" \
        op monitor interval="1min" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s" \
        meta target-role="Started"
primitive wwwdrbd ocf:linbit:drbd \
        params drbd_resource="drbd0" \
        op monitor interval="60s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
ms WebData wwwdrbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebData:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebData:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1279717510"

In logs:
Jul 22 08:18:39 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
Jul 22 08:18:39 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith: Attempting connection to fencing daemon...
Jul 22 08:18:41 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate connection to stonithd
Jul 22 08:18:41 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 08:18:41 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
Jul 22 08:18:41 node01 crmd: [1814]: info: te_connect_stonith: Attempting connection to fencing daemon...
Jul 22 08:18:42 node01 cibadmin: [1199]: info: Invoked: cibadmin -Ql -o resources
Jul 22 08:18:42 node01 cibadmin: [1200]: info: Invoked: cibadmin -p -R -o resources
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -