[Pacemaker] Master Slave

2010-07-22 Thread Freddie Sessler
I have a quick question: is the Master/Slave setting in Pacemaker only
allowed for a DRBD device, or can you use it to create other Master/Slave
relationships? Do all resource agents potentially involved in this need to
be aware of the Master/Slave relationship? I am trying to set up a pair of
MySQL servers, one replicating from the other (handled within MySQL's
my.cnf). I basically want to fail over the VIP of the primary node to the
secondary node (which also happens to be the MySQL slave) in the event that
the primary has its MySQL server stopped. I am not using DRBD at all.
My config looks like the following:

node $id="0cd2bb09-00b6-4ce4-bdd1-629767ae0739" sipl-mysql-109
node $id="119fc082-7046-4b8d-a9a3-7e777b9ddf60" sipl-mysql-209
primitive p_clusterip ocf:heartbeat:IPaddr2 \
params ip="10.200.131.9" cidr_netmask="32" \
op monitor interval="30s"
primitive p_mysql ocf:heartbeat:mysql \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
op monitor interval="10" timeout="120" depth="0"
ms ms_mysql p_mysql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
location l_master ms_mysql \
rule $id="l_master-rule" $role="Master" 100: #uname eq sipl-mysql-109
colocation mysql_master_on_ip inf: p_clusterip ms_mysql:Master
property $id="cib-bootstrap-options" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
start-failure-is-fatal="false" \
expected-quorum-votes="2" \
symmetric-cluster="false" \
dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
cluster-infrastructure="Heartbeat"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"


What's happening is that mysql is never brought up due to the following
errors:

Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_color: Resource
p_mysql:0 cannot run anywhere
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_color:
Resource p_mysql:1 cannot run anywhere
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_merge_weights:
ms_mysql: Rolling back scores from p_clusterip
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: master_color:
ms_mysql: Promoted 0 instances of a possible 1 to master
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: native_color:
Resource p_clusterip cannot run anywhere
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: info: master_color:
ms_mysql: Promoted 0 instances of a possible 1 to master
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: notice: LogActions: Leave
resource p_clusterip (Stopped)
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: notice: LogActions: Leave
resource p_mysql:0 (Stopped)
Jul 22 16:15:07 sipl-mysql-109 pengine: [22890]: notice: LogActions: Leave
resource p_mysql:1 (Stopped)


I thought I might have overcome this with my location and colocation
directives, but it failed. Could someone give me some feedback on what I am
trying to do, my config, and the resulting errors?
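
As a hedged aside (not from the original message): in general an ms resource
requires an agent that implements the promote/demote actions, and with
symmetric-cluster="false" the cluster is opt-in, so every resource (each clone
instance in its started/Slave role, and the VIP) needs an explicit location
score before it may run anywhere; the $role="Master" rule above applies only
to the Master role. A minimal sketch of the extra constraints such an opt-in
cluster would typically need, reusing the node names from the config above:

location l_mysql_on_109 ms_mysql 100: sipl-mysql-109
location l_mysql_on_209 ms_mysql 50: sipl-mysql-209
location l_ip_on_109 p_clusterip 100: sipl-mysql-109
location l_ip_on_209 p_clusterip 50: sipl-mysql-209

An alternative sketch would simply drop symmetric-cluster="false" so that
resources may run on any online node by default.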

Thanks
F.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] mysql RA constantly restarting db

2010-07-22 Thread Freddie Sessler
Hello,
I am new to Pacemaker and struggling with the somewhat limited
documentation. I looked through the archives and didn't find anything that
matched my problem. I have a brand new Pacemaker setup running on CentOS 5.5.
I am using the config below to start up MySQL, which is also a brand new
build. Right now the cluster is only running on one node while I try to
isolate this problem. This is a brand new CIB file as well. The cluster
starts up, but then every 30 seconds or so I see it restart MySQL. If I stop
Heartbeat and bring up MySQL by itself, it starts up just fine. It's driving
me batty, so I thought I would post it here and see if someone is able to
help. What I see in syslog from Heartbeat is:

Jul 22 15:34:09 sipl-mysql-109 lrmd: [11182]: info: rsc:d_mysql:69: start
Jul 22 15:34:11 sipl-mysql-109 lrmd: [11182]: info: RA output:
(ip_db:start:stderr) ARPING 10.200.131.9 from 10.200.131.9 eth0 Sent 5
probes (5 broadcast(s)) Received 0 response(s)
Jul 22 15:34:13 sipl-mysql-109 mysql[14915]: [15086]: INFO: MySQL started
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: process_lrm_event: LRM
operation d_mysql_start_0 (call=69, rc=0, cib-update=105, confirmed=true) ok
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: match_graph_event:
Action d_mysql_start_0 (6) confirmed on sipl-mysql-109 (rc=0)
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: te_rsc_command:
Initiating action 1: monitor d_mysql_monitor_1 on sipl-mysql-109 (local)
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: do_lrm_rsc_op:
Performing key=1:14:0:989206b7-461a-42db-a2a7-7b447bd6c5b3
op=d_mysql_monitor_1 )
Jul 22 15:34:13 sipl-mysql-109 lrmd: [11182]: info: rsc:d_mysql:70: monitor
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: te_rsc_command:
Initiating action 8: start ip_db_start_0 on sipl-mysql-109 (local)
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: do_lrm_rsc_op:
Performing key=8:14:0:989206b7-461a-42db-a2a7-7b447bd6c5b3 op=ip_db_start_0
)
Jul 22 15:34:13 sipl-mysql-109 lrmd: [11182]: info: rsc:ip_db:71: start
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: info: process_lrm_event: LRM
operation d_mysql_monitor_1 (call=70, rc=7, cib-update=106,
confirmed=false) not running
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: WARN: status_from_rc: Action 1
(d_mysql_monitor_1) on sipl-mysql-109 failed (target: 0 vs. rc: 7):
Error
Jul 22 15:34:13 sipl-mysql-109 crmd: [11185]: WARN: update_failcount:
Updating failcount for d_mysql on sipl-mysql-109 after failed monitor.


The output of crm configure show is:

primitive d_mysql ocf:heartbeat:mysql \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
op monitor interval="10" timeout="30" depth="0" param
binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql"
user="mysql" pid="/var/run/mysqld/mysql.pid"
socket="/var/lib/mysql/mysql.sock"
primitive ip_db ocf:heartbeat:IPaddr2 \
params ip="10.200.131.9" cidr_netmask="32" \
op monitor interval="30s" nic="eth0"
group sv_db d_mysql ip_db
property $id="cib-bootstrap-options" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
start-failure-is-fatal="false" \
expected-quorum-votes="2" \
dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
cluster-infrastructure="Heartbeat"
rsc_defaults $id="rsc_defaults-options" \
migration-threshold="20" \
failure-timeout="20"
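
As a hedged aside (an assumption on my part, not a confirmed fix): instance
attributes for ocf:heartbeat:mysql are normally given in a "params" block on
the primitive rather than appended to an op line, so as written the agent may
be falling back to its default paths. A sketch of the conventional layout,
reusing the values above:

primitive d_mysql ocf:heartbeat:mysql \
	params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
		datadir="/var/lib/mysql" user="mysql" \
		pid="/var/run/mysqld/mysql.pid" \
		socket="/var/lib/mysql/mysql.sock" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="30" depth="0"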

My versions are as follows:

[r...@sipl-mysql-109 rc0.d]# rpm -qa | egrep "coro|pacemaker|heart"
corosynclib-1.2.5-1.3.el5
corosync-1.2.5-1.3.el5
corosync-1.2.5-1.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-1.0.9.1-1.11.el5
pacemaker-1.0.9.1-1.11.el5
corosynclib-1.2.5-1.3.el5
heartbeat-libs-3.0.3-2.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-libs-1.0.9.1-1.11.el5
heartbeat-libs-3.0.3-2.3.el5
pacemaker-libs-1.0.9.1-1.11.el5

rpm -qa | grep resource
resource-agents-1.0.3-2.6.el5

[r...@sipl-mysql-109 rc0.d]# cat /etc/redhat-release
CentOS release 5.5 (Final)

[r...@sipl-mysql-109 rc0.d]# uname -r
2.6.18-194.8.1.el5

[r...@sipl-mysql-109 rc0.d]# mysql -V
mysql  Ver 14.14 Distrib 5.1.48, for unknown-linux-gnu (x86_64) using
readline 5.1

My ha.cf looks like:

autojoin none
mcast eth0 227.0.0.10 694 1 0
warntime 5
deadtime 15
initdead 60
keepalive 5
auto_failback off
node sipl-mysql-109
node sipl-mysql-209
crm on


MySQL shows the following in its error log:

100722 15:33:57 [Note] Plugin 'FEDERATED' is disabled.
100722 15:33:57  InnoDB: Started; log sequence number 0 44233
100722 15:33:57 [Note] Event Scheduler: Loaded 0 events
100722 15:33:57 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.48-community-log'  socket: '/var/lib/mysql/mysql.sock'  port:
3306  MySQL Community Server (GPL)
100722 15:34:01 [Note] /usr/sbin/mysqld: Normal shutdown

100722 15:34:01 [Note] Event Scheduler: Purging the queue. 0 events
100722 15:34:01  InnoDB: Starting shutdown...
100722 15:34:02  InnoDB: Shutdown completed; log sequence number 0 44233
100722 15:34:02 [Note] /usr/sbin/mysqld: Shutdown complete

100722 15:34
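
As a hedged aside, the monitor result can usually be reproduced outside the
cluster by running the resource agent by hand with the same environment; a
sketch, with the parameter values assumed from the config above:

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_binary="/usr/bin/mysqld_safe"
export OCF_RESKEY_config="/etc/my.cnf"
export OCF_RESKEY_datadir="/var/lib/mysql"
export OCF_RESKEY_user="mysql"
export OCF_RESKEY_pid="/var/run/mysqld/mysql.pid"
export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
/usr/lib/ocf/resource.d/heartbeat/mysql start;   echo "start rc=$?"
/usr/lib/ocf/resource.d/heartbeat/mysql monitor; echo "monitor rc=$?"   # 0 = running, 7 = not running

Whatever the agent prints on stderr here is usually more telling than the
lrmd log line alone.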

Re: [Pacemaker] FS mount error

2010-07-22 Thread Andrew Beekhof
On Thu, Jul 22, 2010 at 10:36 AM, Proskurin Kirill
 wrote:
> On 22/07/10 12:23, Michael Fung wrote:
>>
>> crm resource cleanup WebFS
>
> That did not help.
>
> node01:~# crm resource cleanup WebFS
> Cleaning up WebFS on mail02.fxclub.org
> Cleaning up WebFS on mail01.fxclub.org
>
> Jul 22 09:33:24 node01 crm_resource: [3442]: info: Invoked: crm_resource -C
> -r WebFS -H node01.domain.org
> Jul 22 09:33:25 node01 crmd: [1814]: ERROR: stonithd_signon: Can't initiate
> connection to stonithd
> Jul 22 09:33:25 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 09:33:25 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry

Looks like a bad install
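
As a hedged aside (daemon and package names are assumed for this Debian
setup, not confirmed): one quick way to check for a bad or incomplete install
is to see whether the fencing daemon is present and running alongside the
rest of the stack, e.g.:

ps -ef | egrep "stonithd|lrmd|pengine|crmd|cib" | grep -v grep
dpkg -l | egrep "pacemaker|cluster-glue|heartbeat|openais|corosync"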

> Jul 22 09:33:25 node01 crmd: [1814]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 22 09:33:25 node01 crmd: [1814]: info: unpack_graph: Unpacked transition
> 647: 6 actions in 6 synapses
> Jul 22 09:33:25 node01 crmd: [1814]: info: do_te_invoke: Processing graph
> 647 (ref=pe_calc-dc-1279787604-2520) derived from
> /var/lib/pengine/pe-input-691.bz2
> Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating action
> 2: stop WebFS_stop_0 on node02.domain.org
> Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating action
> 6: probe_complete probe_complete on node02.domain.org - no waiting
>
> ...
>
> Jul 22 09:33:32 node01 crmd: [1814]: WARN: status_from_rc: Action 43
> (WebFS_start_0) on node02.domain.org failed (target: 0 vs. rc: 1): Error
> Jul 22 09:33:32 node01 crmd: [1814]: WARN: update_failcount: Updating
> failcount for WebFS on node02.domain.org after failed start: rc=1
> (update=INFINITY, time=1279787612)
> Jul 22 09:33:32 node01 crmd: [1814]: info: abort_transition_graph:
> match_graph_event:272 - Triggered transition abort (complete=0,
> tag=lrm_rsc_op, id=WebFS_start_0,
> magic=0:1;43:647:0:882b3ca6-0496-4e26-9137-0a10d6ce57e4, cib=0.144.897) :
> Event failed
> Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort
> priority upgraded from 0 to 1
> Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort
> action done superceeded by restart
> Jul 22 09:33:32 node01 crmd: [1814]: info: match_graph_event: Action
> WebFS_start_0 (43) confirmed on node02.domain.org (rc=4)
> Jul 22 09:33:32 node01 crmd: [1814]: info: run_graph:
> 
> Jul 22 09:33:32 node01 crmd: [1814]: notice: run_graph: Transition 647
> (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0,
> Source=/var/lib/pengine/pe-input-691.bz2): Stopped
> Jul 22 09:33:32 node01 crmd: [1814]: info: te_graph_trigger: Transition 647
> is now complete
>
>
> --
> Best regards,
> Proskurin Kirill
>



Re: [Pacemaker] FS mount error

2010-07-22 Thread Proskurin Kirill

On 22/07/10 12:23, Michael Fung wrote:

crm resource cleanup WebFS


That did not help.

node01:~# crm resource cleanup WebFS
Cleaning up WebFS on mail02.fxclub.org
Cleaning up WebFS on mail01.fxclub.org

Jul 22 09:33:24 node01 crm_resource: [3442]: info: Invoked: crm_resource 
-C -r WebFS -H node01.domain.org
Jul 22 09:33:25 node01 crmd: [1814]: ERROR: stonithd_signon: Can't 
initiate connection to stonithd

Jul 22 09:33:25 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 09:33:25 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in 
failed: triggered a retry
Jul 22 09:33:25 node01 crmd: [1814]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Jul 22 09:33:25 node01 crmd: [1814]: info: unpack_graph: Unpacked 
transition 647: 6 actions in 6 synapses
Jul 22 09:33:25 node01 crmd: [1814]: info: do_te_invoke: Processing 
graph 647 (ref=pe_calc-dc-1279787604-2520) derived from 
/var/lib/pengine/pe-input-691.bz2
Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating 
action 2: stop WebFS_stop_0 on node02.domain.org
Jul 22 09:33:25 node01 crmd: [1814]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on node02.domain.org - no waiting


...

Jul 22 09:33:32 node01 crmd: [1814]: WARN: status_from_rc: Action 43 
(WebFS_start_0) on node02.domain.org failed (target: 0 vs. rc: 1): Error
Jul 22 09:33:32 node01 crmd: [1814]: WARN: update_failcount: Updating 
failcount for WebFS on node02.domain.org after failed start: rc=1 
(update=INFINITY, time=1279787612)
Jul 22 09:33:32 node01 crmd: [1814]: info: abort_transition_graph: 
match_graph_event:272 - Triggered transition abort (complete=0, 
tag=lrm_rsc_op, id=WebFS_start_0, 
magic=0:1;43:647:0:882b3ca6-0496-4e26-9137-0a10d6ce57e4, cib=0.144.897) 
: Event failed
Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort 
priority upgraded from 0 to 1
Jul 22 09:33:32 node01 crmd: [1814]: info: update_abort_priority: Abort 
action done superceeded by restart
Jul 22 09:33:32 node01 crmd: [1814]: info: match_graph_event: Action 
WebFS_start_0 (43) confirmed on node02.domain.org (rc=4)
Jul 22 09:33:32 node01 crmd: [1814]: info: run_graph: 

Jul 22 09:33:32 node01 crmd: [1814]: notice: run_graph: Transition 647 
(Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, 
Source=/var/lib/pengine/pe-input-691.bz2): Stopped
Jul 22 09:33:32 node01 crmd: [1814]: info: te_graph_trigger: Transition 
647 is now complete



--
Best regards,
Proskurin Kirill



Re: [Pacemaker] FS mount error

2010-07-22 Thread Michael Fung
Please try:

# crm resource cleanup WebFS

This will fix it if the resource's fail-count has reached INFINITY.
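
As a hedged illustration (resource and node names taken from this thread),
the per-node fail-count can be inspected and cleared from the crm shell, and
crm_mon can display it:

crm_mon -1 -f                                    # -f shows fail counts
crm resource failcount WebFS show node01.domain.org
crm resource failcount WebFS delete node01.domain.org
crm resource cleanup WebFS                       # clears the recorded failed operations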


Rgds,
Michael


On 2010/7/22 下午 03:29, Proskurin Kirill wrote:
> Hello all.
> 
> I am really new to Pacemaker and am trying to run some tests to learn how it
> all works. I am using the Clusters From Scratch PDF from clusterlabs.org as a how-to.
> 
> What we have:
> Debian Lenny 5.0.5 (with kernel 2.6.32-bpo.4-amd64 from backports)
> pacemaker 1.0.8+hg15494-4~bpo50+1
> openais 1.1.2-2~bpo50+1
> 
> 
> Problem:
> I am trying to add an FS mount resource but I get an unknown error. If I
> mount it by hand, everything is OK.
> 
> crm_mon:
> 
> 
> Last updated: Thu Jul 22 08:22:20 2010
> Stack: openais
> Current DC: node01.domain.org - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> 
> 
> Online: [ node02.domain.org node01.domain.org ]
> 
> ClusterIP   (ocf::heartbeat:IPaddr2):   Started node02.domain.org
>  Master/Slave Set: WebData
>  Masters: [ node02.domain.org ]
>  Slaves: [ node01.domain.org ]
> WebFS   (ocf::heartbeat:Filesystem):Started node02.domain.org FAILED
> 
> Failed actions:
> WebFS_start_0 (node=node01.domain.org, call=18, rc=1,
> status=complete): unknown error
> WebFS_start_0 (node=node02.domain.org, call=301, rc=1,
> status=complete): unknown error
> 
> node01:~# crm_verify -VL
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node01.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node02.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness:
> Forcing WebFS away from node01.domain.org after 100 failures
> (max=100)
> 
> 
> node01:~# crm configure show
> node node01.domain.org
> node node02.domain.org
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> params ip="192.168.1.100" cidr_netmask="32" \
> op monitor interval="30s"
> primitive WebFS ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/var/spool/dovecot"
> fstype="ext4" \
> op start interval="0" timeout="60s" \
> op stop interval="0" timeout="60s" \
> meta target-role="Started"
> primitive WebSite ocf:heartbeat:apache \
> params configfile="/etc/apache2/apache2.conf" \
> op monitor interval="1min" \
> op start interval="0" timeout="40s" \
> op stop interval="0" timeout="60s" \
> meta target-role="Started"
> primitive wwwdrbd ocf:linbit:drbd \
> params drbd_resource="drbd0" \
> op monitor interval="60s" \
> op start interval="0" timeout="240s" \
> op stop interval="0" timeout="100s"
> ms WebData wwwdrbd \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true" target-role="Started"
> colocation WebSite-with-WebFS inf: WebSite WebFS
> colocation fs_on_drbd inf: WebFS WebData:Master
> colocation website-with-ip inf: WebSite ClusterIP
> order WebFS-after-WebData inf: WebData:promote WebFS:start
> order WebSite-after-WebFS inf: WebFS WebSite
> order apache-after-ip inf: ClusterIP WebSite
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> last-lrm-refresh="1279717510"
> 
> 
> In logs:
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:39 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:39 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:40 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:41 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:41 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:42 node01 cibadmin: [1199]: info: Invoked: cibadmin -Ql -o
> resources
> Jul 22 08:18:42 node01 cibadmin: [1200]: info: Invoked: cibadmin -p -R
> -o resources
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> 
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>   
> Jul 22 08:18:42 nod

Re: [Pacemaker] Proposing patch for ld --as-needed

2010-07-22 Thread Simon Horman
> # This patch makes the build process 'ld --as-needed' compliant
> #
> # The first chunk corrects the linking order so that libpe_status is linked
> # after libpengine. This is needed because the linker evaluates statements
> # sequentially, starting from the innermost lib, and libpengine uses functions
> # that are defined in libpe_status.
> #
> # The second chunk explicitly adds the CURSESLIBS dependency to libpe_status.
> # This is required by configure.ac upon ncurses detection, so we
> # need to provide 'printw' or any linking with libpe_status will fail.

This looks good to me.

> --- pengine/Makefile.am
> +++ pengine/Makefile.am
> @@ -58,6 +58,7 @@
>  # -L$(top_builddir)/lib/pils -lpils -export-dynamic -module -avoid-version
>  libpengine_la_SOURCES= pengine.c allocate.c utils.c constraints.c \
>   native.c group.c clone.c master.c graph.c
> +libpengine_la_LIBADD= $(top_builddir)/lib/pengine/libpe_status.la
> 
>  pengine_SOURCES  = main.c
>  pengine_LDADD= $(COMMONLIBS) $(top_builddir)/lib/cib/libcib.la
> --- lib/pengine/Makefile.am
> +++ lib/pengine/Makefile.am
> @@ -34,7 +34,7 @@
> 
>  libpe_status_la_LDFLAGS  = -version-info 2:0:0
>  libpe_status_la_SOURCES  =  $(rule_files) $(status_files)
> -libpe_status_la_LIBADD   = -llrm
> +libpe_status_la_LIBADD   = -llrm @CURSESLIBS@
> 
>  clean-generic:
>   rm -f *.log *.debug *~





Re: [Pacemaker] Proposing patch for ld --as-needed

2010-07-22 Thread Ultrabug

On 22/07/2010 01:56, Simon Horman wrote:

On Wed, Jul 21, 2010 at 09:40:10PM +0200, Ultrabug wrote:

On Wednesday 21 July 2010 03:49:42 Simon Horman wrote:

On Tue, Jul 20, 2010 at 05:35:01PM +0200, Ultrabug wrote:

- Original Message -
From: Simon Horman
To: The Pacemaker cluster resource manager
  Sent: Tue, 20 Jul 2010 05:10:32 +0200
(CEST)
Subject: Re: [Pacemaker] Proposing patch for ld --as-needed

On Sat, Jul 17, 2010 at 01:12:20PM +0200, Ultrabug wrote:

Dear list,

I would like to ask you about a possible upstream modification regarding
the -- as-needed ld flag for which we Gentoo users need to patch the
pacemaker sources to get it compile.

I'm attaching the patch which, as you can see, is relatively small and
simple (looks to me at least). The question is whether or not you think
this could be done upstream ?

Thank you for your interest in this and all you work,


Out of interest, could you explain why this is needed?
Is it because gold is being used as the linker?


[ please don't top-post ]


[ noticed after sending, sorry ]



Thanks for the link.

I guess what is happening without --as-needed is that the curses
library is being dragged in somewhere, somehow - without your proposed
change CURSESLIBS is used exactly nowhere.



Actually the two chunks of the patch have different purposes.

The first one is needed because the linking order matters on an --as-needed
system and libpengine uses functions that are defined in libpe_status.

Here is an example of a failing build: http://paste.pocoo.org/show/239905/
If you try to compile a program that uses libpengine but does not itself need
libpe_status, e.g. gcc -Wl,--as-needed ptest.c -o ptest -lpe_status -lpengine
(a shortened version of the actual linking from the pacemaker build.log),
linking will fail. The linker evaluates that statement sequentially, starting
from the innermost lib:
   1) do I need libpe_status? No, forget about it.
   2) do I need libpengine? Yes, please.
   3) is everything all right? Oops, I don't know what `was_processing_warning'
is; die...
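
The same rule with hypothetical libraries, purely to illustrate the ordering
(neither name comes from the pacemaker build; libfoo defines a symbol that
libbar uses):

gcc -Wl,--as-needed main.c -lfoo -lbar -o demo   # fails: -lfoo is seen while nothing
                                                 # needs it yet, so it is dropped and
                                                 # libbar's reference stays unresolved
gcc -Wl,--as-needed main.c -lbar -lfoo -o demo   # links: libbar introduces the undefined
                                                 # symbol first and -lfoo then satisfies it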


The second one explicitly adds the CURSESLIBS dependency to libpe_status.
If pacemaker detects ncurses, you get HAVE_NCURSES_H and e.g.
status_print (used in lib/pengine/native.c etc.) becomes a wrapper
around "printw" (see configure.ac). You need to provide `printw' or any linking
with libpe_status will fail.
Failing build example: http://paste.pocoo.org/show/239916/


So I think that your change is a step in the right direction,
though for completeness I think that you also need to give the same
treatment to libpe_rule as common.c seems to make curses calls.

Could you consider updating your patch to include my proposed
additions below? And could you please include a description that describes
what the patch does? Perhaps something like this:



Sure, if you agree with the explanations above, I'll summarize them and add
them to the patch, which I'll resubmit to you for integration.


I think that you have a better handle on this problem than me.
So yes, please summarise your explanation above and use
it as a preamble to the patch.

[snip]


Sure mate, here it is attached. I hope it's explained well enough.
Thanks for your help and interest.

Kind regards






# This patch makes the build process 'ld --as-needed' compliant
#
# The first chunk corrects the linking order so that libpe_status is linked
# after libpengine. This is needed because the linker evaluates statements
# sequentially, starting from the innermost lib, and libpengine uses functions
# that are defined in libpe_status.
#
# The second chunk explicitly adds the CURSESLIBS dependency to libpe_status.
# This is required by configure.ac upon ncurses detection, so we
# need to provide 'printw' or any linking with libpe_status will fail.
--- pengine/Makefile.am
+++ pengine/Makefile.am
@@ -58,6 +58,7 @@
 # -L$(top_builddir)/lib/pils -lpils -export-dynamic -module -avoid-version
 libpengine_la_SOURCES  = pengine.c allocate.c utils.c constraints.c \
native.c group.c clone.c master.c graph.c
+libpengine_la_LIBADD= $(top_builddir)/lib/pengine/libpe_status.la

 pengine_SOURCES= main.c
 pengine_LDADD  = $(COMMONLIBS) $(top_builddir)/lib/cib/libcib.la
--- lib/pengine/Makefile.am
+++ lib/pengine/Makefile.am
@@ -34,7 +34,7 @@

 libpe_status_la_LDFLAGS= -version-info 2:0:0
 libpe_status_la_SOURCES=  $(rule_files) $(status_files)
-libpe_status_la_LIBADD = -llrm
+libpe_status_la_LIBADD = -llrm @CURSESLIBS@

 clean-generic:
rm -f *.log *.debug *~

[Pacemaker] FS mount error

2010-07-22 Thread Proskurin Kirill

Hello all.

I am really new to Pacemaker and am trying to run some tests to learn how it
all works. I am using the Clusters From Scratch PDF from clusterlabs.org as a how-to.


What we have:
Debian Lenny 5.0.5 (with kernel 2.6.32-bpo.4-amd64 from backports)
pacemaker 1.0.8+hg15494-4~bpo50+1
openais 1.1.2-2~bpo50+1


Problem:
I am trying to add an FS mount resource but I get an unknown error. If I
mount it by hand, everything is OK.


crm_mon:


Last updated: Thu Jul 22 08:22:20 2010
Stack: openais
Current DC: node01.domain.org - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
4 Resources configured.


Online: [ node02.domain.org node01.domain.org ]

ClusterIP   (ocf::heartbeat:IPaddr2):   Started node02.domain.org
 Master/Slave Set: WebData
 Masters: [ node02.domain.org ]
 Slaves: [ node01.domain.org ]
WebFS   (ocf::heartbeat:Filesystem):Started node02.domain.org FAILED

Failed actions:
WebFS_start_0 (node=node01.domain.org, call=18, rc=1, 
status=complete): unknown error
WebFS_start_0 (node=node02.domain.org, call=301, rc=1, 
status=complete): unknown error


node01:~# crm_verify -VL
crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing 
failed op WebFS_start_0 on node01.domain.org: unknown error (1)
crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing 
failed op WebFS_start_0 on node02.domain.org: unknown error (1)
crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness: 
Forcing WebFS away from node01.domain.org after 100 failures 
(max=100)



node01:~# crm configure show
node node01.domain.org
node node02.domain.org
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.1.100" cidr_netmask="32" \
op monitor interval="30s"
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/var/spool/dovecot" fstype="ext4" 
\
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s" \
meta target-role="Started"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/apache2/apache2.conf" \
op monitor interval="1min" \
op start interval="0" timeout="40s" \
op stop interval="0" timeout="60s" \
meta target-role="Started"
primitive wwwdrbd ocf:linbit:drbd \
params drbd_resource="drbd0" \
op monitor interval="60s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
ms WebData wwwdrbd \
	meta master-max="1" master-node-max="1" clone-max="2" 
clone-node-max="1" notify="true" target-role="Started"

colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebData:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebData:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
last-lrm-refresh="1279717510"
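
As a hedged aside (the device and mount point come from the WebFS primitive
above, everything else is assumed): since mounting by hand works, it can help
to check the DRBD role at the moment the cluster attempts the mount, and to
run the Filesystem agent by hand with the same parameters to capture its
actual error:

cat /proc/drbd                                    # the mounting node should be Primary
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_device="/dev/drbd0"
export OCF_RESKEY_directory="/var/spool/dovecot"
export OCF_RESKEY_fstype="ext4"
/usr/lib/ocf/resource.d/heartbeat/Filesystem start; echo "rc=$?"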


In logs:
Jul 22 08:18:39 node01 crmd: [1814]: ERROR: stonithd_signon: Can't 
initiate connection to stonithd

Jul 22 08:18:39 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 08:18:39 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in 
failed: triggered a retry
Jul 22 08:18:39 node01 crmd: [1814]: info: te_connect_stonith: 
Attempting connection to fencing daemon...
Jul 22 08:18:40 node01 crmd: [1814]: ERROR: stonithd_signon: Can't 
initiate connection to stonithd

Jul 22 08:18:40 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in 
failed: triggered a retry
Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith: 
Attempting connection to fencing daemon...
Jul 22 08:18:41 node01 crmd: [1814]: ERROR: stonithd_signon: Can't 
initiate connection to stonithd

Jul 22 08:18:41 node01 crmd: [1814]: notice: Not currently connected.
Jul 22 08:18:41 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in 
failed: triggered a retry
Jul 22 08:18:41 node01 crmd: [1814]: info: te_connect_stonith: 
Attempting connection to fencing daemon...
Jul 22 08:18:42 node01 cibadmin: [1199]: info: Invoked: cibadmin -Ql -o 
resources
Jul 22 08:18:42 node01 cibadmin: [1200]: info: Invoked: cibadmin -p -R 
-o resources
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: - 

Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: - 
  
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: - 

Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: - 
  
Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: - 

Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -