Re: [Pacemaker] want resource A to start successfully (completely) then start B

2010-06-01 Thread Andrew Beekhof
On Mon, May 31, 2010 at 5:40 PM, Aaditya kumar
passion.for.syst...@gmail.com wrote:

 Hi all,
     AFAIK, in resource ordering order(A,B),
 start A is invoked before start B.

 BUT my situation demands that
 NOT ONLY is start A invoked before start B,
 BUT ALSO that start A completes successfully (returns rc=0); then and only
 then should lrmd invoke start B.

 In my logs the resources are started as they should be, BUT the second
 resource does NOT WAIT FOR the first resource's start operation to return.

Seriously, enough of the ALL CAPS.


 I am using mysql on an nfs mount.
 The nfs mount is invoked before mysql, BUT mysql doesn't wait for the nfs
 mount to finish, so mysql doesn't start (rc=6, resource not configured,
 because the data dir is on the nfs mount, which is not mounted yet).

Please include your configuration.
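
For reference, a mandatory ordering constraint in crm shell syntax looks
roughly like this (the resource names are taken from the log excerpt below;
the rest of the configuration is assumed):

	order mysql-after-nfs inf: nfs_ha mysql

With an inf: (mandatory) score, the cluster will not ask lrmd to start
mysql until nfs_ha's start operation has completed with rc=0. If no such
constraint exists, the cluster is free to launch the start operations in
parallel, which matches the interleaved starts in the log below.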


 IS THERE A WAY to WAIT for the first successful start before starting the
 second resource?




 May 31 20:31:16 localhost crmd: [14581]: info: te_pseudo_action: Pseudo
 action 3 fired and confirmed
 May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating
 action 9: start failover-ip_start_0 on aadityaxcat2 (local)
 May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing
 key=9:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=failover-ip_start_0 )
 May 31 20:31:16 localhost lrmd: [14578]: info: rsc:failover-ip:26: start
 May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating
 action 11: start xcat_ha_start_0 on aadityaxcat2 (local)
 May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing
 key=11:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=xcat_ha_start_0 )
 May 31 20:31:16 localhost lrmd: [14578]: info: rsc:xcat_ha:27: start
 May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating
 action 13: start nfs_ha_start_0 on aadityaxcat2 (local)
 May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing
 key=13:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=nfs_ha_start_0 )
 May 31 20:31:16 localhost lrmd: [14578]: info: rsc:nfs_ha:28: start
 May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating
 action 17: start mysql_start_0 on aadityaxcat2 (local)
 May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing
 key=17:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=mysql_start_0 )
 May 31 20:31:16 localhost lrmd: [14578]: info: rsc:mysql:29: start
 May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating
 action 19: start apache_start_0 on aadityaxcat2 (local)
 May 31 20:31:16 localhost mysql[16473]: ERROR: Datadir /mnt/nfs/mysql
 doesn't exist
 May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing
 key=19:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=apache_start_0 )
 May 31 20:31:16 localhost pengine: [15590]: WARN: process_pe_message:
 Transition 5: WARNINGs found during PE processing. PEngine Input stored in:
 /var/lib/pengine/pe-warn-96712.bz2
 May 31 20:31:16 localhost pengine: [15590]: info: process_pe_message:
 Configuration WARNINGs found during PE processing.  Please run crm_verify
 -L to identify issues.
 May 31 20:31:16 localhost IPaddr[16468]: INFO: Using calculated netmask for
 10.0.0.5: 255.0.0.0
 May 31 20:31:16 localhost Filesystem[16470]: INFO: Running start for
 unicluster:/nfs on /mnt/nfs
 May 31 20:31:16 localhost lrmd: [14578]: info: RA output:
 (xcat_ha:start:stdout) Xcatd starting ...
 May 31 20:31:16 localhost crmd: [14581]: info: process_lrm_event: LRM
 operation mysql_start_0 (call=29, rc=6, cib-update=93, confirmed=true) not
 configured
 May 31 20:31:16 localhost crmd: [14581]: WARN: status_from_rc: Action 17
 (mysql_start_0) on aadityaxcat2 failed (target: 0 vs. rc: 6): Error
 May 31 20:31:16 localhost IPaddr[16468]: INFO: eval ifconfig eth0:0 10.0.0.5
 netmask 255.0.0.0 broadcast 10.255.255.255
 May 31 20:31:16 localhost crmd: [14581]: WARN: update_failcount: Updating
 failcount for mysql on aadityaxcat2 after failed start: rc=6
 (update=value++, time=1275318076)
 May 31 20:31:16 localhost crmd: [14581]: info: abort_transition_graph:
 match_graph_event:272 - Triggered transition abort (complete=0,
 tag=lrm_rsc_op, id=mysql_start_0,
 magic=0:6;17:5:0:800549c5-d049-45bf-9987-68423a7a95c4, cib=12.187.43) :
 Event failed
 May 31 20:31:16 localhost crmd: [14581]: info: update_abort_priority: Abort
 priority upgraded from 0 to 1
 May 31 20:31:16 localhost crmd: [14581]: info: update_abort_priority: Abort
 action done superceeded by restart
 May 31 20:31:16 localhost crmd: [14581]: info: match_graph_event: Action
 mysql_start_0 (17) confirmed on aadityaxcat2 (rc=4)
 May 31 20:31:16 localhost attrd: [14580]: info: find_hash_entry: Creating
 hash entry for fail-count-mysql
 May 31 20:31:16 localhost attrd: [14580]: info: attrd_local_callback:
 Expanded fail-count-mysql=value++ to 1


 --
 Regards,
 Aaditya.

 I want to change the world, but God won't give me the source code.


Re: [Pacemaker] mounting gfs2

2010-06-01 Thread Andrew Beekhof
On Mon, May 31, 2010 at 9:45 AM, marc genou marcge...@gmail.com wrote:
 Hi
 I am trying to deploy an active/active cluster but have run into some
 trouble. When I try to mount a gfs2 filesystem on top of drbd I get this
 error:
 gfs_controld join connect error: Connection refused
 error mounting lockproto lock_dlm
 I am using the experimental gfs-pcmk/dlm-pcmk 3.0.11 packages on Debian
 Squeeze. Any ideas?

Are you running heartbeat or corosync (with openais)?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] mounting gfs2

2010-06-01 Thread Oliver Heinz
On Monday, 31 May 2010, at 09:45:26, marc genou wrote:
 Hi
 
 I am trying to deploy an active/active cluster but have run into some
 trouble. When I try to mount a gfs2 filesystem on top of drbd I get this
 error:
 
 gfs_controld join connect error: Connection refused
 error mounting lockproto lock_dlm
 
 I am using the experimental gfs-pcmk/dlm-pcmk 3.0.11 packages on Debian
 Squeeze. Any ideas?

Is your dlm_controld.pcmk really getting started? Under certain conditions it 
tends to segfault when it is started early in the boot process. A patch for 
this is available and is included in redhat-cluster 3.0.12.

You might read the thread "[Pacemaker] startup problem DLM on ubuntu lucid" 
in the archive.
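
For context, the dlm and gfs control daemons are usually managed as cloned
cluster resources, roughly as in the Clusters from Scratch guide (the
monitor intervals below are illustrative):

	primitive dlm ocf:pacemaker:controld \
		op monitor interval="120s"
	primitive gfs-control ocf:pacemaker:controld \
		params daemon="gfs_controld.pcmk" args="-g 0" \
		op monitor interval="120s"
	clone dlm-clone dlm meta interleave="true"
	clone gfs-clone gfs-control meta interleave="true"
	colocation gfs-with-dlm inf: gfs-clone dlm-clone
	order start-gfs-after-dlm inf: dlm-clone gfs-clone

If gfs_controld is not running when the mount is attempted, mount.gfs2
cannot connect to it, which produces exactly this "Connection refused" error.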



Re: [Pacemaker] mounting gfs2

2010-06-01 Thread marc genou
I'm using heartbeat. Should I try corosync?


[Pacemaker] handle EINTR in sem_wait (pacemaker corosync 1.2.2+ crash)

2010-06-01 Thread Steven Dake

Hello,

I have found the cause of the crash that was occurring only on some 
deployments. The cause is that sem_wait is interrupted by a signal, and 
the wait operation is not retried (as is customary in POSIX).
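
For reference, the customary retry idiom wraps the call in a loop; a
generic C sketch (not the exact corosync code):

	#include <semaphore.h>
	#include <errno.h>

	/* Retry sem_wait() while it is interrupted by a signal (EINTR);
	 * return -1 only for real errors. */
	static int sem_wait_retry (sem_t *sem)
	{
		int rc;
		do {
			rc = sem_wait (sem);
		} while (rc == -1 && errno == EINTR);
		return rc;
	}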


Patch attached to fix this.

A big thank you to Vladislav Bogdanov for running the test case and 
verifying that it fixes the problem.



Regards
-steve
Index: logsys.c
===================================================================
--- logsys.c	(revision 2915)
+++ logsys.c	(working copy)
@@ -661,7 +661,18 @@
 	sem_post (&logsys_thread_start);
 	for (;;) {
 		dropped = 0;
-		sem_wait (&logsys_print_finished);
+retry_sem_wait:
+		res = sem_wait (&logsys_print_finished);
+		if (res == -1 && errno == EINTR) {
+			goto retry_sem_wait;
+		} else
+		if (res == -1) {
+			/*
+			 * This case shouldn't happen
+			 */
+			pthread_exit (NULL);
+		}
+
 
 		logsys_wthread_lock();
 		if (wthread_should_exit) {


[Pacemaker] Both nodes become master

2010-06-01 Thread Jorge Santos Fonseca
Hi all!

I'm installing a system with heartbeat 3.0.3 and Pacemaker 1.0.8, with the 
configuration shown at the end of this message. The network is bonded (NIC 
teaming).

If I unplug the NIC cables from the master server, the other node becomes 
inactive (slave); that is OK.
Now, when I plug the machine back into the network, sometimes both nodes 
become active/active, and they only synchronize again if I restart heartbeat.

Thanks in advance

Jorge


NODE1:


Last updated: Tue Jun  1 18:23:02 2010
Stack: Heartbeat
Current DC: sipserver1 (336b9a65-615b-4ae0-9c54-de46fafc478a) - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
5 Resources configured.


Online: [ sipserver1 ]
OFFLINE: [ sipserver2 ]

 Clone Set: clonePing
 Started: [ sipserver1 ]
 Stopped: [ resPing:1 ]
 asterisk   (ocf::heartbeat:asterisk):  Started sipserver1
 virtual_IPaddr (ocf::heartbeat:IPaddr2):   Started sipserver1
 Clone Set: cloneOpenser
 Started: [ sipserver1 ]
 Stopped: [ openser:1 ]
 Clone Set: cloneMysql
 Started: [ sipserver1 ]
 Stopped: [ mysql:1 ]

NODE2:

crm_mon -1 shows:


Last updated: Tue Jun  1 18:19:40 2010
Stack: Heartbeat
Current DC: sipserver2 (b694d28c-41e6-4fdd-bfc1-ff097b5a9349) - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
5 Resources configured.


Online: [ sipserver2 ]
OFFLINE: [ sipserver1 ]

 Clone Set: clonePing
 Started: [ sipserver2 ]
 Stopped: [ resPing:0 ]
 asterisk   (ocf::heartbeat:asterisk):  Started sipserver2
 virtual_IPaddr (ocf::heartbeat:IPaddr2):   Started sipserver2
 Clone Set: cloneOpenser
 Started: [ sipserver2 ]
 Stopped: [ openser:0 ]
 Clone Set: cloneMysql
 Started: [ sipserver2 ]
 Stopped: [ mysql:0 ]



===== CONFIG =====

node $id=336b9a65-615b-4ae0-9c54-de46fafc478a sipserver1 \
attributes standby=off
node $id=b694d28c-41e6-4fdd-bfc1-ff097b5a9349 sipserver2 \
attributes standby=off
primitive asterisk ocf:heartbeat:asterisk \
op monitor interval=10s timeout=20s depth=0 \
meta target-role=Started
primitive openser ocf:heartbeat:openser \
op monitor interval=10s timeout=20s depth=0
primitive resPing ocf:pacemaker:ping \
params host_list=192.168.210.156 multiplier=10 dampen=5s \
op monitor interval=10 timeout=10
primitive virtual_IPaddr ocf:heartbeat:IPaddr2 \
params ip=192.168.210.248 nic=bond0 \
op monitor interval=5s timeout=20s depth=0 \
meta target-role=Started
clone cloneOpenser openser
clone clonePing resPing \
meta globally-unique=false
location IPrunWhenConn virtual_IPaddr \
rule $id=IPrunWhenConn-rule -inf: not_defined pingd or pingd lte 0
location openserRunWhenConn openser \
rule $id=openserRunWhenConn-rule -inf: not_defined pingd or pingd lte 0
location runWhenConn asterisk \
rule $id=runWhenConn-rule -inf: not_defined pingd or pingd lte 0
order asterisk_after_ip inf: virtual_IPaddr asterisk
property $id=cib-bootstrap-options \
dc-version=1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd \
cluster-infrastructure=Heartbeat \
is-managed-default=true \
stonith-enabled=FALSE \
no-quorum-policy=ignore \
expected-quorum-votes=2
rsc_defaults $id=rsc-options \
resource-stickiness=INFINITY