Re: [Pacemaker] want resource A to start successfully (completely) then start B
On Mon, May 31, 2010 at 5:40 PM, Aaditya kumar <passion.for.syst...@gmail.com> wrote:

> Hi all,
>
> AFAIK, with resource ordering order(A,B), start A is invoked before start B.
> BUT my situation demands that NOT ONLY is start A invoked before start B,
> BUT ALSO that start A completes successfully (returns rc=0); then and only
> then should lrmd invoke start B.
>
> In my logs the resources are started in the right order, BUT the second
> resource doesn't WAIT FOR the first resource's start operation to return.

Seriously, enough of the ALL CAPS.

> I am using mysql on an nfs mount. The nfs mount is invoked before mysql,
> BUT mysql doesn't wait for the nfs mount to finish, so mysql doesn't start
> (rc=6, resource not configured, because the data dir is on the nfs mount,
> which is not mounted yet).

Please include your configuration.

> IS THERE A WAY to WAIT for the first successful start before starting the
> second resource?
>
> May 31 20:31:16 localhost crmd: [14581]: info: te_pseudo_action: Pseudo action 3 fired and confirmed
> May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating action 9: start failover-ip_start_0 on aadityaxcat2 (local)
> May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing key=9:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=failover-ip_start_0 )
> May 31 20:31:16 localhost lrmd: [14578]: info: rsc:failover-ip:26: start
> May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating action 11: start xcat_ha_start_0 on aadityaxcat2 (local)
> May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing key=11:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=xcat_ha_start_0 )
> May 31 20:31:16 localhost lrmd: [14578]: info: rsc:xcat_ha:27: start
> May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating action 13: start nfs_ha_start_0 on aadityaxcat2 (local)
> May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing key=13:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=nfs_ha_start_0 )
> May 31 20:31:16 localhost lrmd: [14578]: info: rsc:nfs_ha:28: start
> May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating action 17: start mysql_start_0 on aadityaxcat2 (local)
> May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing key=17:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=mysql_start_0 )
> May 31 20:31:16 localhost lrmd: [14578]: info: rsc:mysql:29: start
> May 31 20:31:16 localhost crmd: [14581]: info: te_rsc_command: Initiating action 19: start apache_start_0 on aadityaxcat2 (local)
> May 31 20:31:16 localhost mysql[16473]: ERROR: Datadir /mnt/nfs/mysql doesn't exist
> May 31 20:31:16 localhost crmd: [14581]: info: do_lrm_rsc_op: Performing key=19:5:0:800549c5-d049-45bf-9987-68423a7a95c4 op=apache_start_0 )
> May 31 20:31:16 localhost pengine: [15590]: WARN: process_pe_message: Transition 5: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-96712.bz2
> May 31 20:31:16 localhost pengine: [15590]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run crm_verify -L to identify issues.
> May 31 20:31:16 localhost IPaddr[16468]: INFO: Using calculated netmask for 10.0.0.5: 255.0.0.0
> May 31 20:31:16 localhost Filesystem[16470]: INFO: Running start for unicluster:/nfs on /mnt/nfs
> May 31 20:31:16 localhost lrmd: [14578]: info: RA output: (xcat_ha:start:stdout) Xcatd starting ...
> May 31 20:31:16 localhost crmd: [14581]: info: process_lrm_event: LRM operation mysql_start_0 (call=29, rc=6, cib-update=93, confirmed=true) not configured
> May 31 20:31:16 localhost crmd: [14581]: WARN: status_from_rc: Action 17 (mysql_start_0) on aadityaxcat2 failed (target: 0 vs. rc: 6): Error
> May 31 20:31:16 localhost IPaddr[16468]: INFO: eval ifconfig eth0:0 10.0.0.5 netmask 255.0.0.0 broadcast 10.255.255.255
> May 31 20:31:16 localhost crmd: [14581]: WARN: update_failcount: Updating failcount for mysql on aadityaxcat2 after failed start: rc=6 (update=value++, time=1275318076)
> May 31 20:31:16 localhost crmd: [14581]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=mysql_start_0, magic=0:6;17:5:0:800549c5-d049-45bf-9987-68423a7a95c4, cib=12.187.43) : Event failed
> May 31 20:31:16 localhost crmd: [14581]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> May 31 20:31:16 localhost crmd: [14581]: info: update_abort_priority: Abort action done superceeded by restart
> May 31 20:31:16 localhost crmd: [14581]: info: match_graph_event: Action mysql_start_0 (17) confirmed on aadityaxcat2 (rc=4)
> May 31 20:31:16 localhost attrd: [14580]: info: find_hash_entry: Creating hash entry for fail-count-mysql
> May 31 20:31:16 localhost attrd: [14580]: info: attrd_local_callback: Expanded fail-count-mysql=value++ to 1
>
> --
> Regards,
> Aaditya.
> I want to change the world, but God won't give me the source code.

___
Pacemaker mailing list:
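The logs above show all the start operations being initiated in the same second, which is what happens when no mandatory ordering constraints tie the resources together; with a mandatory (score INFINITY) order constraint, the transition engine only initiates the dependent start after the first resource's start has returned rc=0. A minimal sketch in crm shell syntax, assuming the resource names from the logs (the constraint ids are illustrative, not from the poster's configuration):

```
# Mandatory ordering: mysql's start is only initiated after
# nfs_ha's start has completed successfully (rc=0).
order mysql_after_nfs inf: nfs_ha mysql
order apache_after_mysql inf: mysql apache

# Colocation is usually wanted as well, so the whole stack
# stays on the same node as the nfs mount:
colocation mysql_with_nfs inf: mysql nfs_ha
```

Alternatively, putting the resources in a group (`group ha_stack nfs_ha mysql apache`) implies both ordering and colocation between consecutive members.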
Re: [Pacemaker] mounting gfs2
On Mon, May 31, 2010 at 9:45 AM, marc genou <marcge...@gmail.com> wrote:

> Hi,
>
> I am trying to deploy an Active/Active cluster but have run into some
> trouble. When I try to mount a gfs2 filesystem on top of drbd I get this
> error:
>
>   gfs_controld join connect error: Connection refused
>   error mounting lockproto lock_dlm
>
> I am using the experimental gfs-pcmk/dlm-pcmk 3.0.11 packages in Debian
> Squeeze. Any ideas?

Are you running heartbeat or corosync (with openais)?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] mounting gfs2
On Monday, 31 May 2010 at 09:45:26, marc genou wrote:

> Hi,
>
> I am trying to deploy an Active/Active cluster but have run into some
> trouble. When I try to mount a gfs2 filesystem on top of drbd I get this
> error:
>
>   gfs_controld join connect error: Connection refused
>   error mounting lockproto lock_dlm
>
> I am using the experimental gfs-pcmk/dlm-pcmk 3.0.11 packages in Debian
> Squeeze. Any ideas?

Is your dlm_controld.pcmk really getting started? Under certain conditions it tends to segfault when it is started early in the boot process. A patch for this is available and is included in redhat-cluster 3.0.12. You might read the thread "[Pacemaker] startup problem DLM on ubuntu lucid" in the archive.
Re: [Pacemaker] mounting gfs2
I'm using heartbeat. Should I try with corosync?
[Pacemaker] handle EINTR in sem_wait (pacemaker corosync 1.2.2+ crash)
Hello,

I have found the cause of the crash that was occurring only on some deployments: sem_wait() is interrupted by a signal, and the wait operation is not retried (as is customary in POSIX). Patch attached to fix it.

A big thank you to Vladislav Bogdanov for running the test case and verifying that it fixes the problem.

Regards
-steve

Index: logsys.c
===================================================================
--- logsys.c	(revision 2915)
+++ logsys.c	(working copy)
@@ -661,7 +661,18 @@
 	sem_post (&logsys_thread_start);
 	for (;;) {
 		dropped = 0;
-		sem_wait (&logsys_print_finished);
+retry_sem_wait:
+		res = sem_wait (&logsys_print_finished);
+		if (res == -1 && errno == EINTR) {
+			goto retry_sem_wait;
+		} else
+		if (res == -1) {
+			/*
+			 * This case shouldn't happen
+			 */
+			pthread_exit (NULL);
+		}
+
 		logsys_wthread_lock();
 		if (wthread_should_exit) {
[Pacemaker] Both nodes become master
Hi all!

I'm installing a system with heartbeat 3.0.3 and Pacemaker 1.0.8 with the configuration shown at the end of this message. The network is bonded (NIC teaming).

If I unplug the NIC cables from the master server, that node becomes inactive (slave), which is OK. When I plug the machine back into the network, sometimes both nodes become active/active, and only if I restart heartbeat do they synchronize again.

Thanks in advance,
Jorge

NODE1:
============
Last updated: Tue Jun 1 18:23:02 2010
Stack: Heartbeat
Current DC: sipserver1 (336b9a65-615b-4ae0-9c54-de46fafc478a) - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ sipserver1 ]
OFFLINE: [ sipserver2 ]

Clone Set: clonePing
    Started: [ sipserver1 ]
    Stopped: [ resPing:1 ]
asterisk (ocf::heartbeat:asterisk): Started sipserver1
virtual_IPaddr (ocf::heartbeat:IPaddr2): Started sipserver1
Clone Set: cloneOpenser
    Started: [ sipserver1 ]
    Stopped: [ openser:1 ]
Clone Set: cloneMysql
    Started: [ sipserver1 ]
    Stopped: [ mysql:1 ]

NODE2: crm_mon -1 shows:
============
Last updated: Tue Jun 1 18:19:40 2010
Stack: Heartbeat
Current DC: sipserver2 (b694d28c-41e6-4fdd-bfc1-ff097b5a9349) - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ sipserver2 ]
OFFLINE: [ sipserver1 ]

Clone Set: clonePing
    Started: [ sipserver2 ]
    Stopped: [ resPing:0 ]
asterisk (ocf::heartbeat:asterisk): Started sipserver2
virtual_IPaddr (ocf::heartbeat:IPaddr2): Started sipserver2
Clone Set: cloneOpenser
    Started: [ sipserver2 ]
    Stopped: [ openser:0 ]
Clone Set: cloneMysql
    Started: [ sipserver2 ]
    Stopped: [ mysql:0 ]

========= CONFIG =========
node $id=336b9a65-615b-4ae0-9c54-de46fafc478a sipserver1 \
    attributes standby=off
node $id=b694d28c-41e6-4fdd-bfc1-ff097b5a9349 sipserver2 \
    attributes standby=off
primitive asterisk ocf:heartbeat:asterisk \
    op monitor interval=10s timeout=20s depth=0 \
    meta target-role=Started
primitive openser ocf:heartbeat:openser \
    op monitor interval=10s timeout=20s depth=0
primitive resPing ocf:pacemaker:ping \
    params host_list=192.168.210.156 multiplier=10 dampen=5s \
    op monitor interval=10 timeout=10
primitive virtual_IPaddr ocf:heartbeat:IPaddr2 \
    params ip=192.168.210.248 nic=bond0 \
    op monitor interval=5s timeout=20s depth=0 \
    meta target-role=Started
clone cloneOpenser openser
clone clonePing resPing \
    meta globally-unique=false
location IPrunWhenConn virtual_IPaddr \
    rule $id=IPrunWhenConn-rule -inf: not_defined pingd or pingd lte 0
location openserRunWhenConn openser \
    rule $id=openserRunWhenConn-rule -inf: not_defined pingd or pingd lte 0
location runWhenConn asterisk \
    rule $id=runWhenConn-rule -inf: not_defined pingd or pingd lte 0
order asterisk_after_ip inf: virtual_IPaddr asterisk
property $id=cib-bootstrap-options \
    dc-version=1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd \
    cluster-infrastructure=Heartbeat \
    is-managed-default=true \
    stonith-enabled=FALSE \
    no-quorum-policy=ignore \
    expected-quorum-votes=2
rsc_defaults $id=rsc-options \
    resource-stickiness=INFINITY