[Pacemaker] cannot register service of pacemaker_remote

2013-04-29 Thread nozawat
Hi

 Because there was a typo in pacemaker.spec.in, I was not able to register
the pacemaker_remote service.

-
diff --git a/pacemaker.spec.in b/pacemaker.spec.in
index 10296a5..1e1fd6d 100644
--- a/pacemaker.spec.in
+++ b/pacemaker.spec.in
@@ -404,7 +404,7 @@ exit 0
 %if %{defined _unitdir}
 /bin/systemctl daemon-reload >/dev/null 2>&1 || :
 %endif
-/sbin/chkconfig --add pacemaker-remote || :
+/sbin/chkconfig --add pacemaker_remote || :

 %preun -n %{name}-remote
 if [ $1 -eq 0 ]; then
-
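Once a package rebuilt with this fix is installed, the registration can be
verified with something like the following (assuming a sysvinit/chkconfig
system; on systemd hosts, check with systemctl instead):

/sbin/chkconfig --list pacemaker_remote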

Regards,
Tomo
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node KVM cluster

2013-04-29 Thread Andrew Beekhof

On 17/04/2013, at 4:02 PM, Oriol Mula-Valls  wrote:

> On 16/04/13 06:10, Andrew Beekhof wrote:
>> 
>> On 10/04/2013, at 3:20 PM, Oriol Mula-Valls  wrote:
>> 
>>> On 10/04/13 02:10, Andrew Beekhof wrote:
 
 On 09/04/2013, at 7:31 PM, Oriol Mula-Valls   
 wrote:
 
> Thanks Andrew, I've managed to set up the system and currently have it 
> working, but it is still in testing.
> 
> I have configured external/ipmi as the fencing device and then I force a 
> reboot by doing an echo b > /proc/sysrq-trigger. The fencing is working 
> properly, as the node is shut off and the VM migrated. However, as soon as 
> I turn on the fenced now and the OS has started, the surviving node is shut 
> down. Is it normal or am I doing something wrong?
 
 Can you clarify "turn on the fenced"?
 
>>> 
>>> To restart the fenced node I either power it on with ipmitool or via the 
>>> iRMC web interface.
>> 
>> Oh, "fenced now" was meant to be "fenced node".  That makes more sense now :)
>> 
>> To answer your question, I would not expect the surviving node to be fenced 
>> when the previous node returns.
>> The network between the two is still functional?
> 
> Sorry, I didn't realise the mistake even while writing the answer :)
> 
> IPMI network is still working between the nodes.

Ok, but what about the network corosync is using?

> 
> Thanks,
> Oriol
> 
>> 
>>> 
> 
> On the other hand I've seen that in case I completely lose power fencing 
> obviously fails. Would SBD stonith solve this issue?
> 
> Kind regards,
> Oriol
> 
> On 08/04/13 04:11, Andrew Beekhof wrote:
>> 
>> On 03/04/2013, at 9:15 PM, Oriol Mula-Valls
>> wrote:
>> 
>>> Hi,
>>> 
>>> I started with Linux HA about one year ago. Currently I'm facing a 
>>> new project in which I have to set up two nodes with highly available 
>>> virtual machines. I have used Digimer's tutorial as a starting point 
>>> (https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial).
>>> 
>>> To deploy this new infrastructure I have two Fujitsu Primergy Rx100S7. 
>>> Both machines have 8GB of RAM and 2x500GB HDs. I started by creating a 
>>> software RAID1 with the internal drives and installing Debian 7.0 
>>> (Wheezy). Apart from the O.S. partition I have created 3 more 
>>> partitions: one for the shared storage between both machines with OCFS2, 
>>> and the two others will be used as PVs to create LVs to support the VMs 
>>> (one for the VMs that will be primary on node1 and the other for primary 
>>> machines on node2). These 3 partitions are replicated using DRBD.
>>> 
>>> The shared storage folder contains:
>>> * ISO images needed when provisioning VMs
>>> * scripts used to call virt-install which handles the creation of our 
>>> VMs.
>>> * XML definition files which define the emulated hardware backing the 
>>> VMs
>>> * old copies of the XML definition files.
>>> 
>>> I have more or less done the configuration for the OCFS2 fs and I was 
>>> about to start the configuration of cLVM for one of the VGs, but I have 
>>> some doubts. I have one dlm for the OCFS2 filesystem; should I create 
>>> another for the cLVM RA?
>> 
>> No, there should only ever be one dlm resource (cloned like you have it)
>> 
>>> 
>>> This is the current configuration:
>>> node node1
>>> node node2
>>> primitive p_dlm_controld ocf:pacemaker:controld \
>>> op start interval="0" timeout="90" \
>>> op stop interval="0" timeout="100" \
>>> op monitor interval="10"
>>> primitive p_drbd_shared ocf:linbit:drbd \
>>> params drbd_resource="shared" \
>>> op monitor interval="10" role="Master" timeout="20" \
>>> op monitor interval="20" role="Slave" timeout="20" \
>>> op start interval="0" timeout="240s" \
>>> op stop interval="0" timeout="120s"
>>> primitive p_drbd_vm_1 ocf:linbit:drbd \
>>> params drbd_resource="vm_1" \
>>> op monitor interval="10" role="Master" timeout="20" \
>>> op monitor interval="20" role="Slave" timeout="20" \
>>> op start interval="0" timeout="240s" \
>>> op stop interval="0" timeout="120s"
>>> primitive p_fs_shared ocf:heartbeat:Filesystem \
>>> params device="/dev/drbd/by-res/shared" directory="/shared" 
>>> fstype="ocfs2" \
>>> meta target-role="Started" \
>>> op monitor interval="10"
>>> primitive p_ipmi_node1 stonith:external/ipmi \
>>> params hostname="node1" userid="admin" passwd="xxx" 
>>> ipaddr="10.0.0.2" interface="lanplus"
>>> primitive p_ipmi_node2 stonith:external/ipmi \
>>> params hostname="node2" userid="admin" passwd="xxx" 
>>> ipaddr="10.0.0.3" interface="lanplus"
>>> primitive p_libvirt

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-29 Thread Andrew Beekhof

On 24/04/2013, at 7:44 PM, Rainer Brestan  wrote:

> Pacemaker log of int2node2 with trace setting.
> https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094
> On int2node1 (1.1.7) the trace setting did not create the pacemaker.log file.
>  

Ah, yes, 1.1.7 wasn't so smart yet.
Can you make sure there is a logfile specified in corosync.conf?
Looking at the node2 logs was useful (nothing is arriving from node1) but I 
really need to see node1's logs.
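For reference, a minimal logging stanza in corosync.conf might look like this
(a sketch using corosync 1.x option names; the path is just an example):

logging {
    to_syslog: yes
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    timestamp: on
}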


Re: [Pacemaker] Pacemaker installed to custom location

2013-04-29 Thread Andrew Beekhof

On 26/04/2013, at 9:12 PM, James Masson  wrote:

> 
> 
> On 26/04/13 01:29, Andrew Beekhof wrote:
>> 
>> On 26/04/2013, at 12:12 AM, James Masson  wrote:
>> 
>>> 
>>> Hi list,
>>> 
>>> I'm trying to build and run pacemaker from a custom location.
>>> 
>>> Corosync starts up fine.
>>> 
>>> Pacemakerd does not - the result is:
>> 
>> Try turning up the debug to see why the cib isn't happy:
>> 
>>> Apr 25 13:54:10 [10482] fcde02a2-cc41-4c58-b6d2-b7bb0bada436 pacemakerd:
>>> error: pcmk_child_exit: Child process cib exited (pid=10484, rc=100)
>>> Apr 25 13:54:10 [10482] fcde02a2-cc41-4c58-b6d2-b7bb0bada436 pacemakerd:  
>>> warning: pcmk_child_exit: Pacemaker child process cib no longer
>> 
>> 
>> 
> Hi Andrew,
> 
> debug log + strace are attached. The strace has something interesting...
> 
> 
> 5195  open("/dev/shm/qb-cpg-request-5173-5195-19-header", O_RDWR) = -1 EACCES 
> (Permission denied)
> 
> 
> I know pacemaker uses shm to communicate. Permissions on /dev/shm are (I 
> think) correct.

Looks reasonable (now that I understand vcap :-)

> 
> root@5627a5e1-9e30-4fe2-9178-6445e26a8ccc:~# ls -al /dev/shm/
> total 8224
> drwxrwx---  2 root vcap  80 2013-04-26 10:30 .
> drwxr-xr-x 12 root root3900 2013-04-26 08:23 ..
> -rw---  1 root root 8388608 2013-04-26 10:30 qb-corosync-blackbox-data
> -rw---  1 root root8248 2013-04-26 10:28 qb-corosync-blackbox-header
> 
> When I changed permissions on /dev/shm to 777, things got a little further - 
> CIB stays up, crmd respawns, and I get this over and over again in the logs.
> 
> ##
> Apr 26 10:55:52 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:54 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_new:   Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 
> id=95b6eca5-a34e-49e5-b0f8-74b84857d690
> Apr 26 10:55:54 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:56 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_new:   Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 
> id=117e515b-da4d-4842-9414-7b7d004e5c92
> Apr 26 10:55:56 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_new:   Connecting 0x1a498e0 for uid=1000 gid=0 pid=5775 
> id=cf7c10b1-14a1-47d1-9e2e-30707254256f
> Apr 26 10:55:58 [5758] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc   lrmd: 
> info: crm_client_destroy:   Destroying 0 events
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> error: pcmk_child_exit:  Child process crmd exited (pid=5775, rc=2)

No logs from the crmd?

> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: update_node_processes:Empty uname for node 839122954
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> debug: update_node_processes:Node 
> 5627a5e1-9e30-4fe2-9178-6445e26a8ccc now has process list: 
> 0012 (was 00111312)
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: update_process_clients:   Sending process list to 0 children
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: update_process_peers: Sending <node uname="5627a5e1-9e30-4fe2-9178-6445e26a8ccc" proclist="1118482"/>
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:   
> notice: pcmk_process_exit:Respawning failed child process: crmd
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd: 
> info: start_child:  Forked child 5789 for process crmd
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: update_node_processes:Empty uname for node 839122954
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> debug: update_node_processes:Node 
> 5627a5e1-9e30-4fe2-9178-6445e26a8ccc now has process list: 
> 00111312 (was 0012)
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: update_process_clients:   Sending process list to 0 children
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: update_process_peers: Sending <node uname="5627a5e1-9e30-4fe2-9178-6445e26a8ccc" proclist="1118994"/>
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: crm_user_lookup:  Cluster user vcap has uid=1000 gid=1000
> Apr 26 10:55:58 [5754] 5627a5e1-9e30-4fe2-9178-6445e26a8ccc pacemakerd:
> trace: mainloop_gio_callback:New message from corosync-cpg[

Re: [Pacemaker] Bind virtual IP resource to service resource running state

2013-04-29 Thread Andrew Beekhof

On 26/04/2013, at 7:34 PM, Forum Registrant  wrote:

> Hi!
> 
> I have a 2 node cluster. On each node I have mysql, nginx and php-fpm. Each 
> node has its own virtual IP. I need this virtual IP to migrate to the other 
> node if one of the services (mysql/nginx/php-fpm) is down/stopped. How can I 
> do it?

Colocation constraints.
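For the configuration below, that could look something like this (an untested
sketch; the names match the posted config, the scores are illustrative):

colocation co_core1ip_mysql inf: Core1_IP CL_MYSQL
colocation co_core1ip_nginx inf: Core1_IP CL_NGINX
colocation co_core1ip_php   inf: Core1_IP CL_PHP

With these in place, Core1_IP may only run on a node that still has a healthy
instance of each cloned service, so a stopped mysql on Core1.Test would push
the IP to vCore1.Test.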

> 
> !!! Scheme:
> ==- Normal situation:
> Node 1 (Core1.Test) 
> MySQL is running
> NginX is running
> Php-Fpm is running
> Core1_IP is on Node 1
> 
> Node 2 (vCore1.Test) 
> MySQL is running
> NginX is running
> Php-Fpm is running
> vCore1_IP is on Node 2
> 
> ==- If some service (for ex. MySQL on Node 1) failed or stopped:
> Node 1 (Core1.Test) 
> MySQL is stopped
> NginX is running
> Php-Fpm is running
> Core1_IP is on Node 2
> 
> Node 2 (vCore1.Test) 
> MySQL is running
> NginX is running
> Php-Fpm is running
> vCore1_IP is on Node 2
> Core1_IP is on Node 2
> 
> !!! My config:
> node Core1.Test \
> attributes standby="off"
> node vCore1.Test\
> attributes standby="off"
> primitive Core1_IP ocf:heartbeat:IPaddr2 \
> params ip="192.168.0.139" nic="bond0"
> primitive P_MYSQL lsb:mysqld \
> op monitor interval="5s" timeout="20s"
> primitive P_NGINX lsb:nginx \
> op monitor interval="5s" timeout="20s"
> primitive P_PHP lsb:php-fpm \
> op monitor interval="5s" timeout="20s"
> primitive vCore1_IP ocf:heartbeat:IPaddr2 \
> params ip="192.168.0.141" nic="bond0"
> clone CL_MYSQL P_MYSQL \
> params clone-max="2" clone-node-max="1" globally-unique="false"
> clone CL_NGINX P_NGINX \
> params clone-max="2" clone-node-max="1" globally-unique="false"
> clone CL_PHP P_PHP \
> params clone-max="2" clone-node-max="1" globally-unique="false"
> location L_MYSQL_01 CL_MYSQL 100: Core1.Test
> location L_MYSQL_02 CL_MYSQL 100: vCore1.Test
> location L_NGINX_01 CL_NGINX 100: Core1.Test
> location L_NGINX_02 CL_NGINX 100: vCore1.Test
> location L_PHP_01 CL_PHP 100: Core1.Test
> location L_PHP_02 CL_PHP 100: vCore1.Test
> location location_Core1_IP Core1_IP inf: Core1.Test
> location location_Core1_IP_2 Core1_IP 10: vCore1.Test
> location location_vCore1_IP vCore1_IP inf: vCore1.Test
> location location_vCore1_IP_2 vCore1_IP 10: Core1.Test
> property $id="cib-bootstrap-options" \
> dc-version="1.1.8-7.el6-394e906" \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes="2" \
> no-quorum-policy="ignore" \
> symmetric-cluster="false" \
> stonith-enabled="false"
> 


Re: [Pacemaker] HA KVM over DRBD primary/secondary configuration

2013-04-29 Thread Andrew Beekhof

On 19/04/2013, at 5:44 PM, Rasto Levrinc  wrote:

> On Fri, Apr 19, 2013 at 9:11 AM, Alexandr A. Alexandrov
>  wrote:
>> Hi Rasto,
>> 
>> Note that on RHEL 6/CentOS 6, you should run the Pacemaker through CMAN and
>> not a Corosync plugin
> 
> I wonder if that's still true,

Yes.  
More so since there was a major bug that affects the plugin in 6.4 which wasn't 
picked up because we don't test that configuration.
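For reference, the minimal CMAN setup from Clusters from Scratch boils down to
a small /etc/cluster/cluster.conf plus starting cman before pacemaker (a
sketch; the cluster and node names are illustrative, and fencing is omitted
here for brevity):

<?xml version="1.0"?>
<cluster config_version="1" name="kvm-cluster">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>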

> but better be safe than sorry.
> 
>> 
>> 
>> Not glad to hear that... We are using  Pacemaker+Corosync everywhere (SuSe,
>> CentOS, OracleLinux servers).
>> Is there any way to use LCMC in this setup?
>> 
> 
> there's no problem using LCMC in this setup. It will not help to create the
> cman config, though. That's a feature with very low priority for me at the
> moment.
> 
> Rasto
> 
>> On 19.04.2013 09:30, Rasto Levrinc wrote:
>> 
>> We used the amazing LCMC tool for the hosts, drbd and pacemaker and Corosync
>> setup
>> http://lcmc.sourceforge.net/
>> 
>> LCMC does lots of the setup automatic - a huge timesaver.
>> 
>> I'm glad to hear that.
>> 
> 


Re: [Pacemaker] crm_attribute not returning node attribute

2013-04-29 Thread Andrew Beekhof

On 20/04/2013, at 3:39 AM, Brian J. Murrell  wrote:

> Given:
> 
> host1# crm node attribute host1 show foo
> scope=nodes  name=foo value=bar
> 
> Why doesn't this return anything:
> 
> host1# crm_attribute --node host1 --name foo --query

This is looking up transient attributes.
You need to add "-l forever"
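i.e. something like:

# permanent attribute, stored in the nodes section (what "crm node attribute" sets):
crm_attribute --node host1 --name foo --query -l forever

# transient attribute, stored in the status section (what the original command queried):
crm_attribute --node host1 --name foo --query -l reboot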

> host1# echo $?
> 0
> 
> cibadmin -Q confirms the presence of the attribute:
> 
>   <instance_attributes id="nodes-host1">
>     <nvpair id="nodes-host1-foo" name="foo" value="bar"/>
>   </instance_attributes>
> 
> This is on pacemaker 1.1.8 on EL6.4 and crmsh.
> 
> Thoughts?
> 
> b.
> 


Re: [Pacemaker] Shared redundant machine

2013-04-29 Thread Andrew Beekhof

On 26/04/2013, at 7:05 PM, Grant Bagdasarian  wrote:

> Hello,
>  
> Let’s say I have three physical servers available to configure in a redundant 
> way. I’m going to connect two of the three servers each to a different 
> network. The one that is left over will be the fallback server for both 
> primary servers. What needs to happen in this scenario is that the shared IP 
> and a process running on the primary machines migrate to the secondary 
> server in case one or both primary servers fail. The same process runs on 
> both primary machines.  
>  
> I’m not sure if this is possible.

It is.
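A sketch of the constraints involved, with illustrative names (serverA and
serverB are the primaries, serverC the shared fallback; each IP would also be
grouped or colocated with its process):

location ip_a_on_primary ip_a 200: serverA
location ip_a_on_backup  ip_a 100: serverC
location ip_b_on_primary ip_b 200: serverB
location ip_b_on_backup  ip_b 100: serverC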

> It was just an idea, and my experience with high availability isn’t that 
> great, aside from configuring a basic cluster and some resources. So I’m 
> asking it here to the more experienced users.
>  
> Thanks,
> 
> Grant
>  
>  


Re: [Pacemaker] How to display interface link status in corosync

2013-04-29 Thread Andrew Beekhof

On 18/04/2013, at 2:42 PM, Yuichi SEINO  wrote:

> Hi,
> 
> 2013/4/15 Andrew Beekhof :
>> 
>> On 15/04/2013, at 3:38 PM, Yuichi SEINO  wrote:
>> 
>>> Hi,
>>> 
>>> 2013/4/8 Andrew Beekhof :
 I'm not 100% sure what the best approach is here.
 
 Traditionally this is done with resource agents (ie. ClusterMon or ping) 
 which update attrd.
 We could potentially build it into attrd directly, but then we'd need to 
 think about how to turn it on/off.
 
 I think I'd lean towards a new agent+daemon or a new daemon launched by 
 ClusterMon.
>>> I will check whether I can implement this function as a new agent+daemon.
>>> I have a question: I am not sure how to launch a daemon from ClusterMon.
>>> Do you mean to use "crm_mon -E"?
>> 
>> No. I mean the same way the Apache agent starts httpd.
> The apache RA has a parameter that can specify a binary path, so that RA
> may be able to launch another daemon.
> However, it seems that ClusterMon doesn't have a parameter that can
> specify a binary path.
> Can you launch another daemon without such a parameter?


step 1 - write a daemon that talks to attrd
step 2 - patch the start function of the ClusterMon RA to start your new daemon
step 3 - patch the stop function of the ClusterMon RA to stop your new daemon
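Roughly, for steps 2 and 3 (a sketch only - "linkstatusd" is a hypothetical
daemon name, and the real RA functions contain more than shown here):

ClusterMon_start() {
    # ... existing crm_mon startup ...
    linkstatusd --daemonize --pidfile /var/run/linkstatusd.pid || return $OCF_ERR_GENERIC
}

ClusterMon_stop() {
    [ -f /var/run/linkstatusd.pid ] && kill $(cat /var/run/linkstatusd.pid)
    # ... existing crm_mon shutdown ...
}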


> 
> Sincerely,
> Yuichi
> 
>> 
>>> 
>>> Sincerely,
>>> Yuichi
>>> 
 
 On 04/04/2013, at 8:59 PM, Yuichi SEINO  wrote:
 
> Hi All,
> 
> I want to display interface link status in corosync. So, I think that
> I will add this function as part of "pacemakerd".
> I am going to display this status under "Node Attributes" in crm_mon.
> When the state of a link changes, corosync can run the callback function.
> When that happens, we update the attributes. Also, this function needs to
> start after "attrd" has started. "pacemakerd"'s mainloop starts after its
> sub-processes have started, so I think that is the best timing.
> 
> I show the expected crm_mon.
> 
> # crm_mon -fArc1
> Last updated: Thu Apr  4 08:08:08 2013
> Last change: Wed Apr  3 04:15:48 2013 via crmd on coro-n2
> Stack: corosync
> Current DC: coro-n1 (168427526) - partition with quorum
> Version: 1.1.9-c791037
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> 
> 
> Online: [ coro-n1 coro-n2 ]
> 
> Full list of resources:
> 
> Clone Set: OFclone [openstack-fencing]
>   Started: [ coro-n1 coro-n2 ]
> 
> Node Attributes:
> * Node coro-n1:
>  + ringnumber(0)   : 10.10.0.6 is FAULTY
>  + ringnumber(1)   : 10.20.0.6 is UP
> * Node coro-n2:
>  + ringnumber(0)   : 10.10.0.7 is FAULTY
>  + ringnumber(1)   : 10.20.0.7 is UP
> 
> Migration summary:
> * Node coro-n2:
> * Node coro-n1:
> 
> Tickets:
> 
> 
> Sincerely,
> Yuichi
> 
>>> 
> 
> 
> 
> --
> Yuichi SEINO
> METROSYSTEMS CORPORATION
> E-mail:seino.clust...@gmail.com
> 


Re: [Pacemaker] resource-stickness-issue

2013-04-29 Thread Andrew Beekhof

On 16/04/2013, at 6:37 PM, ravindra.raut...@wipro.com wrote:

> Hi All,
>   I have created a cluster with these versions on Fedora 17.
> pacemaker-1.1.7-2.fc17.x86_64
> corosync-2.0.0-1.fc17.x86_64
> 
> Everything is working fine for me except resource stickiness.
> 
> Any idea on this ?

Not without at least a description of the problem.



Re: [Pacemaker] Routing-Ressources on a 2-Node-Cluster

2013-04-29 Thread Andrew Beekhof

On 23/04/2013, at 6:05 PM, T.  wrote:

> Hi Devin,
> 
> thank you very much for your answer.
> 
>> If you insist on trying to do this with just the Linux-HA cluster,
>> I don't have any suggestions as to how you should proceed.
> I know that the "construct" we are building is quite complicated.
> 
> The problem is that the active network (10.20.10.x) is too small to
> cover both locations (in reality we only have a /26 subnet available) and
> we cannot change/move these (public) addresses to another range.
> 
> In addition, NAT translation over a router is not possible; the servers
> have to be accessible directly via their public IP address, which has to
> be the cluster IP.
> 
> So we have to deal with two different networks in the two locations
> (10.20.11.x/10.20.12.x) and create an "overlay" for the current
> ip-addresses :-(

Did you try corosync's udpu feature?
It's basically the same as ucast from ha.cf but I don't recall if that feature 
made it into 6.4
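Something along these lines in corosync.conf (corosync 1.x syntax; the
addresses are illustrative, with one member entry per node):

totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.20.11.0
        mcastport: 5405
        member {
            memberaddr: 10.20.11.10
        }
        member {
            memberaddr: 10.20.12.10
        }
    }
}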

> 
> 
> The current status is that I added a modified heartbeat Route2 script
> that also accepts "metric" as a parameter, and with this it works as
> expected.
> 
> But I am really fighting with the new corosync/pacemaker; for me the old
> heartbeat (1) was much easier and gave me all the functionality I needed. So
> I am making progress, but quite/too slowly ...
> -- 
> To answer, please replace "invalid" with "de"!
> 
> 
> Bye and see you later,
> 
> Thorolf
> 
> 


Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-29 Thread Andrew Beekhof

On 18/04/2013, at 5:58 PM, T.  wrote:

> Hi,
> 
>> Seems appropriate :)
>> 
>>  
>> http://blog.clusterlabs.org/blog/2009/configuring-heartbeat-v1-was-so-simple/
> "…because it couldn’t do anything."
> 
> That might be true, but this was everything I needed over the last few years ...

You didn't need to know when the application crashed but the node was ok?
In that case, why not just build heartbeat yourself and keep using it?


Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi

2013-04-29 Thread Andrew Beekhof

On 26/04/2013, at 4:25 PM, Angel L. Mateo  wrote:

> On 26/04/13 02:01, Andrew Beekhof wrote:
>> 
>> On 24/04/2013, at 10:48 PM, Angel L. Mateo  wrote:
>> 
>>> Hello,
>>> 
>>> I'm trying to configure a 2 node cluster on Ubuntu with cman + corosync 
>>> + pacemaker (the use of cman is because it is recommended in the Pacemaker 
>>> quickstart). In order to solve split brain in the 2 node cluster I'm 
>>> using qdisk.
>> 
>> If you want to use qdisk, then you need something newer than 1.1.8 (which 
>> did not know how to filter qdisk from the membership).
>> 
>   Oops. I have cman 3.1.7, corosync 1.4.2 and pacemaker 1.1.6 (the ones 
> provided with Ubuntu 12.04).
> 
>   My purpose for using qdisk is to solve the split-brain problem in my 
> two-node cluster. Any other suggestion for this?

Another node (with standby=true) might be an option.
Or try and get a newer version.
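e.g. keeping a third, resource-less box in the membership purely for quorum,
with something like this in crm shell ("quorum-node" is a placeholder name):

crm node standby quorum-node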

> 
>> 
>>> For fencing, I'm trying to use fence_scsi and in this point I'm having the 
>>> problem. I have attached my cluster.conf.
>>> 
>>> xml 
>>> node myotis51
>>> node myotis52
>>> primitive cluster_ip ocf:heartbeat:IPaddr2 \
>>> params ip="155.54.211.167" \
>>> op monitor interval="30s"
>>> property $id="cib-bootstrap-options" \
>>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>> cluster-infrastructure="cman" \
>>> stonith-enabled="false" \
>>> last-lrm-refresh="1366803979"
>>> 
>>> At this moment I'm trying just with an IP resource, but at the end I'll 
>>> get LVM resources and dovecot server running in top of them.
>>> 
>>> The problem I have is that whenever I interrupt network traffic between 
>>> my nodes (to check whether quorum and fencing are working) the IP resource 
>>> is started on both nodes of the cluster.
>> 
>> Do both side claim to have quorum?
>> Also, had you enabled fencing the cluster would have shot its peer before 
>> trying to start the IP.
>> 
>   I think I did (this configuration has stonith disabled because it was 
> modified for later tests) but I will check it again.
> 
>>> 
>>> So it seems that node fencing configure at cluster.conf is not working 
>>> for me.
>> 
>> Because pacemaker cannot use it from there.
>> You need to follow
>> 
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configuring_cman_fencing.html
>> 
>> and then teach pacemaker about fence_scsi:
>> 
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch09.html
>> 
>>> Then I tried to configure it as a stonith resource (since it is listed by 
>>> sudo crm ra list stonith), so I tried to include
>>> 
>>> primitive stonith_fence_scsi stonith:redhat/fence_scsi
>>> 
>>> The problem I'm having with this is that I don't know how to indicate 
>>> params for the resource (I have tried params devices="...", params -d ..., 
>>> but they are not accepted) and with this (default) configuration I get:
>> 
>> See the above link to chapter 9.
>> 
>   I have tried this. The problem I'm having is that I don't know how to 
> create the resource using fence_scsi. I have tried different syntaxes
> 
> crm(live)configure# primitive stonith_fence_scsi stonith:redhat/fence_scsi \

Remove the "redhat/" part. 

> > params name="scsi_fence" devices="/dev/sdc"
> ERROR: stonith_fence_scsi: parameter name does not exist

'name' is the name of the machine to be shot and is filled in at runtime 

> ERROR: stonith_fence_scsi: parameter devices does not exist

This looks like crmsh not knowing how to find the agent's metadata and can be 
ignored.

"man fence_scsi" looks like you need a value for "key" though

> 
> crm(live)configure# primitive stonith_fence_scsi stonith:redhat/fence_scsi \
> > params n="scsi_fence" d="/dev/sdc"
> ERROR: stonith_fence_scsi: parameter d does not exist
> ERROR: stonith_fence_scsi: parameter n does not exist
> 
> crm(live)configure# primitive stonith_fence_scsi stonith:redhat/fence_scsi \
> > params -n="scsi_fence" -d="/dev/sdc"
> ERROR: stonith_fence_scsi: parameter -d does not exist
> ERROR: stonith_fence_scsi: parameter -n does not exist
> 
>   Does anyone have an example of this? What I would like is that, in case 
> of problems, the node with use of the SCSI channel (the one using my LVM 
> volumes) shoots the other one. Could I get the same behaviour with an 
> external/sbd stonith resource?
> 
> -- 
> Angel L. Mateo Martínez
> Sección de Telemática
> Área de Tecnologías de la Información
> y las Comunicaciones Aplicadas (ATICA)
> http://www.um.es/atica
> Tfo: 868889150
> Fax: 86337
> 

Re: [Pacemaker] Action "unknown exec error" and unmanaged/failed resources, how to migrate?

2013-04-29 Thread Andrew Beekhof

On 29/04/2013, at 2:48 PM, Mark Williams  wrote:

> Hi all,
> 
> My two node cluster (qemu VMs, drbd) is now in quite a messy state.
> 
> The problem started with an unresponsive qemu VM, which appeared to be
> caused by a libvirtd problem/bug.
> Others said the solution was to kill & restart libvirtd, which didn't help.
> To fix the problem, I figured putting node1 into a clean state via a
> server reboot would be the best idea, so I issued a crm standby
> command.

That doesn't initiate a reboot.
It only tells pacemaker to try and stop all the resources running on node1
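i.e. the two are separate steps (crm shell syntax):

crm node standby node1   # migrate/stop resources on node1; no reboot happens
# ... reboot the machine yourself ...
crm node online node1    # allow it to host resources again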

> 
> I now have node1 in a standby state, but the resources/VMs that were
> (and still are) running on it have a "Master Started (unmanaged)
> FAILED" state according to crm_mon.

Most likely because they refused to stop and you have no fencing.

> 
> Any action I try to perform on that node (for example, moving a
> resource to the other node) results in an "unknown exec error".
> 
> I tried using crm_resource -C on a node1 "Started (unmanaged) FAILED"
> resource, which changed its state to "Master (unmanaged) FAILED" (it
> did shut down the running qemu VM).
> Trying to move that resource to node2 still fails with "unknown exec error".
> 
> How do I get out of this problem?

Step 1 is provide more information.
Likely from your logs.


Re: [Pacemaker] Pacemaker core dumps

2013-04-29 Thread Andrew Beekhof

On 30/04/2013, at 1:32 AM, Xavier Lashmar  wrote:

> Hello Andrew,
> 
> Thanks for your help.  We've upgraded to pacemaker 1.1.9 and still have the 
> same issue.  

That's a disappointing but useful data point.

> 
> We are trying to get the core information but we are missing some debuginfo 
> files which we are trying to get our hands on.  I'll try to forward this 
> information soon.   

Great

> 
> Is there something we need to do to the CIB when we upgrade?

No, anything that needs to happen will be done under the hood.

> 
> 
> Xavier Lashmar
> Analyste de Systèmes | Systems Analyst
> Service étudiants, service de l'informatique et des communications/Student 
> services, computing and communications services.
> 1 Nicholas Street (810)
> Ottawa ON K1N 7B7
> Tél. | Tel. 613-562-5800 (2120)
>  
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net] 
> Sent: Thursday, April 25, 2013 8:15 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Pacemaker core dumps
> 
> 
> On 26/04/2013, at 10:06 AM, Andrew Beekhof  wrote:
> 
>> 
>> On 25/04/2013, at 11:59 PM, Xavier Lashmar  wrote:
>> 
>>> Following further investigation, we were able to determine that upgrading 
>>> both nodes (in a two node cluster) from Pacemaker 1.1.7-6 to Pacemaker 
>>> 1.1.8-7 (CentOS 6.3 or Centos 6.4) caused these errors to begin happening:
>> 
>> Would you be able to try the 1.1.9 packages from 
>> http://www.clusterlabs.org/rpm-next to see if they are also affected?
>> 
>>> 
>>> We were able to replicate the initiation of the errors by upgrading another 
>>> cluster in the same manner.  This other cluster is now experiencing the 
>>> same core-dumping and errors as the previous cluster:
>>> 
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>>> line 1: parser error : invalid character in attribute value
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>>> running...
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:  
>>>   ^
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>>> line 1: parser error : attributes construct error
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>>> running...
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:  
>>>   ^
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>>> line 1: parser error : Couldn't find end of Start Tag lrmd_notify line 1
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>>> running...
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:  
>>>   ^
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>>> line 1: parser error : Extra content at the end of the document
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>>> running...
>>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:  
>>>   ^
>>> Apr 25 09:46:22  crmd[1764]:  warning: string2xml: Parsing failed 
>>> (domain=1, level=3, code=5): Extra content at the end of the document 
>>> Apr 25 09:46:22  crmd[1764]:  warning: string2xml: String start: 
>>> 
>>> Apr 25 09:46:22  crmd[1764]:error: crm_abort: string2xml: Forked 
>>> child 4182 to record non-fatal assert at xml.c:605 : String parsing error
> 
> Also, it would be very useful if you could open up the core file for 4182 and 
> print the contents of the input passed to string2xml() 

Re: [Pacemaker] warning about pacemakerconfig parameter violate uniqueness for parameter /etc/libvirt/qemu/win01.xml

2013-04-29 Thread Michael Schwartzkopff
On Monday, 29 April 2013 at 20:43:44, Michael Schwartzkopff wrote:
> On Monday, 29 April 2013 at 18:37:08, Matchett, John wrote:
> > My two node cluster has two virtual machines winA and winB which have the
> > config parameter config="/etc/libvirt/qemu/win01.xml". A crm configure
> > verify yields a warning that the resources violate uniqueness for
> > parameter /etc/libvirt/qemu/win01.xml. Also the machines won't stay
> > running. Could this be related to the warning above? Do I need two
> > separate XML and qemu files? I've noticed that one of my virtual machines
> > keeps stopping.
> 
> It seems that the resources need two different VM config files. But I don't
> think this is a pacemaker problem; it relates to libvirt.
> 
> Please make a copy of the XML config file of your VM and change the UUID in
> the second file.

And of course, be sure to use different storage devices for both machines.
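For example (paths and names are illustrative):

cp /etc/libvirt/qemu/win01.xml /etc/libvirt/qemu/win02.xml
# edit the <name> and <uuid> elements (uuidgen prints a fresh UUID)
# and point the <disk>/<source> elements at separate storage
virsh define /etc/libvirt/qemu/win02.xml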

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


Re: [Pacemaker] warning about pacemakerconfig parameter violate uniqueness for parameter /etc/libvirt/qemu/win01.xml

2013-04-29 Thread Michael Schwartzkopff
On Monday, 29 April 2013 at 18:37:08, Matchett, John wrote:
> My two node cluster has two virtual machines winA and winB which have the
> config parameter config="/etc/libvirt/qemu/win01.xml". A crm configure
> verify yields a warning that the resources violate uniqueness for parameter
> /etc/libvirt/qemu/win01.xml. Also the machines won't stay running.  Could
> this be related to the warning above? Do I need two separate XML and qemu
> files? I've noticed that one of my virtual machines keeps stopping.

It seems that the resources need two different VM config files. But I don't 
think this is a pacemaker problem; it relates to libvirt.

Please make a copy of the XML config file of your VM and change the UUID in 
the second file.

Greetings,

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


[Pacemaker] warning about pacemakerconfig parameter violate uniqueness for parameter /etc/libvirt/qemu/win01.xml

2013-04-29 Thread Matchett, John
My two node cluster has two virtual machines, winA and winB, which have the 
config parameter config="/etc/libvirt/qemu/win01.xml".
A crm configure verify yields a warning that the resources violate uniqueness 
for parameter /etc/libvirt/qemu/win01.xml.
Also the machines won't stay running.  Could this be related to the warning 
above? Do I need two separate XML and qemu files?
I've noticed that one of my virtual machines keeps stopping.

The CRM config snippet is below:

primitive p_ldirectord lsb:ldirectord \
op monitor interval="40s" timeout="20s" \
meta target-role="Started"
primitive p_ldirectord_ip ocf:heartbeat:IPaddr2 \
params cidr_netmask="24" ip="192.168.155.21" iflabel="ldirectord" \
op monitor interval="20s" timeout="10s"
primitive p_vm_winA ocf:heartbeat:VirtualDomain \
params config="/etc/libvirt/qemu/win01.xml" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="300" \
op monitor interval="60s" timeout="50s" start-delay="0" \
meta target-role="Started" \
utilization cpu="4" hv_memory="2048"
primitive p_vm_winB ocf:heartbeat:VirtualDomain \
params config="/etc/libvirt/qemu/win01.xml" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="300" \
op monitor interval="60s" timeout="50s" start-delay="0" \
meta target-role="Started" \
utilization cpu="4" hv_memory="2048"

location l_vm_winA p_vm_winA -inf: lm08.test.com
location l_vm_winB p_vm_winA -inf: lm07.test.com


Thanks
JohnM



Re: [Pacemaker] Pacemaker core dumps

2013-04-29 Thread Xavier Lashmar
Hello Andrew,

Thanks for your help.  We've upgraded to pacemaker 1.1.9 and still have the same 
issue.  

We are trying to get the core information but we are missing some debuginfo 
files which we are trying to get our hands on.  I'll try to forward this 
information soon.   

Is there something we need to do to the CIB when we upgrade?


Xavier Lashmar
Analyste de Systèmes | Systems Analyst
Service étudiants, service de l'informatique et des communications/Student 
services, computing and communications services.
1 Nicholas Street (810)
Ottawa ON K1N 7B7
Tél. | Tel. 613-562-5800 (2120)
 


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Thursday, April 25, 2013 8:15 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker core dumps


On 26/04/2013, at 10:06 AM, Andrew Beekhof  wrote:

> 
> On 25/04/2013, at 11:59 PM, Xavier Lashmar  wrote:
> 
>> Following further investigation, we were able to determine that upgrading 
>> both nodes (in a two node cluster) from Pacemaker 1.1.7-6 to Pacemaker 
>> 1.1.8-7 (CentOS 6.3 or Centos 6.4) caused these errors to begin happening:
> 
> Would you be able to try the 1.1.9 packages from 
> http://www.clusterlabs.org/rpm-next to see if they are also affected?
> 
>> 
>> We were able to replicate the initiation of the errors by upgrading another 
>> cluster in the same manner.  This other cluster is now experiencing the same 
>> core-dumping and errors as the previous cluster:
>> 
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>> line 1: parser error : invalid character in attribute value
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>> running...
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:   
>>  ^
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>> line 1: parser error : attributes construct error
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>> running...
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:   
>>  ^
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>> line 1: parser error : Couldn't find end of Start Tag lrmd_notify line 1
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>> running...
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:   
>>  ^
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: Entity: 
>> line 1: parser error : Extra content at the end of the document
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error: 
>> a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is 
>> running...
>> Apr 25 09:46:22  crmd[1764]:error: crm_xml_err: XML Error:   
>>  ^
>> Apr 25 09:46:22  crmd[1764]:  warning: string2xml: Parsing failed 
>> (domain=1, level=3, code=5): Extra content at the end of the document 
>> Apr 25 09:46:22  crmd[1764]:  warning: string2xml: String start: 
>> 
>> Apr 25 09:46:22  crmd[1764]:error: crm_abort: string2xml: Forked 
>> child 4182 to record non-fatal assert at xml.c:605 : String parsing error

Also, it would be very useful if you could open up the core file for 4182 and 
print the contents of the input passed to string2xml() 
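Roughly like this (a sketch - the binary path, core file name and frame number
will differ on your system, and this assumes string2xml()'s argument is named
"input"):

gdb /usr/libexec/pacemaker/crmd core.4182
(gdb) bt                # locate the string2xml frame
(gdb) frame 3           # illustrative frame number
(gdb) print input       # the raw string handed to string2xml()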