Re: [ClusterLabs] Different pacemaker versions split cluster

2016-06-08 Thread Ken Gaillot
On 06/07/2016 02:26 PM, DacioMF wrote:
> Ken,
> 
> I cleared all logs in /var/log/corosync and rebooted the cluster (this is the 
> test environment, but I want to upgrade production).
> 
> I have attached the output of the command crm_report --from "2016-06-07 0:0:0" 
> after the reboot.
> 
> The corosync and pacemaker versions on Ubuntu 16.04 are 2.3.5 and 1.1.14.
> 
> The corosync and pacemaker versions on Ubuntu 14.04 are 2.3.3 and 1.1.10.
> 
> 
>  DacioMF, Network and Infrastructure Analyst

This isn't causing your issue, but when running a mixed-version cluster,
it's essential that a node running the oldest version is elected DC. You
can ensure that by always booting and starting the cluster on it first.
See http://blog.clusterlabs.org/blog/2013/mixing-pacemaker-versions

In this case, we're not getting that far, because the nodes aren't
talking to each other.

The corosync.quorum output shows that everything's fine at the cluster
membership level. This can also be seen in the live CIB where
in_ccm="true" for all nodes (indicating membership), but crmd="offline"
for the different-version nodes (indicating broken pacemaker communication).

In the logs, we can see "state is now member" for all four nodes, but
pcmk_cpg_membership only sees the nodes with the same version.

I suspect the problem is in corosync's cpg handling, since
pcmk_cpg_membership logs everything it gets from corosync. I'm not
familiar with any relevant changes between 2.3.3 and 2.3.5, so I'm not
sure what's going wrong.

> 
> 
> On Monday, June 6, 2016, at 17:30, Ken Gaillot wrote:
> On 05/30/2016 01:14 PM, DacioMF wrote:
>> Hi,
>>
>> I had 4 nodes with Ubuntu 14.04 LTS in my cluster and all of them worked 
>> well. I need to upgrade all my cluster nodes to Ubuntu 16.04 LTS without 
>> stopping my resources. Two nodes have been updated to 16.04 and the other 
>> two remain on 14.04. The problem is that my cluster was split, and the 
>> nodes with Ubuntu 14.04 only work with each other; the same is true for 
>> the nodes with Ubuntu 16.04. The pacemaker feature set in Ubuntu 14.04 is 
>> v3.0.7 and in 16.04 it is v3.0.10.
>>
>> The following commands shows what's happening:
>>
>> root@xenserver50:/var/log/corosync# crm status
>> Last updated: Thu May 19 17:19:06 2016
>> Last change: Thu May 19 09:00:48 2016 via cibadmin on xenserver50
>> Stack: corosync
>> Current DC: xenserver51 (51) - partition with quorum
>> Version: 1.1.10-42f2063
>> 4 Nodes configured
>> 4 Resources configured
>>
>> Online: [ xenserver50 xenserver51 ]
>> OFFLINE: [ xenserver52 xenserver54 ]
>>
>> -
>>
>> root@xenserver52:/var/log/corosync# crm status
>> Last updated: Thu May 19 17:20:04 2016
>> Last change: Thu May 19 08:54:57 2016 by hacluster via crmd on xenserver54
>> Stack: corosync
>> Current DC: xenserver52 (version 1.1.14-70404b0) - partition with quorum
>> 4 nodes and 4 resources configured
>>
>> Online: [ xenserver52 xenserver54 ]
>> OFFLINE: [ xenserver50 xenserver51 ]
>>
>> xenserver52 and xenserver54 are Ubuntu 16.04; the others are Ubuntu 14.04.
>>
>> Does anyone know what the problem is?
>>
>> Sorry for my poor English.
>>
>> Best regards,
>>  DacioMF, Network and Infrastructure Analyst
> 
> 
> Hi,
> 
> We aim for backward compatibility, so this likely is a bug. Can you
> attach the output of crm_report from around this time?
> 
>   crm_report --from "YYYY-M-D H:M:S" --to "YYYY-M-D H:M:S"
> 
> FYI, you cannot do a rolling upgrade from corosync 1 to corosync 2, but
> I believe both 14.04 and 16.04 use corosync 2.
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Ken Gaillot
On 06/08/2016 09:11 AM, Ken Gaillot wrote:
> On 06/08/2016 03:26 AM, Jan Pokorný wrote:
>> On 07/06/16 14:48 -0500, Dimitri Maziuk wrote:
>>> next question: I'm on CentOS 7 and there's no more /etc/init.d/<anything>.
>>> With lennartware spreading, is there a coherent plan to deal
>>> with former LSB agents?
>>
>> Pacemaker has been able to drive systemd-managed services for quite some time.
> 
> This is as easy as changing lsb:dovecot to systemd:dovecot.
> 
> Or, if you specify it as service:dovecot, Pacemaker will check whether
> LSB, systemd or upstart is used on the local system, and call the
> appropriate one.
> 
> As with LSB, don't enable systemd-managed services to start at boot, if
> you want the cluster to manage them.
> 
> One issue that sometimes comes up: some scripts (some logrotate conf
> files or cron jobs, for example) will call "systemctl reload
> <service>". If the service is managed by the cluster, systemd
> doesn't think it's running, so the reload will fail. You have to replace
> such lines with a native reload mechanism for the service.

Whoops -- I was thinking of when an OCF agent is used. If you use
systemd: or service:, systemd does know the service is running, so
systemctl reload/status will work just fine.

>> Provided that the project/daemon you care about carries the unit
>> file, you can use that unless there are distinguished roles for the
>> provided service within the cluster (like primary+replicas), there's
>> a need to run multiple varying instances of the same service,
>> or other cluster-specific features are desired.
>>
>> For dovecot, I can see:
>> # rpm -ql dovecot | grep \.service
>> /usr/lib/systemd/system/dovecot.service 
>>
>>> Specifically, should I roll my own RA for dovecot or is there one in the
>>> works somewhere?
>>
>> If you miss something with the generic approach per above, and there's
>> no fitting open-sourced RA around then it's probably your last resort.
>>
>> For instance, there was once an agent written in C (highly unusual),
>> but seems abandoned a long time ago:
>> https://github.com/perrit/dovecot-ocf-resource-agent



Re: [ClusterLabs] pacemaker_remoted XML parse error

2016-06-08 Thread Narayanamoorthy Srinivasan
No recent network changes. Will check for abnormal traffic using wireshark.

I also notice that the XML lines are partial (no ending '>', closing " and
sometimes partial words) in logs. Any lines > 472 characters are truncated
to 472 characters. I wonder if it is due to some other limitation.

I can post some lines tomorrow when I am back at work.


On Wed, Jun 8, 2016 at 8:00 PM, Ken Gaillot wrote:

> On 06/08/2016 06:14 AM, Narayanamoorthy Srinivasan wrote:
> > I have a pacemaker cluster with two pacemaker remote nodes. Recently the
> > remote nodes started throwing below errors and SDB started self-fencing.
> > Appreciate if someone throws light on what could be the issue and the
> fix.
> >
> > OS - SLES 12 SP1
> > Pacemaker Remote version - pacemaker-remote-1.1.13-14.7.x86_64
> >
> > 2016-06-08T14:11:46.009073+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> > error : AttValue: ' expected
> > 2016-06-08T14:11:46.009314+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> > key="neutron-ha-tool_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_
> > 2016-06-08T14:11:46.009443+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> >  ^
> > 2016-06-08T14:11:46.009567+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> > error : attributes construct error
> > 2016-06-08T14:11:46.009697+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> > key="neutron-ha-tool_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_
> > 2016-06-08T14:11:46.009824+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> >  ^
> > 2016-06-08T14:11:46.009948+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> > error : Couldn't find end of Start Tag lrm_rsc_op line 1
> > 2016-06-08T14:11:46.010070+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> > key="neutron-ha-tool_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_
> > 2016-06-08T14:11:46.010191+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> >  ^
> > 2016-06-08T14:11:46.010460+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> > error : Premature end of data in tag lrm_resource line 1
> > 2016-06-08T14:11:46.010718+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> > key="neutron-ha-tool_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_
> > 2016-06-08T14:11:46.010977+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error:
> >  ^
> > 2016-06-08T14:11:46.011234+05:30 d18-fb-7b-18-f1-8e
> > pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> > error : Premature end of data in tag lrm_resources line 1
> >
> >
> > --
> > Thanks & Regards
> > Moorthy
>
> This sounds like the network traffic between the cluster nodes and the
> remote nodes is being corrupted. Have there been any network changes
> lately? Switch/firewall/etc. equipment/settings? MTU?
>
> You could try using a packet sniffer such as wireshark to see if the
> traffic looks abnormal in some way. The payload is XML so it should be
> more or less readable.
>
>



-- 
Thanks & Regards
Moorthy


Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Ferenc Wágner
Nikhil Utane writes:

> Would like to know the best and easiest way to add a new node to an already
> running cluster.
>
> Our limitation:
> 1) pcsd cannot be used since (as per my understanding) it communicates over
> ssh which is prevented.
> 2) No manual editing of corosync.conf

If you use IPv4 multicast for Corosync 2 communication, then you needn't
have a nodelist in corosync.conf.  However, if you want a quorum
provider, then expected_votes must be set correctly, otherwise a small
partition booting up could mistakenly assume it has quorum.  In a live
system all corosync daemons will recognize new nodes and increase their
"live" expected_votes accordingly.  But they won't write this back to
the config file, leading to lack of information on reboot if they can't
learn better from their peers.
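
As a sketch, the relevant corosync.conf section might look like this (the
value 4 is an assumption for a four-node cluster and must track the real
node count):

```conf
# Quorum provider for corosync 2 without an explicit nodelist.
# expected_votes must match the intended total number of nodes.
quorum {
    provider: corosync_votequorum
    expected_votes: 4
}
```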

> So what I am thinking is, the first node will add nodelist with nodeid: 1
> into its corosync.conf file.
>
> nodelist {
> node {
>   ring0_addr: node1
>   nodeid: 1
> }
> }
>
> The second node to be added will get this information through some other
> means and add itself with nodeid: 2 into its corosync.conf file.
> Now the question I have is, does node1 also need to be updated with
> information about node 2?

It had better be, at least to exclude any possibility of clashing node IDs.

> When I tested it locally, the cluster was up even without node1 having
> node2 in its corosync.conf. Node2's corosync had both. If node1 doesn't
> need to be told about node2, is there a way where we don't configure the
> nodes but let them discover each other through the multicast IP (best
> option).

If you use IPv4 multicast and don't specify otherwise, the node IDs are
assigned according to the ring0 addresses (IPv4 addresses are 32-bit
integers after all).  But you still have to update expected_votes.
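
To illustrate the point that an IPv4 address is just a 32-bit integer, here
is a hypothetical helper (not corosync's actual code) mapping a ring0
address to such a node ID:

```python
import ipaddress

def nodeid_from_ring0(addr: str) -> int:
    """Treat an IPv4 ring0 address as a 32-bit integer node ID."""
    return int(ipaddress.IPv4Address(addr))

# 10.0.0.1 == 10 * 2**24 + 1
print(nodeid_from_ring0("10.0.0.1"))  # 167772161
```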

> Assuming we should add it to keep the files in sync, what's the best way to
> add the node information (either itself or other) preferably through some
> CLI command?

There's no corosync tool to update the config file.  An Augeas lens is
provided for corosync.conf though, which should help with the task (I
myself never tried it).  Then corosync-cfgtool -R makes all daemons in
the cluster reload their config files.
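
A rough, untested sketch of that workflow, assuming the stock Augeas
corosync lens and the default config path (the Augeas node paths shown
are my guess at the lens layout, not verified):

```shell
# Append a second node to the nodelist via Augeas, then tell all
# running corosync daemons to reload their config files.
augtool <<'EOF'
set /files/etc/corosync/corosync.conf/nodelist/node[2]/ring0_addr node2
set /files/etc/corosync/corosync.conf/nodelist/node[2]/nodeid 2
save
EOF
corosync-cfgtool -R
```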
-- 
Feri



Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Ken Gaillot
On 06/08/2016 10:11 AM, Dmitri Maziuk wrote:
> On 2016-06-08 09:11, Ken Gaillot wrote:
>> On 06/08/2016 03:26 AM, Jan Pokorný wrote:
> 
>>> Pacemaker has been able to drive systemd-managed services for quite some time.
>>
>> This is as easy as changing lsb:dovecot to systemd:dovecot.
> 
> Great! Any chance that could be mentioned on
> http://www.linux-ha.org/wiki/Resource_agents -- hint, hint ;)
> 
> Thanks guys,
> Dima

There's a big box at the top of every page on that wiki :)

"Looking for current and maintained information and documentation on
(Linux ) Open Source High Availability HA Clustering? You probably
should be reading the Pacemaker site clusterlabs.org. This site
conserves Heartbeat specific stuff."

The current documentation is:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-supported



Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Dmitri Maziuk

On 2016-06-08 09:11, Ken Gaillot wrote:

On 06/08/2016 03:26 AM, Jan Pokorný wrote:



Pacemaker has been able to drive systemd-managed services for quite some time.


This is as easy as changing lsb:dovecot to systemd:dovecot.


Great! Any chance that could be mentioned on 
http://www.linux-ha.org/wiki/Resource_agents -- hint, hint ;)


Thanks guys,
Dima




Re: [ClusterLabs] pacemaker_remoted XML parse error

2016-06-08 Thread Ken Gaillot
On 06/08/2016 06:14 AM, Narayanamoorthy Srinivasan wrote:
> I have a pacemaker cluster with two pacemaker remote nodes. Recently the
> remote nodes started throwing below errors and SDB started self-fencing.
> Appreciate if someone throws light on what could be the issue and the fix.
> 
> OS - SLES 12 SP1
> Pacemaker Remote version - pacemaker-remote-1.1.13-14.7.x86_64
> 
> 2016-06-08T14:11:46.009073+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : AttValue: ' expected
> 2016-06-08T14:11:46.009314+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.009443+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.009567+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : attributes construct error
> 2016-06-08T14:11:46.009697+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.009824+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.009948+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : Couldn't find end of Start Tag lrm_rsc_op line 1
> 2016-06-08T14:11:46.010070+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.010191+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.010460+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : Premature end of data in tag lrm_resource line 1
> 2016-06-08T14:11:46.010718+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.010977+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.011234+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : Premature end of data in tag lrm_resources line 1
> 
> 
> -- 
> Thanks & Regards
> Moorthy

This sounds like the network traffic between the cluster nodes and the
remote nodes is being corrupted. Have there been any network changes
lately? Switch/firewall/etc. equipment/settings? MTU?

You could try using a packet sniffer such as wireshark to see if the
traffic looks abnormal in some way. The payload is XML so it should be
more or less readable.
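
For capturing, something like the following could work (a sketch: 3121 is
pacemaker_remote's default TCP port, and the interface name is an example):

```shell
# Capture cluster-to-remote-node traffic for later inspection in wireshark.
tcpdump -i eth0 -w remote-traffic.pcap tcp port 3121
```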




Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Ken Gaillot
On 06/08/2016 06:54 AM, Jehan-Guillaume de Rorthais wrote:
> 
> 
> On June 8, 2016, at 13:36 GMT+02:00, Nikhil Utane wrote:
>> Hi,
>>
>> Would like to know the best and easiest way to add a new node to an
>> already
>> running cluster.
>>
>> Our limitation:
>> 1) pcsd cannot be used since (as per my understanding) it communicates
>> over
>> ssh which is prevented.
> 
> As far as I remember, pcsd daemons use their own TCP port (not the SSH one) 
> and communicate with each other using HTTP queries (over SSL, I suppose).

Correct, pcsd uses port 2224. It encrypts all traffic. If you can get
that allowed through your firewall between cluster nodes, that will be
the easiest way.
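
If firewalld is in use, opening that port could look like this (a sketch;
the default zone is assumed):

```shell
# Allow pcsd (TCP 2224) through the firewall, then apply the rule.
firewall-cmd --permanent --add-port=2224/tcp
firewall-cmd --reload
```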

corosync.conf does need to be kept the same on all nodes, and corosync
needs to be reloaded after any changes. pcs will handle this
automatically when adding/removing nodes. Alternatively, it is possible
to use corosync.conf with multicast, without explicitly listing
individual nodes.

> As far as I understand, crmsh uses SSH, not pcsd.



Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Ken Gaillot
On 06/08/2016 03:26 AM, Jan Pokorný wrote:
> On 07/06/16 14:48 -0500, Dimitri Maziuk wrote:
>> next question: I'm on CentOS 7 and there's no more /etc/init.d/<anything>.
>> With lennartware spreading, is there a coherent plan to deal
>> with former LSB agents?
> 
> Pacemaker has been able to drive systemd-managed services for quite some time.

This is as easy as changing lsb:dovecot to systemd:dovecot.

Or, if you specify it as service:dovecot, Pacemaker will check whether
LSB, systemd or upstart is used on the local system, and call the
appropriate one.

As with LSB, don't enable systemd-managed services to start at boot, if
you want the cluster to manage them.
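
A minimal sketch of that setup, assuming pcs is in use (the resource name
and monitor interval are illustrative):

```shell
# Create a cluster resource backed by the systemd unit.
pcs resource create dovecot systemd:dovecot op monitor interval=30s
# Leave boot-time startup to the cluster, not to systemd.
systemctl disable dovecot
```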

One issue that sometimes comes up: some scripts (some logrotate conf
files or cron jobs, for example) will call "systemctl reload
<service>". If the service is managed by the cluster, systemd
doesn't think it's running, so the reload will fail. You have to replace
such lines with a native reload mechanism for the service.

> Provided that the project/daemon you care about carries the unit
> file, you can use that unless there are distinguished roles for the
> provided service within the cluster (like primary+replicas), there's
> a need to run multiple varying instances of the same service,
> or other cluster-specific features are desired.
> 
> For dovecot, I can see:
> # rpm -ql dovecot | grep \.service
> /usr/lib/systemd/system/dovecot.service 
> 
>> Specifically, should I roll my own RA for dovecot or is there one in the
>> works somewhere?
> 
> If you miss something with the generic approach per above, and there's
> no fitting open-sourced RA around then it's probably your last resort.
> 
> For instance, there was once an agent written in C (highly unusual),
> but seems abandoned a long time ago:
> https://github.com/perrit/dovecot-ocf-resource-agent



[ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Nikhil Utane
Hi,

Would like to know the best and easiest way to add a new node to an already
running cluster.

Our limitation:
1) pcsd cannot be used since (as per my understanding) it communicates over
ssh which is prevented.
2) No manual editing of corosync.conf

So what I am thinking is, the first node will add nodelist with nodeid: 1
into its corosync.conf file.

nodelist {
node {
  ring0_addr: node1
  nodeid: 1
}
}

The second node to be added will get this information through some other
means and add itself with nodeid: 2 into its corosync.conf file.
Now the question I have is, does node1 also need to be updated with
information about node 2?
When I tested it locally, the cluster was up even without node1 having
node2 in its corosync.conf. Node2's corosync had both. If node1 doesn't
need to be told about node2, is there a way where we don't configure the
nodes but let them discover each other through the multicast IP (best
option).

Assuming we should add it to keep the files in sync, what's the best way to
add the node information (either itself or other) preferably through some
CLI command?

-Regards
Nikhil


[ClusterLabs] pacemaker_remoted XML parse error

2016-06-08 Thread Narayanamoorthy Srinivasan
I have a pacemaker cluster with two pacemaker remote nodes. Recently the
remote nodes started throwing below errors and SDB started self-fencing.
Appreciate if someone throws light on what could be the issue and the fix.

OS - SLES 12 SP1
Pacemaker Remote version - pacemaker-remote-1.1.13-14.7.x86_64

2016-06-08T14:11:46.009073+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser error
: AttValue: ' expected
2016-06-08T14:11:46.009314+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
key="neutron-ha-tool_monitor_0" operation="monitor"
crm-debug-origin="do_update_
2016-06-08T14:11:46.009443+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
 ^
2016-06-08T14:11:46.009567+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser error
: attributes construct error
2016-06-08T14:11:46.009697+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
key="neutron-ha-tool_monitor_0" operation="monitor"
crm-debug-origin="do_update_
2016-06-08T14:11:46.009824+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
 ^
2016-06-08T14:11:46.009948+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser error
: Couldn't find end of Start Tag lrm_rsc_op line 1
2016-06-08T14:11:46.010070+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
key="neutron-ha-tool_monitor_0" operation="monitor"
crm-debug-origin="do_update_
2016-06-08T14:11:46.010191+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
 ^
2016-06-08T14:11:46.010460+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser error
: Premature end of data in tag lrm_resource line 1
2016-06-08T14:11:46.010718+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
key="neutron-ha-tool_monitor_0" operation="monitor"
crm-debug-origin="do_update_
2016-06-08T14:11:46.010977+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error:
 ^
2016-06-08T14:11:46.011234+05:30 d18-fb-7b-18-f1-8e
pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser error
: Premature end of data in tag lrm_resources line 1


-- 
Thanks & Regards
Moorthy


Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Jan Pokorný
On 07/06/16 14:48 -0500, Dimitri Maziuk wrote:
> next question: I'm on CentOS 7 and there's no more /etc/init.d/<anything>.
> With lennartware spreading, is there a coherent plan to deal
> with former LSB agents?

Pacemaker has been able to drive systemd-managed services for quite some time.

Provided that the project/daemon you care about carries the unit
file, you can use that unless there are distinguished roles for the
provided service within the cluster (like primary+replicas), there's
a need to run multiple varying instances of the same service,
or other cluster-specific features are desired.

For dovecot, I can see:
# rpm -ql dovecot | grep \.service
/usr/lib/systemd/system/dovecot.service 

> Specifically, should I roll my own RA for dovecot or is there one in the
> works somewhere?

If you miss something with the generic approach per above, and there's
no fitting open-sourced RA around, then it's probably your last resort.

For instance, there was once an agent written in C (highly unusual),
but seems abandoned a long time ago:
https://github.com/perrit/dovecot-ocf-resource-agent

-- 
Jan (Poki)

