Re: [ClusterLabs] Moving Related Servers

2016-04-18 Thread Ken Gaillot
On 04/18/2016 02:34 AM, ‪H Yavari‬ ‪ wrote:
> Hi,
> 
> I have 4 CentOS servers (App1, App2, App3 and App4). I created a cluster
> for App1 and App2 with a IP float and it works well.
> In our infrastructure App1 works only with App3 and App2 only works with
> App4. I mean we have 2 server sets (App1 and App3) , (App2 and App4).
> So when App1 goes down and App2 becomes the online node, I want App3 to
> go offline too and App4 to come online, and vice versa: when App3 goes
> down and App4 comes online, App1 should go offline too.
> 
> 
> How can I do this with Pacemaker? We have our own services on these
> servers, so how can I use Pacemaker to monitor them?
> 
> Thanks for reply.
> 
> Regards.
> H.Yavari

I'm not sure I understand your requirements.

There's no way to tell one node to leave the cluster when another node
is down, and it would be a bad idea if you could: the nodes could never
start up, because each would wait to see the other before starting; and
in your cluster, two nodes shutting down would make the cluster lose
quorum, so the other nodes would refuse to run any resources.

However, it is usually possible to use constraints to enforce any
desired behavior. So even though the node might not leave the cluster,
you could make the cluster not place any resources on that node.
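
For example (pcs syntax; the resource and node names here are just
placeholders):

  # keep a specific resource off a node
  pcs constraint location MyResource avoids App3

  # or take a node out of service without stopping the cluster on it
  pcs cluster standby App3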

Can you give more information about your resources and what nodes they
are allowed to run on? What makes App1 and App3 dependent on each other?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker apache and umask on CentOS 7

2016-04-20 Thread Ken Gaillot
On 04/20/2016 09:11 AM, fatcha...@gmx.de wrote:
> Hi,
> 
> I´m running a 2-node apache webcluster on a fully patched CentOS 7 
> (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64).
> Some files which are generated by the apache are created with a umask 137 but 
> I need this files created with a umask of 117.
> To change this I first tried to add a umask 117 to /etc/sysconfig/httpd & 
> rebooted the system. This had no effekt.
> So I found out (after some research) that this is not working under CentOS 7 
> and that this had to be changed via systemd.
> So I created a directory "/etc/systemd/system/httpd.service.d" and put there 
> a "umask.conf"-File with this content: 
> [Service]
> UMask=0117
> 
> Again I rebooted the system but no effekt.
> Is the pacemaker really starting the apache over the systemd ? And how can I 
> solve the problem ?
> 
> Any suggestions are welcome
> 
> Kind regards
> 
> fatcharly

It depends on the resource agent you're using for apache.

If you were using systemd:httpd, I'd expect /etc/sysconfig/httpd or the
httpd.service.d override to work.

Since they don't, I'll guess you're using ocf:heartbeat:apache. In that
case, the file specified by the resource's envfiles parameter (which
defaults to /etc/apache2/envvars) is the right spot. So, you could
configure envfiles=/etc/sysconfig/httpd, or you could keep it default
and add your umask to /etc/apache2/envvars.
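
For example, if you keep the default envfiles, just add the umask line
to the env file the agent sources before starting httpd:

  # /etc/apache2/envvars
  umask 117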

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Q: Resource balancing operation

2016-04-20 Thread Ken Gaillot
On 04/20/2016 01:17 AM, Ulrich Windl wrote:
> Hi!
> 
> I'm wondering: If you boot a node on a cluster, most resources will go to 
> another node (if possible). Due to stickiness configured, those resources 
> will stay there.
> So I'm wondering whether or how I could cause a rebalance of resources on the 
> cluster. I must admit that I don't understand the details of stickiness 
> related to other parameters. In my understanding stickiness should be related 
> to a percentage of utilization dynamically, so that a resource running on a 
> node that is "almost full" should dynamically lower its stickiness to allow 
> resource migration.
> 
> So if you are going to implement a manual resource rebalance operation, could 
> you dynamically lower the stickiness for each resource (by some amount or 
> some factor), wait if something happens, and then repeat the process until 
> resources look balanced. "Looking balanced" should be no worse as if all 
> resources are started when all cluster nodes are up.
> 
> Spontaneous pros and cons for "resource rebalancing"?
> 
> Regards,
> Ulrich

Pacemaker gives you a few levers to pull. Stickiness and utilization
attributes (with a placement strategy) are the main ones.

Normally, pacemaker *will* continually rebalance according to what nodes
are available. Stickiness tells the cluster not to do that.

Whether you should use stickiness (and how much) depends mainly on how
significant the interruption is when a service moves. For
a large database supporting a high-traffic website, stopping and
starting can take a long time and cost a lot of business -- so maybe you
want an infinite stickiness in that case, and only rebalance manually
during a scheduled window. For a small VM that can live-migrate quickly
and doesn't affect any of your customer-facing services, maybe you don't
mind setting a small or zero stickiness.
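
For example (pcs syntax; the resource name is just a placeholder):

  # default stickiness for all resources
  pcs resource defaults resource-stickiness=100

  # or per resource
  pcs resource meta bigdb resource-stickiness=INFINITY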

You can also use rules to make the process intelligent. For example, for
a server that provides office services, you could set a rule that sets
infinite stickiness during business hours, and small or zero stickiness
otherwise. That way, you'd get no disruptions when people are actually
using the service during the day, and at night, it would automatically
rebalance.

Normally, pacemaker's idea of "balancing" is to simply distribute the
number of resources on each node as equally as possible. Utilization
attributes and placement strategies let you add more intelligence. For
example, you can define the number of cores per node or the amount of
RAM per node, along with how much each resource is expected to use, and
let pacemaker balance by that instead of just counting the number of
resources.
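
In crm configure syntax, a rough sketch (node/resource names and numbers
are made up, and the exact commands vary by tool version):

  node node1 utilization cpu=8 memory=16384
  primitive bigdb ocf:heartbeat:mysql ... utilization cpu=4 memory=8192
  property placement-strategy=balanced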

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Moving Related Servers

2016-04-20 Thread Ken Gaillot
On 04/20/2016 12:44 AM, ‪H Yavari‬ ‪ wrote:
> You got my situation right. But I couldn't find any method to do this?
> 
> Should I create one cluster with 4 nodes, or 2 clusters with 2 nodes each?
> How do I restrict the cluster nodes to each other?

Your last questions made me think of multi-site clustering using booth.
I think this might be the best solution for you.

You can configure two independent pacemaker clusters of 2 nodes each,
then use booth to ensure that only one cluster holds the resources at any
given time.
See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279413776

This is usually done with clusters at physically separate locations, but
there's no problem with using it with two clusters in one location.
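
A minimal booth.conf sketch (the addresses and ticket name are made up;
each cluster also needs a ticket constraint tying its resources to the
ticket, and an arbitrator gives you an odd number of voters):

  transport = UDP
  port = 9929
  arbitrator = 192.168.1.100
  site = 192.168.1.10
  site = 192.168.1.20
  ticket = "service-ticket"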

Alternatively, going along more traditional lines such as what Klaus and
I have mentioned, you could use rules and node attributes to keep the
resources where desired. You could write a custom resource agent that
would set a custom node attribute for the matching node (the start
action should set the attribute to 1, and the stop action should set the
attribute to 0; if the resource was on App 1, you'd set the attribute
for App 3, and if the resource was on App 4, you'd set the attribute for
App 4). Colocate that resource with your floating IP, and use a rule to
locate service X where the custom node attribute is 1. See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279376656

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617356537136
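
As a rough, untested sketch (the attribute and resource names are made up):

  # in the custom RA's start action (the RA would pick the peer node to
  # update; App3 here is just for illustration):
  crm_attribute --node App3 --name svc-x-active --update 1
  # and in its stop action:
  crm_attribute --node App3 --name svc-x-active --update 0

  # then keep service X off any node where the attribute is not 1 (pcs syntax):
  pcs constraint location ServiceX rule score=-INFINITY svc-x-active ne 1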

> 
> 
> *From:* Klaus Wenninger <kwenn...@redhat.com>
> *To:* users@clusterlabs.org
> *Sent:* Wednesday, 20 April 2016, 9:56:05
> *Subject:* Re: [ClusterLabs] Moving Related Servers
> 
> On 04/19/2016 04:32 PM, Ken Gaillot wrote:
>> On 04/18/2016 10:05 PM, ‪H Yavari‬ ‪ wrote:
>>> Hi,
>>>
>>> This is servers maps:
>>>
>>> App 3-> App 1(Active)
>>>
>>> App 4 -> App 2  (Standby)
>>>
>>>
>>> Now App1 and App2 are in a cluster with IP failover.
>>>
>>> I need when IP failover will run and App2 will be Active node, service
>>> "X" on server App3 will be stop and App 4 will be Active node.
>>> In the other words, App1 works only with App3 and App 2 works with App 4.
>>>
>>> I have a web application on App1 and some services on App 3 (this is
>>> same for App2 and App 4)
>> This is a difficult situation to model. In particular, you could only
>> have a dependency one way -- so if we could get App 3 to fail over if
>> App 1 fails, we couldn't model the other direction (App 1 failing over
>> if App 3 fails). If each is dependent on the other, there's no way to
>> start one first.
>>
>> Is there a technical reason App 3 can work only with App 1?
>>
>> Is it possible for service "X" to stay running on both App 3 and App 4
>> all the time? If so, this becomes easier.
> Just another try to understand what you are aiming for:
> 
> You have a 2-node-cluster at the moment consisting of the nodes
> App1 & App2.
> You configured something like a master/slave-group to realize
> an active/standby scenario.
> 
> To get the servers App3 & App4 into the game we would make
> them additional pacemaker-nodes (App3 & App4).
> You now have a service X that could be running either on App3 or
> App4 (which is easy by e.g. making it dependent on a node attribute)
> and it should be running on App3 when the service-group is active
> (master in pacemaker terms) on App1 and on App4 when the
> service-group is active on App2.
> 
> The standard thing would be to collocate a service with the master-role
> (see all the DRBD examples for instance).
> We would now need a locate-x when master is located-y rule instead
> of collocation.
> I don't know any way to directly specify this.
> One - ugly though - way around I could imagine would be:
> 
> - locate service X1 on App3
> - locate service X2 on App4
> - dummy service Y1 is located App1 and collocated with master-role
> - dummy service Y2 is located App2 and collocated with master-role
> - service X1 depends on Y1
> - service X2 depends on Y2
> 
> If that somehow reflects your situation the key question now would
> probably be if pengine would make the group on App2 master
> if service X1 fails on App3. I would guess yes but I'm not sure.
> 
> Regards,
> Klaus
> 
>>> Sorry for heavy descrip

Re: [ClusterLabs] pacemaker apache and umask on CentOS 7

2016-04-20 Thread Ken Gaillot
On 04/20/2016 12:20 PM, Klaus Wenninger wrote:
> On 04/20/2016 05:35 PM, fatcha...@gmx.de wrote:
>>
>>> Gesendet: Mittwoch, 20. April 2016 um 16:31 Uhr
>>> Von: "Klaus Wenninger" 
>>> An: users@clusterlabs.org
>>> Betreff: Re: [ClusterLabs] pacemaker apache and umask on CentOS 7
>>>
>>> On 04/20/2016 04:11 PM, fatcha...@gmx.de wrote:
 Hi,

 I´m running a 2-node apache webcluster on a fully patched CentOS 7 
 (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64).
 Some files which are generated by the apache are created with a umask 137 
 but I need this files created with a umask of 117.
 To change this I first tried to add a umask 117 to /etc/sysconfig/httpd & 
 rebooted the system. This had no effekt.
 So I found out (after some research) that this is not working under CentOS 
 7 and that this had to be changed via systemd.
 So I created a directory "/etc/systemd/system/httpd.service.d" and put 
 there a "umask.conf"-File with this content: 
 [Service]
 UMask=0117

 Again I rebooted the system but no effekt.
 Is the pacemaker really starting the apache over the systemd ? And how can 
 I solve the problem ?
>>> Didn't check with CentOS7 but on RHEL7 there is a
>>> /usr/lib/ocf/resource.d/heartbeat/apache.
>>> So it depends on how you defined the resource starting apache if systemd
>>> is used or if it being done by the ocf-ra.
>> MY configuration is:
>> Resource: apache (class=ocf provider=heartbeat type=apache)
>>   Attributes: configfile=/etc/httpd/conf/httpd.conf 
>> statusurl=http://127.0.0.1:8089/server-status
>>   Operations: start interval=0s timeout=40s (apache-start-timeout-40s)
>>   stop interval=0s timeout=60s (apache-stop-timeout-60s)
>>   monitor interval=1min (apache-monitor-interval-1min)
>>
>> So I quess it is ocf. But what will be the right way to do it ? I lack a bit 
>> of understandig about this /usr/lib/ocf/resource.d/heartbeat/apache file.  
>>
> There are the ocf-Resource-Agents (if there is none you can always
> create one for your service) which usually
> give you a little bit more control of the service from the cib. (You can
> set a couple of variables like in this example
> the pointer to the config-file)
> And of course you can always create resources referring the native
> services of your distro (systemd-units in
> this case).
>>
>>
>>
 Any suggestions are welcome

If you add envfiles="/etc/sysconfig/httpd" to your apache resource, it
should work.
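
For example (untested; "apache" is the resource name from your config above):

  pcs resource update apache envfiles=/etc/sysconfig/httpd

with your "umask 117" line already in /etc/sysconfig/httpd.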

 Kind regards

 fatcharly

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Q: Resource balancing operation

2016-04-21 Thread Ken Gaillot
On 04/21/2016 01:56 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> schrieb am 20.04.2016 um 16:44 in 
>>>> Nachricht
>> You can also use rules to make the process intelligent. For example, for
>> a server that provides office services, you could set a rule that sets
>> infinite stickiness during business hours, and small or zero stickiness
>> otherwise. That way, you'd get no disruptions when people are actually
>> using the service during the day, and at night, it would automatically
>> rebalance.
> 
> Could you give a concrete example for this?

Sure, looking at the example in:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_using_rules_to_control_cluster_options


Normally, the CIB's resource defaults section will have a single set of
meta-attributes:

   <rsc_defaults>
      <meta_attributes id="rsc-options">
         <nvpair id="rsc-options-stickiness"
                 name="resource-stickiness" value="INFINITY"/>
      </meta_attributes>
   </rsc_defaults>


If you have more than one, the cluster will use the one with the highest
score (in the example below, always the "core-hours" set with infinite
stickiness):

   <rsc_defaults>
      <meta_attributes id="core-hours" score="2">
         <nvpair id="core-hours-stickiness"
                 name="resource-stickiness" value="INFINITY"/>
      </meta_attributes>
      <meta_attributes id="after-hours" score="1">
         <nvpair id="after-hours-stickiness"
                 name="resource-stickiness" value="0"/>
      </meta_attributes>
   </rsc_defaults>

If you add a rule to a set, that set will only be considered when the
rule is true. So in this final result, we have infinite stickiness
during part of the day and no stickiness the rest of the time:

   <rsc_defaults>
      <meta_attributes id="core-hours" score="2">
         <rule id="core-hours-rule" score="0">
            <date_expression id="core-hours-datespec" operation="date_spec">
               <date_spec id="core-hours-weekdays" hours="9-16" weekdays="1-5"/>
            </date_expression>
         </rule>
         <nvpair id="core-hours-stickiness"
                 name="resource-stickiness" value="INFINITY"/>
      </meta_attributes>
      <meta_attributes id="after-hours" score="1">
         <nvpair id="after-hours-stickiness"
                 name="resource-stickiness" value="0"/>
      </meta_attributes>
   </rsc_defaults>

Higher-level tools may or may not provide a simpler interface; you may
have to dump, edit and push the XML.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] service flap as nodes join and leave

2016-04-14 Thread Ken Gaillot
On 04/14/2016 09:33 AM, Christopher Harvey wrote:
> MsgBB-Active is a dummy resource that simply returns OCF_SUCCESS on
> every operation and logs to a file.

That's a common mistake, and will confuse the cluster. The cluster
checks the status of resources both where they're supposed to be running
and where they're not. If status always returns success, the cluster
won't try to start it where it should be running, and will continuously
try to stop it elsewhere, because it thinks it's already running everywhere.

It's essential that an RA distinguish between running
(OCF_SUCCESS/OCF_RUNNING_MASTER), cleanly not running (OCF_NOT_RUNNING),
and unknown/failed (OCF_ERR_*/OCF_FAILED_MASTER).

See pacemaker's Dummy agent as an example/template:

https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/Dummy

It touches a temporary file to know whether it is "running" or not.

ocf-shellfuncs has a ha_pseudo_resource() function that does the same
thing. See the ocf:heartbeat:Delay agent for example usage.
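
A stripped-down sketch of that pattern (not a complete agent; the OCF
variables come from the environment and ocf-shellfuncs):

  STATEFILE="${HA_RSCTMP}/MyDummy-${OCF_RESOURCE_INSTANCE}.state"

  dummy_start()   { touch "$STATEFILE"; }
  dummy_stop()    { rm -f "$STATEFILE"; }
  dummy_monitor() {
      # report "running" only if start has been called and stop has not
      [ -f "$STATEFILE" ] && return $OCF_SUCCESS
      return $OCF_NOT_RUNNING
  }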

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Doing reload right

2016-07-14 Thread Ken Gaillot
On 07/13/2016 11:20 PM, Andrew Beekhof wrote:
> On Wed, Jul 6, 2016 at 12:57 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>> On 07/04/2016 02:01 AM, Ulrich Windl wrote:
>>> For the case of changing the contents of an external configuration file, the
>>> RA would have to provide some reloadable dummy parameter then (maybe like
>>> "config_generation=2").
>>
>> That is a widely recommended approach for the current "reload"
>> implementation, but I don't think it's desirable. It still does not
>> distinguish changes in the Pacemaker resource configuration from changes
>> in the service configuration.
>>
>> For example, of an RA has one parameter that is agent-reloadable and
>> another that is service-reloadable, and it gets a "reload" action, it
>> has no way of knowing which of the two (or both) changed. It would have
>> to always reload all agent-reloadable parameters, and trigger a service
>> reload. That seems inefficient to me. Also, only Pacemaker should
>> trigger agent reloads, and only the user should trigger service reloads,
>> so combining them doesn't make sense to me.
> 
> Totally disagree :-)
> 
> The whole reason service reloads exist is that they are more efficient
> than a stop/start cycle.
> 
> So I'm not seeing how calling one, on the rare occasion that the
> parameters change and allow a reload, when it wasn't necessary can be
> classed as inefficient.   On the contrary, trying to avoid it seems
> like over-optimizing when we should be aiming for correctness - ie.
> reloading the whole thing.

I just don't see any logical connection between modifying a service's
Pacemaker configuration and modifying its service configuration file.

Is the idea that people will tend to change them together? I'd expect
that in most environments, the Pacemaker configuration (e.g. where the
apache config file is) will remain much more stable than the service
configuration (e.g. adding/modifying websites).

Service reloads can sometimes be expensive (e.g. a complex/busy postfix
or apache installation) even if they are less expensive than a full restart.

> The most in-efficient part in all this is the current practice of
> updating a dummy attribute to trigger a reload after changing the
> application config file.  That we can address by supporting
> --force-reload for crm_resource like we do for start/stop/monitor (and
> exposing it nicely in pcs).

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-21 Thread Ken Gaillot
On 07/21/2016 01:35 PM, Stephano-Shachter, Dylan wrote:
> Hello all,
> 
> I want to put the pacemaker config for my two node cluster in puppet
> but, since it is just one cluster, it seems overkill to use the corosync
> module. If I just have puppet push cib.xml to each machine, will that
> work? To make changes, I would just use pcs to update things and then
> copy cib.xml back to puppet. I am not sure what happens when you change
> cib.xml while the cluster is running. Is it safe?

No, pacemaker checksums the CIB and won't accept a file that isn't
properly signed. Also, the cluster automatically synchronizes changes
made to the CIB across all nodes, so there is no need to push changes
more than once.

Since you're using pcs, the update process could go like this:

  # Get the current configuration:
  pcs cluster cib --config > cib-new.xml

  # Make changes:
  pcs -f cib-new.xml <command>
  ...

  # Upload the configuration changes to the cluster:
  pcs cluster cib-push --config cib-new.xml

Using "--config" is important so you only work with the configuration
section of the CIB, and not the dynamically determined cluster
properties and status.

The first and last commands can be done on any one node, with the
cluster running. The "pcs -f" commands can be done anywhere/anytime.
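
For example, a trivial edit made against the file rather than the live CIB
(the stickiness value is just an example):

  pcs -f cib-new.xml resource defaults resource-stickiness=100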


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-21 Thread Ken Gaillot
On 07/21/2016 02:20 PM, Stephano-Shachter, Dylan wrote:
> I am familiar with pcs cluster cib.
> 
> What I am thinking of doing is running "pcs cluster cib --config >
> config.xml" to get a valid config.
> 
> I will then put config.xml on the puppet server and have it push the
> file and run "pcs cluster cib-push --config config.xml" every hour. 
> 
> Will this cause any problems due to pushing the config multiple times?

No, that's fine.

> This would allow me to make small edits to the file in puppet and have
> it pushed automatically. If I wanted to make any big changes, I can make
> them with pcs and just pull another config.

Sounds good.

> On Thu, Jul 21, 2016 at 2:52 PM, Ken Gaillot <kgail...@redhat.com
> <mailto:kgail...@redhat.com>> wrote:
> 
> On 07/21/2016 01:35 PM, Stephano-Shachter, Dylan wrote:
> > Hello all,
> >
> > I want to put the pacemaker config for my two node cluster in puppet
> > but, since it is just one cluster, it seems overkill to use the corosync
> > module. If I just have puppet push cib.xml to each machine, will that
> > work? To make changes, I would just use pcs to update things and then
> > copy cib.xml back to puppet. I am not sure what happens when you change
> > cib.xml while the cluster is running. Is it safe?
> 
> No, pacemaker checksums the CIB and won't accept a file that isn't
> properly signed. Also, the cluster automatically synchronizes changes
> made to the CIB across all nodes, so there is no need to push changes
> more than once.
> 
> Since you're using pcs, the update process could go like this:
> 
>   # Get the current configuration:
>   pcs cluster cib --config > cib-new.xml
> 
>   # Make changes:
>   pcs -f cib-new.xml <command>
>   ...
> 
>   # Upload the configuration changes to the cluster:
>   pcs cluster cib-push --config cib-new.xml
> 
> Using "--config" is important so you only work with the configuration
> section of the CIB, and not the dynamically determined cluster
> properties and status.
> 
> The first and last commands can be done on any one node, with the
> cluster running. The "pcs -f" commands can be done anywhere/anytime.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Previous DC fenced prior to integration

2016-07-29 Thread Ken Gaillot
On 07/28/2016 01:48 PM, Nate Clark wrote:
> On Mon, Jul 25, 2016 at 2:48 PM, Nate Clark <n...@neworld.us> wrote:
>> On Mon, Jul 25, 2016 at 11:20 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>> On 07/23/2016 10:14 PM, Nate Clark wrote:
>>>> On Sat, Jul 23, 2016 at 1:06 AM, Andrei Borzenkov <arvidj...@gmail.com> 
>>>> wrote:
>>>>> 23.07.2016 01:37, Nate Clark пишет:
>>>>>> Hello,
>>>>>>
>>>>>> I am running pacemaker 1.1.13 with corosync and think I may have
>>>>>> encountered a start up timing issue on a two node cluster. I didn't
>>>>>> notice anything in the changelog for 14 or 15 that looked similar to
>>>>>> this or open bugs.
>>>>>>
>>>>>> The rough out line of what happened:
>>>>>>
>>>>>> Module 1 and 2 running
>>>>>> Module 1 is DC
>>>>>> Module 2 shuts down
>>>>>> Module 1 updates node attributes used by resources
>>>>>> Module 1 shuts down
>>>>>> Module 2 starts up
>>>>>> Module 2 votes itself as DC
>>>>>> Module 1 starts up
>>>>>> Module 2 sees module 1 in corosync and notices it has quorum
>>>>>> Module 2 enters policy engine state.
>>>>>> Module 2 policy engine decides to fence 1
>>>>>> Module 2 then continues and starts resource on itself based upon the old 
>>>>>> state
>>>>>>
>>>>>> For some reason the integration never occurred and module 2 starts to
>>>>>> perform actions based on stale state.
>>>>>>
>>>>>> Here is the full logs
>>>>>> Jul 20 16:29:06.376805 module-2 crmd[21969]:   notice: Connecting to
>>>>>> cluster infrastructure: corosync
>>>>>> Jul 20 16:29:06.386853 module-2 crmd[21969]:   notice: Could not
>>>>>> obtain a node name for corosync nodeid 2
>>>>>> Jul 20 16:29:06.392795 module-2 crmd[21969]:   notice: Defaulting to
>>>>>> uname -n for the local corosync node name
>>>>>> Jul 20 16:29:06.403611 module-2 crmd[21969]:   notice: Quorum lost
>>>>>> Jul 20 16:29:06.409237 module-2 stonith-ng[21965]:   notice: Watching
>>>>>> for stonith topology changes
>>>>>> Jul 20 16:29:06.409474 module-2 stonith-ng[21965]:   notice: Added
>>>>>> 'watchdog' to the device list (1 active devices)
>>>>>> Jul 20 16:29:06.413589 module-2 stonith-ng[21965]:   notice: Relying
>>>>>> on watchdog integration for fencing
>>>>>> Jul 20 16:29:06.416905 module-2 cib[21964]:   notice: Defaulting to
>>>>>> uname -n for the local corosync node name
>>>>>> Jul 20 16:29:06.417044 module-2 crmd[21969]:   notice:
>>>>>> pcmk_quorum_notification: Node module-2[2] - state is now member (was
>>>>>> (null))
>>>>>> Jul 20 16:29:06.421821 module-2 crmd[21969]:   notice: Defaulting to
>>>>>> uname -n for the local corosync node name
>>>>>> Jul 20 16:29:06.422121 module-2 crmd[21969]:   notice: Notifications 
>>>>>> disabled
>>>>>> Jul 20 16:29:06.422149 module-2 crmd[21969]:   notice: Watchdog
>>>>>> enabled but stonith-watchdog-timeout is disabled
>>>>>> Jul 20 16:29:06.422286 module-2 crmd[21969]:   notice: The local CRM
>>>>>> is operational
>>>>>> Jul 20 16:29:06.422312 module-2 crmd[21969]:   notice: State
>>>>>> transition S_STARTING -> S_PENDING [ input=I_PENDING
>>>>>> cause=C_FSA_INTERNAL origin=do_started ]
>>>>>> Jul 20 16:29:07.416871 module-2 stonith-ng[21965]:   notice: Added
>>>>>> 'fence_sbd' to the device list (2 active devices)
>>>>>> Jul 20 16:29:08.418567 module-2 stonith-ng[21965]:   notice: Added
>>>>>> 'ipmi-1' to the device list (3 active devices)
>>>>>> Jul 20 16:29:27.423578 module-2 crmd[21969]:  warning: FSA: Input
>>>>>> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
>>>>>> Jul 20 16:29:27.424298 module-2 crmd[21969]:   notice: State
>>>>>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>>>>>> cause=C_TIMER_POPPED origin=election_timeout_popped ]
>>>>>> Jul 20 16:29:27.460834 module-2 crmd

Re: [ClusterLabs] Bloody Newbie needs help for OCFS2 on pacemaker+corosync+pcs

2016-08-02 Thread Ken Gaillot
On 08/02/2016 08:16 AM, t...@it-hluchnik.de wrote:
> Hello Kyle + all,
> 
> No luck at all. Cant get o2cb up at all. Please find details below.
> Thanks in advance for any help.
> 
> First I tried to translate your crm syntax to pcs syntax:
> 
> primitive p_o2cb lsb:o2cb \ op monitor interval="10" timeout="30"
> \ op start interval="0" timeout="120" \ op stop interval="0"
> timeout="120"
> 
> ||| vvv
> 
> # pcs resource create ResO2CB lsb:o2cb \ op monitor interval="10"
> timeout="30" \ op start interval="0" timeout="120" \ op stop
> interval="0" timeout="120"
> 
> Error: Unable to create resource 'lsb:o2cb', it is not installed on
> this system (use --force to override)
> 
> 
> I checked my installation and found this:
> 
> # rpm -ql pacemaker | grep o2cb 
> /usr/share/man/man7/ocf_pacemaker_o2cb.7.gz
> 
> According this, I would expect
> /usr/lib/ocf/resource.d/pacemaker/o2cb but there is no such
> script.

OEL is, shall we say, very similar to RHEL. RHEL doesn't support
OCFS2, so it does not include that RA. It is ironic that OEL doesn't
change that. In any case, you can get the RA from the upstream source:
https://github.com/ClusterLabs/pacemaker/tree/master/extra/resources
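
For example (the URL and install path are my assumptions, so adjust as
needed, and remember to make the script executable):

  curl -o /usr/lib/ocf/resource.d/pacemaker/o2cb \
    https://raw.githubusercontent.com/ClusterLabs/pacemaker/master/extra/resources/o2cb
  chmod 755 /usr/lib/ocf/resource.d/pacemaker/o2cb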

> But I succeeded in:
> 
> # pcs resource create --force ResO2CB ocf:pacemaker:o2cb \ op
> monitor interval="10" timeout="30" \ op start interval="0"
> timeout="120" \ op stop interval="0" timeout="120"
> 
> # pcs resource show ... ResO2CB(ocf::pacemaker:o2cb):
> Stopped ...
> 
> Trying to debug-start:
> 
> # pcs resource debug-start ResO2CB Error performing operation:
> Input/output error
> 
> 
> 
> # rpm -qi pacemaker Name: pacemaker Version : 1.1.13 
> Release : 10.el7 Architecture: x86_64 Install Date: Sa 23 Jul
> 2016 15:23:51 CEST Group   : System Environment/Daemons Size
> : 1400509 License : GPLv2+ and LGPLv2+ Signature   :
> RSA/SHA256, Sa 21 Nov 2015 19:24:37 CET, Key ID 72f97b74ec551f03 
> Source RPM  : pacemaker-1.1.13-10.el7.src.rpm Build Date  : Sa 21
> Nov 2015 18:10:40 CET ...
> 
> It seems that o2cb script is missing in that RPM. Or did I miss to
> install any package?
> 
> Best Regards
> 
> Thomas Hluchnik
> 
> 
> 
> 
> 
> Am Tuesday 02 August 2016 12:39:27 schrieb Kyle O'Donnell:
>> er forgot
>> 
>> primitive p_o2cb lsb:o2cb \ op monitor interval="10" timeout="30"
>> \ op start interval="0" timeout="120" \ op stop interval="0"
>> timeout="120"
>> 
>> - Original Message - From: "Kyle O'Donnell"
>>  To: "users"  Sent:
>> Tuesday, August 2, 2016 6:38:11 AM Subject: Re: [ClusterLabs]
>> Bloody Newbie needs help for OCFS2 onpacemaker+corosync+pcs
>> 
>> primitive mysan ocf:heartbeat:Filesystem \ params
>> device="/dev/myocsdevice" directory="/mymount" fstype="ocfs2"
>> options="rw,noatime" \ op monitor timeout="40" interval="20"
>> depth="0" clone cl_ocfs2mgmt p_o2cb \ meta interleave="true" 
>> clone cl_mysan mysan \ meta target-role="Started" order
>> o_myresource_fs inf: cl_mysan myresource
>> 
>> 
>> - Original Message - From: t...@it-hluchnik.de To: "users"
>>  Sent: Tuesday, August 2, 2016 6:31:44 AM 
>> Subject: [ClusterLabs] Bloody Newbie needs help for OCFS2 on
>> pacemaker+corosync+pcs
>> 
>> Hello everybody, I am new to pacemaker (and to this list), trying
>> to understand pacemaker. For this I created three virtual hosts
>> in my VirtualBox plus four shared disks, attached with each of
>> the three nodes.
>> 
>> I installed Oracle Enterprise Linux 7.1, did a "yum update" and
>> got OEL7.2. Then I created four OCFS2 devices, working fine on
>> all of my three nodes. They are started by systemd, using
>> o2cb.service and ocfs2.service and running fine.
>> 
>> Now I have started with learning pacemaker by "Clusters from
>> Scratch" and meanwhile I have a virtual IP and a Webserver, this
>> works fine so far.
>> 
>> Next I want to control my OCFS2 devices by pacemaker, not by
>> systemd. I searched the net and found some howtos, but they rely
>> on crmsh instead of pcs. Most headaches come from DRBD which I
>> don't understand at all. Why the hell does it seem that I need
>> DRBD for running OCFS2?
>> 
>> Is there anybody who can explain me how to get that running
>> (after disabling o2cb.service & ocfs2.service):
>> 
>> - create a resource which manages and controls o2cb stack -
>> create another resource which manages OCFS2 mountpoints - create
>> constraints for the Web Server (all Apache config / content shall
>> be copied to one of the OCFS2 filesystems)
>> 
>> The Web Server shall be dependent from availability of a mounted
>> OCFS2 device. If it stops working, the Web Server must switch to
>> a node where that mount point is OK.
>> 
>> Thanks in advance for any help
>> 
>> Thomas Hluchnik

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: [ClusterLabs] Antw: Coming in 1.1.16: versioned resource parameters

2016-08-11 Thread Ken Gaillot
On 08/11/2016 03:35 AM, Klaus Wenninger wrote:
> On 08/11/2016 09:13 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgail...@redhat.com> schrieb am 10.08.2016 um 22:36 in 
>>>>> Nachricht
>> <804dd911-56a6-328c-00a4-43133f59d...@redhat.com>:
>>> Have you ever changed a resource agent in a backward-incompatible way,
>>> and found yourself wishing you could do a rolling upgrade?
>> Hi!
>>
>> It seems you are fighting the consequence, not the cause: Why do such a 
>> thing? Why not make an intermediate RA that complains about the old 
>> parameters being obsolete (BTW: Does the XML metadata have an "obsolete" 
>> attribute for parameters?) while also supporting the new parameters? Then 
>> you would first update your RAs, the the configuration. Everything will 
>> continue to work. Then you can make the next generation of RAs that drop 
>> support of the obsolete parameters.
>>
>> I'm afarid adding more and more features to pacemaker will make it 
>> bloatware, instead of being small, efficient and reliable.
> 
> There are probably a lot of reasons why the feature could be considered
> a good idea.
> 2 coming to my mind are:
> 
> - everything that allows us to reduce complexity in RAs is usually a
> good idea.
>   Having a generic feature that is used in a lot of use-cases with a lot
> of RAs
>   you will get this feature tested well.
>   Everything you implement in an RA will in the worst case just be
> tested by you
>   if it is something custom that is not of generic use. In any case it
> won't be
>   tested as well as a pacemaker-feature.
>   My personal experience that most of the time when something is not
> behaving
>   as expected it is rather a shortcoming of some RA than a pacemaker
> problem -
>   not saying pacemaker is perfect and doesn't have problems - don't get
> me wrong here.
> 
> - when it is about updating RAs you are not maintaining by yourself you
>   usually don't want to touch them. If you want them to already have the
> checking
>   built in then this will result in needing some kind of synchronization
> of the
>   update-cycles of the different RA-sources among each other and with
> your installation.
>>
>> Regards,
>> Ulrich

Bloat is definitely an issue to consider when adding features. I try to
weigh how many users might be interested, how isolated the new code can
be from other code, whether the feature has any performance impact when
not configured, what alternative approaches are available, and how well
it fits with pacemaker's existing design.

In this case, the main thing that reassured me was that the code is
reasonably well isolated and should have no significant effect when the
feature is not used in the configuration, and it fit very well with the
existing rules capability.

Klaus' comments about the limitations of handling it in the RA are a
reasonable argument for handling it within pacemaker.

Certainly, I agree the best approach is to maintain backward
compatibility in RAs, but that's not always under the control of the
cluster administrator.

>>> With a new feature in the current master branch, which will be part of
>>> the next Pacemaker release, you will be able to specify different
>>> resource parameters to be used with different versions of a resource agent.
>>>
>>> Pacemaker already supports using rules to select resource parameters.
>>> You can, for example, use different parameters on different nodes, or at
>>> different times of the day:
>>>
>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explai
>>>  
>>> ned/index.html#_using_rules_to_control_resource_options
>>>
>>> The new feature allows the special node attribute #ra-version in these
>>> rules (comparable to the built-in #uname for node name). When a resource
>>> has a rule with #ra-version, Pacemaker will evaluate the rule against
>>> the installed version of the resource agent, as defined by the
>>> <version> tag in the agent's metadata.
>>>
>>> For example, the following XML configuration creates a resource "A", and
>>> passes the options "widget=1 really-old-param=5" if the resource agent
>>> version is 1.0 or older, and the options "widget=1 super-new-param=10"
>>> if the version is newer:
>>>
>>> <primitive id="A" class="ocf" provider="heartbeat" type="your-agent">
>>>    <instance_attributes id="A-attrs-1" score="1">
>>>       <nvpair id="A-attrs-1-widget" name="widget" value="1"/>
>>>       <nvpair id="A-attrs-1-old" name="really-old-param" value="5"/>
>>>    </instance_attributes>
>>>    <instance_attributes id="A-attrs-2" score="2">
>>>       <rule id="A-attrs-2-rule" score="0">
>>>          <expression id="A-attrs-2-rule-expr" type="version"
>>>                      attribute="#ra-version"
>>>                      operation="gt" value="1.0"/>
>>>       </rule>
>>>       <nvpair id="A-attrs-2-widget" name="widget" value="1"/>
>>>       <nvpair id="A-attrs-2-new" name="super-new-param" value="10"/>
>>>    </instance_attributes>
>>> </primitive>
>>

[ClusterLabs] Coming in 1.1.16: versioned resource parameters

2016-08-10 Thread Ken Gaillot
Have you ever changed a resource agent in a backward-incompatible way,
and found yourself wishing you could do a rolling upgrade?

With a new feature in the current master branch, which will be part of
the next Pacemaker release, you will be able to specify different
resource parameters to be used with different versions of a resource agent.

Pacemaker already supports using rules to select resource parameters.
You can, for example, use different parameters on different nodes, or at
different times of the day:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_using_rules_to_control_resource_options

The new feature allows the special node attribute #ra-version in these
rules (comparable to the built-in #uname for node name). When a resource
has a rule with #ra-version, Pacemaker will evaluate the rule against
the installed version of the resource agent, as defined by the
<version> tag in the agent's metadata.

For example, the following XML configuration creates a resource "A", and
passes the options "widget=1 really-old-param=5" if the resource agent
version is 1.0 or older, and the options "widget=1 super-new-param=10"
if the version is newer:

<primitive id="A" class="ocf" provider="heartbeat" type="your-agent">
   <instance_attributes id="A-attrs-1" score="1">
      <nvpair id="A-attrs-1-widget" name="widget" value="1"/>
      <nvpair id="A-attrs-1-old" name="really-old-param" value="5"/>
   </instance_attributes>
   <instance_attributes id="A-attrs-2" score="2">
      <rule id="A-attrs-2-rule" score="0">
         <expression id="A-attrs-2-rule-expr" type="version"
                     attribute="#ra-version"
                     operation="gt" value="1.0"/>
      </rule>
      <nvpair id="A-attrs-2-widget" name="widget" value="1"/>
      <nvpair id="A-attrs-2-new" name="super-new-param" value="10"/>
   </instance_attributes>
</primitive>
Of course, higher-level tools may provide a more convenient interface.

This allows for a rolling upgrade of a resource agent that changed
parameters. Some nodes can have the older version, and others can have
the newer version, and the correct parameters will be used wherever the
resource is placed.

Some considerations before using:

* All nodes must be upgraded to a Pacemaker version supporting this
feature before it can be used.

* The version is re-checked whenever the resource is started. A stop
action is always executed with the same parameters as the previous
start. Therefore, it is still not recommended to upgrade a resource
agent while the resource is active on that node -- each node should be
put into standby when it is upgraded (or if only the resource agent is
being upgraded, ensure the resource is not running on the node).

* The version check requires an extra metadata call when starting the
resource.

* Live (hot) migration is disabled when versioned parameters are in use
(otherwise, half the migration could be performed with one set of
parameters and the other half with another set).

The impact of the last two points can be minimized by using versioned
parameters only while upgrades are being done, and using normal
(unversioned) parameters otherwise.

Special thanks to Igor Tsiglyar and Mikhail Ksenofontov, who created
this feature as part of a student project with EMC under the supervision
of Victoria Cherkalova.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] changing pacemaker.log location

2016-08-12 Thread Ken Gaillot
On 08/12/2016 10:19 AM, Christopher Harvey wrote:
> I'm surprised I'm having such a hard time figuring this out on my own.
> I'm running pacemaker 1.1.13 and corosync-2.3.4 and want to change the
> location of pacemaker.log.
> 
> By default it is located in /var/log.
> 
> I looked in corosync.c and found the following lines:
> get_config_opt(config, local_handle, KEY_PREFIX "to_logfile",
> _enabled, "on");
> get_config_opt(config, local_handle, KEY_PREFIX "logfile",
> , "/var/log/pacemaker.log");
> in mcp_read_config
> 
> I can't find any other documentation.
> 
> Here is my corosync.conf file.
> 
> totem {
>   version: 2
>   # Need a cluster name for now:
>   #   https://github.com/corosync/corosync/issues/137
>   cluster_name: temp
>   crypto_cipher: aes256
>   crypto_hash: sha512
> 
>   interface {
> ringnumber: 0
> bindnetaddr: 192.168.132.10
> mcastport: 5405
>   }
>   transport: udpu
>   heartbeat_failures_allowed: 3
> }
> 
> nodelist {
>   node {
> ring0_addr: 192.168.132.25
> nodeid: 1
> name: a
>   }
> 
>   node {
> ring0_addr: 192.168.132.21
> nodeid: 2
> name: b
>   }
> 
>   node {
> ring0_addr: 192.168.132.10
> nodeid: 3
> name: c
>   }
> }
> 
> logging {
>   # Log the source file and line where messages are being
>   # generated. When in doubt, leave off. Potentially useful for
>   # debugging.
>   fileline: on
>   # Log to standard error. When in doubt, set to no. Useful when
>   # running in the foreground (when invoking 'corosync -f')
>   to_stderr: no
>   # Log to a log file. When set to 'no', the 'logfile' option
>   # must not be set.
>   to_logfile: yes
>   logfile: /my/new/location/corosync.log

By default, pacemaker will use the same log file as corosync, so this
should be sufficient.

Alternatively, you can explicitly tell Pacemaker what detail log file to
use with the environment variable PCMK_logfile (typically set in a
distro-specific location such as /etc/sysconfig/pacemaker or
/etc/default/pacemaker).
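
For example, assuming the sysconfig location on your distro:

  # /etc/sysconfig/pacemaker
  PCMK_logfile=/my/new/location/pacemaker.log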

>   # Log to the system log daemon. When in doubt, set to yes.
>   to_syslog: yes
>   # Log debug messages (very verbose). When in doubt, leave off.
>   debug: off
>   # Log messages with time stamps. When in doubt, set to on
>   # (unless you are only logging to syslog, where double
>   # timestamps can be annoying).
>   timestamp: on
>   logger_subsys {
> subsys: QUORUM
> debug: off
>   }
> }
> quorum {
>   provider: corosync_votequorum
>   expected_votes: 3
> }
> 
> Thanks,
> Chris

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Ken Gaillot
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot <kgail...@redhat.com> wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
> 
> [snipped]
> 
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node.  It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
> 
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
> 
>   
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance, by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* on the desired node.
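
Roughly (the resource name is a placeholder; run the --force-* commands on
the node in question):

  pcs resource unmanage my-clone
  crm_resource --resource my-clone --force-stop
  crm_resource --resource my-clone --force-start
  pcs resource manage my-clone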

The problem (for your use case) is that is-managed is cluster-wide for
the given resource. I suspect coming up with a per-node
interface/implementation for is-managed would be difficult.

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.

FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
> 
> Cheers,
> Adam
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] can't start/stop a drbd resource with pacemaker

2016-07-18 Thread Ken Gaillot
On 07/15/2016 07:08 PM, Lentes, Bernd wrote:
> 
> 
> - Am 15. Jul 2016 um 23:48 schrieb Ken Gaillot kgail...@redhat.com:
> 
>> On 07/15/2016 03:54 PM, Lentes, Bernd wrote:
>>>
>>>
>>> - Am 13. Jul 2016 um 14:25 schrieb Kristoffer Grönlund 
>>> kgronl...@suse.com:
>>>
> 
> 
> 
>>>>
>>>
>>> Hi,
>>>
>>> i found that:
>>>
>>> crm(live)resource# scores
>>>
>>> Current cluster status:
>>> Online: [ sunhb58820 sunhb65277 ]
>>>
>>>  prim_ip_hawk   (ocf::heartbeat:IPaddr):(target-role:Stopped) 
>>> Stopped
>>>  prim_fs_drbd_r0(ocf::heartbeat:Filesystem):
>>> (target-role:Stopped)
>>>  Stopped
>>>  prim_hawk  (lsb:hawk): Stopped
>>>  Master/Slave Set: ms_drbd_r0 [prim_drbd_r0]
>>>  Stopped: [ sunhb58820 sunhb65277 ]
>>>
>>> Allocation scores:
>>> native_color: prim_ip_hawk allocation score on sunhb58820: -INFINITY
>>> native_color: prim_ip_hawk allocation score on sunhb65277: -INFINITY
>>> native_color: prim_fs_drbd_r0 allocation score on sunhb58820: -INFINITY
>>> native_color: prim_fs_drbd_r0 allocation score on sunhb65277: -INFINITY
>>> native_color: prim_hawk allocation score on sunhb58820: -INFINITY
>>> native_color: prim_hawk allocation score on sunhb65277: -INFINITY
>>> clone_color: ms_drbd_r0 allocation score on sunhb58820: 0
>>> clone_color: ms_drbd_r0 allocation score on sunhb65277: 0
>>> clone_color: prim_drbd_r0:0 allocation score on sunhb58820: 0
>>> clone_color: prim_drbd_r0:0 allocation score on sunhb65277: 0
>>> clone_color: prim_drbd_r0:1 allocation score on sunhb58820: 0
>>> clone_color: prim_drbd_r0:1 allocation score on sunhb65277: 0
>>> native_color: prim_drbd_r0:0 allocation score on sunhb58820: -INFINITY
>>> native_color: prim_drbd_r0:0 allocation score on sunhb65277: -INFINITY
>>> native_color: prim_drbd_r0:1 allocation score on sunhb58820: -INFINITY
>>> native_color: prim_drbd_r0:1 allocation score on sunhb65277: -INFINITY
>>> prim_drbd_r0:0 promotion score on none: 0
>>> prim_drbd_r0:1 promotion score on none: 0
>>>
>>> When the score is -INFINITY, the resource can't run on both nodes. Yes ?
>>
>> Correct, a score of -INFINITY for a particular resource on a particular
>> node means that resource can't run there. In this case, the
>> "target-role:Stopped" explains it -- you've explicitly disabled
>> prim_ip_hawk and prim_fs_drbd_r0 in the configuration, and the cluster
>> implements that by setting -INFINITY scores on all nodes.
>>
>>> What means native_color and clone_color ? I read something about different
>>> functions in the allocation ?
>>
>> Right, it's just an internal detail indicating where the score was
>> calculated. The important information is the resource name, node name,
>> and score.
>>
>>> Why are the values different ? Is the score changing depending on the time ?
>>
>> No, it just means different functions contribute to the score. For
>> clones, both the clone as a whole and the individual clone instances
>> have scores. Scores are added together to get a final value.
>>
>>> And why is there a prim_drbd_r0:0 and a prim_drbd_r0:1 ?
>>
>> Those are the individual clone instances. It's possible for individual
>> clone instances to have different scores. For example, you might have a
>> constraint saying that the master role prefers a certain node.
>>
> 
> Is there a way to manipulate the scores ? Is the score -INFINITY because the 
> resource is stopped
> or is the resource stopped because the score is -INFINITY ?

Yes, yes and yes ;-)

You can manipulate the scores in the configuration. The different
high-level tools (crm, pcs, etc.) have their own syntax for specifying
scores, but each constraint has a score, and resource stickiness is a
score. The cluster sums up all the scores that apply to a given resource
per node, and places the resource on the node with the highest score.
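
For example, in crm shell syntax (the score values are arbitrary):

  # prefer one node for a resource
  crm configure location prefer-58820 prim_hawk 100: sunhb58820

  # default stickiness for all resources
  crm configure rsc_defaults resource-stickiness=100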

From the cluster's point of view, the resource is stopped because the
score is negative on all nodes.

However, the crm command's syntax uses "stop" and "start" indirectly.
"crm stop" (and "pcs resource disable") simply adds constraints with
negative scores to all nodes, to make the cluster stop the resource.
"crm start" (and "pcs resource enable") simply removes those
constraints. So, from the point of view of using crm, "stopping" the
resource creates the negative scores.

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-19 Thread Ken Gaillot
On 07/18/2016 05:51 PM, Martin Schlegel wrote:
> Hello all
> 
> I cannot wrap my brain around what's going on here ... any help would prevent 
> me
> from fencing my brain  =:-D
> 
> 
> 
> Problem:
> 
> When completely network isolating a node, i.e. pg1 - sometimes a different 
> node
> gets fenced instead, i.e. pg3 ... in this case I see a syslog message like 
> this
> indicating the wrong stonith device was used:
> stonith-ng[4650]:   notice: Operation 'poweroff' [6216] (call 2 from
> crmd.4654) for host 'pg1' with device 'p_ston_pg3' returned: 0 (OK)
> 
> I had assumed that only the stonith resource p_ston_pg1 had hostname=pg1 and 
> was
> the only resource eligible to be used to fence pg1 !
> 
> Why would it use p_ston_pg3 then ?
> 
> 
> 
> 
> Configuration summary - more details and logs below:
> 
>   * 3x nodes pg1, pg2 and pg3 
>   * 3x stonith resources p_ston_pg1, p_ston_pg2 and p_ston_pg3 - one for each
> node
>   * symmetric-cluster=false (!), please see location constraints 
> l_pgs_resources
> and l_ston_pg1, l_ston_pg2 & l_ston_pg3 further below
>   * We rely on /etc/hosts to resolve pg1, pg2 and pg3 for corosync - the 
> actual
> hostnames are completely different
>   * We rely on the option "hostname" for stonith:external/ipmi to specify the
> name of the host to be managed by the defined STONITH device.
> 
> 
> 
> 
> The stonith registration looks wrong to me (?) - I expected 1 single stonith
> device to be registered per host - see crm_mon output - only 1 p_ston_pgX
> resource gets started per host (!):
> 
> root@test123:~# for node in pg{1..3} ; do ssh $node stonith_admin -L ; done
> Warning: Permanently added 'pg1,10.148.128.28' (ECDSA) to the list of known
> hosts.
> 2 devices found
>  p_ston_pg3
>  p_ston_pg2
> Warning: Permanently added 'pg2,10.148.128.7' (ECDSA) to the list of known
> hosts.
> 2 devices found
>  p_ston_pg3
>  p_ston_pg1
> Warning: Permanently added 'pg3,10.148.128.37' (ECDSA) to the list of known
> hosts.
> 2 devices found
>  p_ston_pg1
>  p_ston_pg2
> 
> 
> 
> 
> ... and for the host pg1 (same as for pg2 or pg3) 2x devices are found to 
> fence
> off pg1 - I would expect only 1 device to show up:
> 
> root@test123:~#for node in pg{1..3} ; do ssh $node stonith_admin -l pg1 ;
> done
> 
> Warning: Permanently added 'pg1,10.148.128.28' (ECDSA) to the list of known
> hosts.
> 2 devices found
>  p_ston_pg3
>  p_ston_pg2
> 
> Warning: Permanently added 'pg2,10.148.128.7' (ECDSA) to the list of known
> hosts.
> 2 devices found
>  p_ston_pg1
>  p_ston_pg3
> 
> Warning: Permanently added 'pg3,10.148.128.37' (ECDSA) to the list of known
> hosts.
> 2 devices found
>  p_ston_pg1
>  p_ston_pg2
> 
> 
> 
> 
> crm_mon monitor output:
> 
> root@test123:~# crm_mon -1
> Last updated: Mon Jul 18 22:45:00 2016  Last change: Mon Jul 18 
> 20:52:14
> 2016 by root via cibadmin on pg2
> Stack: corosync
> Current DC: pg1 (version 1.1.14-70404b0) - partition with quorum
> 3 nodes and 25 resources configured
> 
> Online: [ pg1 pg2 pg3 ]
> 
>  p_ston_pg1 (stonith:external/ipmi):Started pg2
>  p_ston_pg2 (stonith:external/ipmi):Started pg3
>  p_ston_pg3 (stonith:external/ipmi):Started pg1
> 
> 
> 
> 
> 
> 
> Configuration:
> 
> [...]
> 
> primitive p_ston_pg1 stonith:external/ipmi \
>  params hostname=pg1 ipaddr=10.148.128.35 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
> 
> primitive p_ston_pg2 stonith:external/ipmi \
>  params hostname=pg2 ipaddr=10.148.128.19 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
> 
> primitive p_ston_pg3 stonith:external/ipmi \
>  params hostname=pg3 ipaddr=10.148.128.59 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
> 
> location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
> resource-discovery=exclusive \
> rule #uname eq pg1 \
> rule #uname eq pg2 \
> rule #uname eq pg3
> 
> location l_ston_pg1 p_ston_pg1 -inf: pg1
> location l_ston_pg2 p_ston_pg2 -inf: pg2
> location l_ston_pg3 p_ston_pg3 -inf: pg3

These constraints prevent each device from running on its intended
target, but they don't limit which nodes each device can fence. For
that, each device needs a pcmk_host_list or pcmk_host_map entry, for
example:

   primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com

Use pcmk_host_list if the fence device needs the node name as known to
the cluster, and pcmk_host_map if you need to translate a node name to
an address the device understands.
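
So for your configuration, something like this (untested) on each device:

  primitive p_ston_pg1 stonith:external/ipmi \
      params hostname=pg1 ipaddr=10.148.128.35 ... pcmk_host_list="pg1"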


> [...]
> 
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.14-70404b0 \
> cluster-infrastructure=corosync \
> symmetric-cluster=false \
> stonith-enabled=true \
> no-quorum-policy=stop \

Re: [ClusterLabs] pg cluster secondary not syncing after failover

2016-07-15 Thread Ken Gaillot
On 07/15/2016 08:58 AM, Peter Brunnengräber wrote:
> Hello all,
>   My apologies for cross-posting this from the postgresql admins list.  I am 
> beginning to think this may have more to do with the postgresql cluster 
> script.
> 
>   I'm having an issue with a postgresql 9.2 cluster after failover and hope 
> someone might be able to help me out.  I have been attempting to follow the 
> guide provided at ClusterLabs(1) page for after a failover, but not having 
> much luck and I don't quite understand where the issue is.  I'm running on 
> debian wheezy.
> 
>   I have my crm_mon output below.  One server is PRI and operating normally 
> after taking over.  I have pg setup to do the wal archiving via rsync to the 
> opposite node (the archive_command rsyncs each WAL segment to
> test-node2:/db/data/postgresql/9.2/pg_archive/%f).  The rsync is working and 
> I do see WAL files going to the other host appropriately.
> 
>   Node2 was the PRI... So after node1 that was previously in HA:sync promoted 
> last night to PRI and node2 is stopped.  The WAL files are arriving from 
> node1 on node2.  I cleaned-up the /tmp/PGSQL.lock file and proceed with a 
> pg_basebackup restore from node1.  This all went well without error in the 
> node1 postgresql log.
> 
>   After running a crm cleanup on the msPostgresql resource, node2 keeps 
> showing 'LATEST' but gets hung up at HS:alone.  Plus I don't understand why 
> the xlog-loc of node2 shows 001EB9053DD8 which is farther ahead of 
> node1's master-baseline of 001EB280.  I saw the 'cannot stat ... 
> 0001001E00BB' error, but that seems to always happen for the 
> current xlog filename.  Manually copying the missing WAL file from the PRI 
> does not help.
> 
>   And if I wasn't confused enough, the pg log on node2 says "streaming 
> replication successfully connected to primary" and the pg_stat_replication 
> query on node1 shows connected, but ASYNC.
> 
> 
> Any ideas?

Hopefully, someone with pgsql experience can comment -- I can only give
some general pointers.

The cluster software versions in wheezy are considered quite old at this
point, though I'm not aware of anything in particular that would affect
this scenario.

Pacemaker was dropped from jessie due to an unfortunate missed deadline,
but the Debian HA team has gotten recent versions of everything working
and deployed to jessie-backports (as well as stretch and sid), so it is
easy to get a cluster going on jessie now.

You might also compare your installed pgsql resource agent against the
latest upstream to see if any changes might be relevant:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql
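
For example, you could fetch the raw view of that file and diff it against the
installed copy (paths below are the usual OCF defaults; adjust for your system):

   wget -O /tmp/pgsql.upstream \
     https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/pgsql
   diff -u /usr/lib/ocf/resource.d/heartbeat/pgsql /tmp/pgsql.upstream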


> 
> Very much appreciated!
> -With kind regards,
>  Peter Brunnengräber
> 
> 
> 
> References:
> (1) http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#after_fail-over
> 
> 
> ###
> 
> Last updated: Wed Jul 13 14:51:53 2016
> Last change: Wed Jul 13 14:49:17 2016 via crmd on test-node2
> Stack: openais
> Current DC: test-node1 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> 
> 
> Online: [ test-node1 test-node2 ]
> 
> Full list of resources:
> 
>  Resource Group: g_master
>  ClusterIP-Net1 (ocf::heartbeat:IPaddr2):   Started test-node1
>  ReplicationIP-Net2 (ocf::heartbeat:IPaddr2):   Started test-node1
>  Master/Slave Set: msPostgresql [pgsql]
>  Masters: [ test-node1 ]
>  Slaves: [ test-node2 ]
> 
> Node Attributes:
> * Node test-node1:
> + master-pgsql:0: 1000
> + master-pgsql:1: 1000
> + pgsql-data-status : LATEST
> + pgsql-master-baseline : 001EB280
> + pgsql-status  : PRI
> * Node test-node2:
> + master-pgsql:0: -INFINITY
> + master-pgsql:1: -INFINITY
> + pgsql-data-status : LATEST
> + pgsql-status  : HS:alone
> + pgsql-xlog-loc: 001EB9053DD8
> 
> Migration summary:
> * Node test-node2:
> * Node test-node1:
> 
> 
>  Node2
> 2016-07-13 14:55:09 UTC LOG:  database system was interrupted; last known up 
> at 2016-07-13 14:54:27 UTC
> 2016-07-13 14:55:09 UTC LOG:  creating missing WAL directory 
> "pg_xlog/archive_status"
> cp: cannot stat `/db/data/postgresql/9.2/pg_archive/0002.history': No 
> such file or directory
> 2016-07-13 14:55:09 UTC LOG:  entering standby mode
> 2016-07-13 14:55:09 UTC LOG:  restored log file "0001001E00BA" 
> from archive
> 2016-07-13 14:55:09 UTC FATAL:  the database system is starting up
> 2016-07-13 14:55:09 UTC LOG:  redo starts at 1E/BA20
> 2016-07-13 14:55:09 UTC LOG:  consistent recovery state reached at 1E/BA05FED8
> 2016-07-13 14:55:09 UTC LOG:  database system is ready to accept read only 
> connections
> cp: cannot stat 
> 

Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread Ken Gaillot
On 07/13/2016 05:50 AM, emmanuel segura wrote:
> using pcs resource unmanage leave the monitoring resource actived, I
> usually set the monitor interval=0 :)

Yep :)

An easier way is to set "enabled=false" on the monitor, so you don't
have to remember what your interval was later. You can set it in the
op_defaults section to disable all operations at once (assuming no
operation has "enabled=true" explicitly set).

Similarly, you can set is_managed=false in rsc_defaults to unmanage all
resources (that don't have "is_managed=true" explicitly set).
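
With pcs, that might look like this (resource name and interval are
placeholders for your own):

   # disable a single resource's monitor
   pcs resource update my_rsc op monitor interval=10s enabled=false

   # or disable all operations via op_defaults
   pcs resource op defaults enabled=false

   # or unmanage everything via rsc_defaults
   pcs resource defaults is_managed=false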

> 2016-07-11 10:43 GMT+02:00 Tomas Jelinek:
>> Dne 9.7.2016 v 06:39 jaspal singla napsal(a):
>>>
>>> Hello Everyone,
>>>
>>> I need little help, if anyone can give some pointers, it would help me a
>>> lot.
>>>
>>> In RHEL-7.x:
>>>
>>> There is concept of pacemaker and when I use the below command to freeze
>>> my resource group operation, it actually stops all of the resources
>>> associated under the resource group.
>>>
>>> # pcs cluster standby <node name>
>>>
>>> # pcs cluster unstandby <node name>
>>>
>>> Result:  This actually stops all of the resource groups on that node
>>> (ctm_service is one of the resource groups, which gets stopped, including
>>> the database as well; it goes to MOUNT mode)
>>
>>
>> Hello Jaspal,
>>
>> that's what it's supposed to do. Putting a node into standby means the node
>> cannot host any resources.
>>
>>>
>>> However; through clusvcadm command on RHEL-6.x, it doesn't stop the
>>> ctm_service there and my database is in RW mode.
>>>
>>> # clusvcadm -Z ctm_service
>>>
>>> # clusvcadm -U ctm_service
>>>
>>> So my concern here is - Freezing/unfreezing should not affect the status
>>> of the group. Is there any way around to achieve the same in RHEL-7.x as
>>> well, that was done with clusvcadm on RHEL 6?
>>
>>
>> Maybe you are looking for
>> # pcs resource unmanage <resource id>
>> and
>> # pcs resource manage <resource id>
>>
>> Regards,
>> Tomas
>>
>>>
>>> Thanks
>>>
>>> Jaspal

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-13 Thread Ken Gaillot
On 07/13/2016 03:10 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote on 12.07.2016 at 21:19 in 
>>>> message
> <578542bf.9010...@redhat.com>:
>> On 07/12/2016 01:16 AM, Ulrich Windl wrote:
> 
> [...]
>>> What I mean is: there is no "success status" for STONITH; it is assumed that
>>> the node will be down after issuing a successful stonith command. You are
>>> claiming your stonith command was not logging any error, so the cluster will
>>> assume STONITH was successful after a timeout.
>>
>> Fence agents do return success/failure; the cluster considers a timeout
>> to be a failure. The only time the cluster assumes a successful fence is
>> when sbd-based watchdog is in use.
> 
> Hi!
> 
> Sorry, but I don't see the difference: If SBD delivers a command 
> successfully, there is no guarantee that the victim node actually executes 
> the command and resets.
> If you use any other fencing command (like submitting some command to an 
> external device) the situation is not different: Successfully submitting the 
> command does not mean the STONITH will succeed in every case (you could even 
> tun off power in the wrong PDU, which is still a "success" from the cluster's 
> perspective)
> [...]
> 
> What I really wanted to say is:
> If the fencing command logged an error, try to fix it; if it did not, try to 
> find out why fencing did not work.
> 
> Regards,
> Ulrich

Yes, I understand your point now, and agree completely.

The cluster can only respond to the status code (or timeout) it receives
from the fence agent. There may be problems beyond that point (in the
fence agent and/or the device itself) that result in success being
returned incorrectly, and that must be investigated separately.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Preventing pacemaker from attempting to start a VirtualDomain resource on a pacemaker-remote guest node

2016-07-12 Thread Ken Gaillot
On 07/12/2016 12:46 PM, Scott Loveland wrote:
> What is the most efficient way to prevent pacemaker from attempting to
> start a VirtualDomain resource on pacemaker-remote guest nodes?
> 
> I’m running pacemaker 1.1.13 in a KVM host cluster with a large number
> of VirtualDomain (VD) resources (which come and go via automation as
> users add/delete them), a subset of which are also running the
> pacemaker-remote service and acting as guest nodes. The number of KVM
> host nodes in the cluster can vary over time, and the VD resources can
> run on any KVM host node in the cluster. Explicitly defining a set of
> location constraints for each VD specifying only the KVM host nodes
> would be unwieldy, and the constraints would need to change for every VD
> whenever the number of KVM host nodes in the cluster changes. So I would
> prefer to run this as a symmetric cluster in which all VDs can
> implicitly run on all KVM host nodes, but somehow tell the VD’s they
> should not try to start on the pacemaker-remote guest nodes (where they
> will just fail). I’m just not sure the most efficient way to accomplish
> this.
> 
> The approach I’ve hit on so far is to explicitly define an instance
> attribute on each pacemaker-remote guest node which labels it as such,
> and then define a location constraint rule for all VDs that tells them
> to avoid all such guest nodes.
> 
> Specifically, I issue a command such as this for each pacemaker-remote
> guest node after its corresponding VD is defined (in this example, for a
> guest node named “GuestNode1”):
> 
> # crm_attribute --node GuestNode1 --name type --update remote
> 
> And then for each VD (in this example, for the VD named “VM2”):
> 
> # pcs constraint location VM2 rule score=-INFINITY type eq remote
> 
> These commands have nothing unique in them other than the guest node or
> VD name, so they are easy to add to our automation that provisions the
> actual virtual machines, and do not require revision when KVM host nodes
> are added to the cluster.
> 
> Is that the generally recommended approach, or is there a more efficient
> way of accomplishing the same thing?
> 
> PS: For an asymmetric cluster, a similar approach would work as well,
> such as:
> 
> # crm_attribute --node KVMHost1 --name type --update normal
> 
> # pcs constraint location VM2 rule score=100 type eq normal
> 
> 
> 
> - Scott

That is exactly the approach I'd recommend -- but there is a shortcut:
there is a special, undocumented node attribute called "#kind" that is
either "cluster" (normal node), "remote" (ocf:pacemaker:remote node) or
"container" (guest node, unfortunately using old terminology that has
nothing to do with Docker-style containers). So you can use that in
place of creating your own. It also has the benefit that it will exist
as soon as the node exists.
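
So in your example, something like this should work in place of the custom
attribute (a sketch; quoting of #kind may vary with your shell):

   pcs constraint location VM2 rule score=-INFINITY '#kind' ne cluster

which keeps VM2 off anything that is not a full cluster node.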

There's no reason it's undocumented, other than features get added more
often than documentation gets updated.

FYI, the cluster will automatically ensure that stonith resources and
guest node resources do not run on remote/guest nodes.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread Ken Gaillot
On 07/13/2016 09:56 AM, emmanuel segura wrote:
> enabled=false works with every pacemaker versions?

It was introduced in Pacemaker 1.0.2, so realistically, yes :)

> 2016-07-13 16:48 GMT+02:00 Ken Gaillot <kgail...@redhat.com>:
>> On 07/13/2016 05:50 AM, emmanuel segura wrote:
>>> using pcs resource unmanage leave the monitoring resource actived, I
>>> usually set the monitor interval=0 :)
>>
>> Yep :)
>>
>> An easier way is to set "enabled=false" on the monitor, so you don't
>> have to remember what your interval was later. You can set it in the
>> op_defaults section to disable all operations at once (assuming no
>> operation has "enabled=true" explicitly set).
>>
>> Similarly, you can set is_managed=false in rsc_defaults to unmanage all
>> resources (that don't have "is_managed=true" explicitly set).
>>
>>> 2016-07-11 10:43 GMT+02:00 Tomas Jelinek <tojel...@redhat.com>:
>>>> Dne 9.7.2016 v 06:39 jaspal singla napsal(a):
>>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> I need little help, if anyone can give some pointers, it would help me a
>>>>> lot.
>>>>>
>>>>> In RHEL-7.x:
>>>>>
>>>>> There is concept of pacemaker and when I use the below command to freeze
>>>>> my resource group operation, it actually stops all of the resources
>>>>> associated under the resource group.
>>>>>
>>>>> # pcs cluster standby <node name>
>>>>>
>>>>> # pcs cluster unstandby <node name>
>>>>>
>>>>> Result:  This actually stops all of the resource groups on that node
>>>>> (ctm_service is one of the resource groups, which gets stopped, including
>>>>> the database as well; it goes to MOUNT mode)
>>>>
>>>>
>>>> Hello Jaspal,
>>>>
>>>> that's what it's supposed to do. Putting a node into standby means the node
>>>> cannot host any resources.
>>>>
>>>>>
>>>>> However; through clusvcadm command on RHEL-6.x, it doesn't stop the
>>>>> ctm_service there and my database is in RW mode.
>>>>>
>>>>> # clusvcadm -Z ctm_service
>>>>>
>>>>> # clusvcadm -U ctm_service
>>>>>
>>>>> So my concern here is - Freezing/unfreezing should not affect the status
>>>>> of the group. Is there any way around to achieve the same in RHEL-7.x as
>>>>> well, that was done with clusvcadm on RHEL 6?
>>>>
>>>>
>>>> Maybe you are looking for
>>>> # pcs resource unmanage <resource id>
>>>> and
>>>> # pcs resource manage <resource id>
>>>>
>>>> Regards,
>>>> Tomas
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jaspal
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Default Behavior

2016-06-28 Thread Ken Gaillot
On 06/28/2016 10:53 AM, Pavlov, Vladimir wrote:
> Hello!
> 
> We have Pacemaker cluster of two node Active/Backup (OS Centos 6.7),
> with resources IPaddr2 and ldirectord.
> 
> Cluster Properties:
> 
> cluster-infrastructure: cman
> 
> dc-version: 1.1.11-97629de
> 
> no-quorum-policy: ignore
> 
> stonith-enabled: false
> 
> The cluster has been configured for this documentation:
> http://clusterlabs.org/quickstart-redhat-6.html
> 
> Recently, there was a communication failure between cluster nodes and
> the behavior was like this:
> 
> -During a network failure, each server has become the Master.
> 
> -After the restoration of the network, one node killing services
> of Pacemaker on the second node.
> 
> -The second node was not available for the cluster, but all
> resources remain active (Ldirectord,ipvs,ip address). That is, both
> nodes continue to be active.
> 
> We decided to create a test stand and play the situation, but with
> current version of Pacemaker in CentOS repos, сluster behaves differently:
> 
> -During a network failure, each server has become the Master.
> 
> -After the restoration of the network, all resources are stopped.
> 
> -Then the resources are run only on one node. - This behavior
> seems to be more logical.
> 
> Current Cluster Properties on test stand:
> 
> cluster-infrastructure: cman
> 
> dc-version: 1.1.14-8.el6-70404b0
> 
> have-watchdog: false
> 
> no-quorum-policy: ignore
> 
> stonith-enabled: false
> 
> Changed the behavior of the cluster in the new version or accident is
> not fully emulated?

If I understand your description correctly, the situation was not
identical. The difference I see is that, in the original case, the
second node is not responding to the cluster even after the network is
restored. Thus, the cluster cannot communicate to carry out the behavior
observed in the test situation.

Fencing (stonith) is the cluster's only recovery mechanism in such a
case. When the network splits, or a node becomes unresponsive, it can
only safely recover resources if it can ensure the other node is powered
off. Pacemaker supports both physical fencing devices such as an
intelligent power switch, and hardware watchdog devices for self-fencing
using sbd.
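
Once a suitable agent is available, wiring it in is straightforward; as a
sketch (the agent name and its device parameters are placeholders):

   pcs stonith create my_fence fence_xyz ipaddr=... login=... passwd=... \
      pcmk_host_list="node1 node2"
   pcs property set stonith-enabled=true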

> Thank you.
> 
>  
> 
>  
> 
> Kind regards,
> 
>  
> 
> *Vladimir Pavlov*

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How about making pacemaker OCF_ROOT_DIR more portable ?

2016-07-05 Thread Ken Gaillot
On 07/04/2016 10:07 PM, Li Junliang wrote:
> Hi all,
>   Currently, OCF_ROOT_DIR in configure.ac for pacemaker is fixed
> to /usr/lib/ocf. But in the resource-agents project, we can set
> OCF_ROOT_DIR by passing "--with-ocf-root=xxx". So there may be two
> different OCF directories. Would it be better to add a configure
> argument like --with-ocf-root?
> 
> Best wishes,

Yes, I agree.

Historically, Pacemaker has not provided such an option because the
exact path is specified in the OCF standard. So, compiling Pacemaker
with a different value would make it incompatible with standard OCF agents.

However, I do think it's worth giving the user the option to do so, as
long as they're aware of the implications.

There is one way to change it currently. The configure script looks for
the presence of glue_config.h or hb_config.h from the cluster-glue
package. If that exists, it will look there for the value of
OCF_ROOT_DIR (normally "/usr/lib/ocf") and OCF_RA_DIR (normally
"$OCF_ROOT_DIR/resource.d). It would be a bit hacky, but if you don't
use cluster-glue, you could create a header file by that name with the
desired values.
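
As a sketch (cluster-glue normally installs this header under
/usr/include/heartbeat/; the values are whatever you want configure to pick up):

   /* glue_config.h */
   #define OCF_ROOT_DIR "/opt/ocf"
   #define OCF_RA_DIR   "/opt/ocf/resource.d"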

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DC of the election will be an infinite loop.

2016-07-07 Thread Ken Gaillot
On 06/27/2016 11:41 PM, 飯田 雄介 wrote:
> Hi, all
> 
> I added two comment lines to cib.xml using crmsh.
> 
> ===
> # cat test.crm
> node 167772452: test-3
> # comment line 1
> # comment line 2
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.14-1.el7-70404b0 \
> cluster-infrastructure=corosync
> ===
> 
> And when I re-started the cluster, the DC election went into an infinite loop.
> 
> Environment
> We are using the Pacemaker-1.1.14.
> 
> From looking at the log,
> it appears the handling of comment nodes is not right.
> The problem seems to occur when there are two or more comment lines.
> 
> Problem seems to occur on or after modification of the following.
> https://github.com/ClusterLabs/pacemaker/commit/1073786ec24f3bbf26a0f6a5b0614a65edac4301
> 
> Is this behavior a known bug?
> 
> I will attach a crm_report of when the problem occurred.
> 
> Regards,
> Yusuke

This was a tricky one!

In my testing, I found that the behavior occurs only when there is more
than one comment at the same level within a CIB XML element, the
partition has only one node, and the partition does not yet have a DC.

The behavior was indeed introduced in 1.1.14 with the commit you
mentioned. However, there is nothing wrong with that commit! Rather, it
exercised the code differently, triggering other, pre-existing bugs.

I've prepared some fixes which will be merged into the master branch.
Anyone interested in the details can see
https://github.com/ClusterLabs/pacemaker/pull/1099

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] system or ocf resource agent

2016-07-08 Thread Ken Gaillot
On 07/08/2016 05:10 AM, Heiko Reimer wrote:
> Hi,
> 
> I am setting up a new Debian 8 HA cluster with DRBD, Corosync and
> Pacemaker, running Apache and MySQL. In my old environment I had configured
> resources with OCF resource agents. Now I have seen that there is
> systemd. Which agent would you recommend?
> systemd. Which agent would you recommend?
> 
> 
> Mit freundlichen Grüßen / Best regards
>  
> Heiko Reimer


There's no one answer for all services. Some things to consider:

* Only OCF agents can take parameters, so they may have additional
capabilities that the systemd agent doesn't.

* Only OCF agents can be used for globally unique clones, clones that
require notifications, and multistate (master/slave) resources.

* OCF agents are usually written to support a variety of
OSes/distributions, while systemd unit files are often tailored to the
particular system. This can give OCF an advantage if you have a variety
of OSes and want the resource agent to behave as identically as possible
on all of them, or it can give systemd an advantage if you have a
homogeneous environment and want to use OS-specific facilities as much
as possible.

* A lot depends on the particular service. Is the OCF agent widely used
and actively developed? If so, it is more likely to have better features
and enhanced support for running in a cluster; if not, the systemd unit
may be more up-to-date with recent changes in the underlying service.

* Your Pacemaker version matters. Pacemaker added systemd resource
support in 1.1.8, but there were significant issues until 1.1.13, and
minor but useful fixes since then.
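
For comparison, the same web server could be declared either way (a pcs
sketch, pick one; names and the config path are placeholders):

   # OCF agent, with agent-specific parameters
   pcs resource create www ocf:heartbeat:apache \
      configfile=/etc/apache2/apache2.conf op monitor interval=30s

   # systemd unit, configured entirely on the OS side
   pcs resource create www systemd:apache2 op monitor interval=30s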

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Proposed change for 1.1.16: ending python 2.6 compatibility

2016-07-06 Thread Ken Gaillot
On 07/05/2016 06:49 PM, Digimer wrote:
> On 05/07/16 01:31 PM, Ken Gaillot wrote:
>> As you may be aware, python 3 is a significant, backward-incompatible
>> restructuring of the python language. Most development of the python 2
>> series has ended, and support for python 2 will completely end in 2020.
>>
>> Pacemaker currently uses python only in its test suites. At some point,
>> we may convert a few existing Pacemaker-provided resource agents and
>> fence agents to python as well.
>>
>> Currently, Pacemaker's python code is compatible with python 2.6 and
>> 2.7. We definitely need to start moving toward python 3 compatibility.
>> It is possible to support both 2.7 and 3 with the same code, but we need
>> to lose compatibility with 2.6. (Maintaining a separate branch of code
>> for 2.6 would not be worth the effort.)
>>
>> So, I propose that Pacemaker stop supporting python 2.6 as of the next
>> version (1.1.16, expected sometime in the fall or winter). Not all our
>> python code is likely to be python3-compatible by that time, but we can
>> start moving in that direction.
>>
>> If anyone has any reasons to do this differently, let me know. As this
>> only affects the test suites currently, and most Linux distributions
>> have stopped supporting python 2.6 already, I expect more "what took you
>> so long" responses than "slow down" ;-)
> 
> If this can be done without breaking RHEL 6 deployments (and it sounds
> like it wouldn't be an issue), then I say go for it. A key to good HA is
> simplicity, and maintaining two branches (or getting stuck on a dead-end
> branch) seems to go against that ethos.

It would not break stock RHEL 6 deployments, but it would introduce an
upstream break.

RHEL 6 recently entered its "Production 2" phase [1], meaning it will
get only bugfixes and not new features. So, its stock packages won't be
affected by changes we make upstream (aside from bugfixes that may be
backported).

With the proposed change, you couldn't build upstream 1.1.16 or later on
RHEL 6 and get 100% functionality (at least, not without also building
python 2.7). At this point, the only thing that would be affected would
be portions of the test suite.

[1] https://access.redhat.com/support/policy/updates/errata

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Proposed change for 1.1.16: ending python 2.6 compatibility

2016-07-06 Thread Ken Gaillot
On 07/06/2016 01:07 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote on 05.07.2016 at 19:31 in 
>>>> message
> <577beef9.4000...@redhat.com>:
>> As you may be aware, python 3 is a significant, backward-incompatible
>> restructuring of the python language. Most development of the python 2
>> series has ended, and support for python 2 will completely end in 2020.
>>
>> Pacemaker currently uses python only in its test suites. At some point,
>> we may convert a few existing Pacemaker-provided resource agents and
>> fence agents to python as well.
>>
>> Currently, Pacemaker's python code is compatible with python 2.6 and
>> 2.7. We definitely need to start moving toward python 3 compatibility.
>> It is possible to support both 2.7 and 3 with the same code, but we need
>> to lose compatibility with 2.6. (Maintaining a separate branch of code
>> for 2.6 would not be worth the effort.)
>>
>> So, I propose that Pacemaker stop supporting python 2.6 as of the next
>> version (1.1.16, expected sometime in the fall or winter). Not all our
>> python code is likely to be python3-compatible by that time, but we can
>> start moving in that direction.
> 
> I just checked three servers:
> SLES10 SP4 uses python 2.4, but no pacemaker
> SLES11 SP4 uses python 2.6
> SLES12 SP1 uses python 2.7
> 
> As SP4 is the last SP for SLES11, I guess there won't be any update for 
> pacemaker (not to talk about python)
> In SLES12 there is a python3 available, but not installed by default.

SLES11 would be in the same boat as RHEL6 -- the stock packages would
continue to work, but upstream builds would not be 100% functional (only
a portion of the test suite would be affected, currently).

>> If anyone has any reasons to do this differently, let me know. As this
>> only affects the test suites currently, and most Linux distributions
>> have stopped supporting python 2.6 already, I expect more "what took you
>> so long" responses than "slow down" ;-)
> 
> Maybe push python 3 first ;-)

Even with the proposed change, we'd maintain compatibility with python
2.7, almost certainly until the 2020 support cutoff, or even later.

There are some convenient tools for writing python code compatible with
both 2.7 and 3, but not 2.6, which is why we have to drop 2.6
compatibility to add 3 compatibility.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Doing reload right

2016-07-08 Thread Ken Gaillot
On 07/04/2016 07:13 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
> 
>> Does anyone know of an RA that uses reload correctly?
> 
> My resource agents advertise a no-op reload action for handling their
> "private" meta attributes.  Meta in the sense that they are used by the
> resource agent when performing certain operations, not by the managed
> resource itself.  Which means they are trivially changeable online,
> without any resource operation whatsoever.
> 
>> Does anyone object to the (backward-incompatible) solution proposed
>> here?
> 
> I'm all for cleanups, but please keep an online migration path around.

Not sure what you mean by online ... the behavior would change when
Pacemaker was upgraded, so the node would already be out of the cluster
at that point. You would unmanage resources if desired, stop pacemaker
on the node, upgrade pacemaker, upgrade the RA, then start/manage again.

If you mean that you would like to use the same RA before and after the
upgrade, that would be doable. We could bump the crm feature set, which
gets passed to the RA as an environment variable. You could modify the
RA to handle both reload and reload-params, and if it's asked to reload,
check the feature set to decide which type of reload to do. You could
upgrade the RA anytime before the pacemaker upgrade.

In pseudo-code, the recommended way of supporting reload would become:

  reload_params() { ... }
  reload_service() { ... }

  if action is "reload-params" then
 reload_params()
  else if action is "reload"
 if crm_feature_set < X.Y.Z then
reload_params()
 else
reload_service()


Handling both "unique" and "reloadable" would be more complicated, but
that's inherent in the mismash of meaning unique has right now. I see
three approaches:

1. Use "unique" in its GUI sense and "reloadable" to indicate reloadable
parameters. This would be cleanest, but would not be useful with
pre-"reloadable" pacemaker.

2. Use both unique=0 and reloadable=1 to indicate reloadable parameters.
This sacrifices proper GUI hinting to keep compatibility with pre- and
post-"reloadable" pacemaker (the same sacrifice that has to be made now
to use reload correctly).

3. Dynamically modify the metadata according to the crm feature set,
using approach 1 with post-"reloadable" pacemaker and approach 2 with
pre-"reloadable" pacemaker. This is the most flexible but makes the code
more complicated. In pseudocode, it might look something like:

   if crm_feature_set < X.Y.Z then
  UNIQUE_TRUE=""
  UNIQUE_FALSE=""
  RELOADABLE_TRUE="unique=0"
  RELOADABLE_FALSE="unique=1"
   else
  UNIQUE_TRUE="unique=1"
  UNIQUE_FALSE="unique=0"
  RELOADABLE_TRUE="reloadable=1"
  RELOADABLE_FALSE="reloadable=0"

   meta_data() {
  ...
  <parameter name="..." ${UNIQUE_TRUE}>
  ...
  <parameter name="..." ${RELOADABLE_TRUE}>
   }

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Doing reload right

2016-07-05 Thread Ken Gaillot
On 07/04/2016 03:52 AM, Vladislav Bogdanov wrote:
> 01.07.2016 18:26, Ken Gaillot wrote:
> 
> [...]
> 
>> You're right, "parameters" or "params" would be more consistent with
>> existing usage. "Instance attributes" is probably the most technically
>> correct term. I'll vote for "reload-params"
> 
> May be "reconfigure" fits better? This would at least introduce an
> action name which does not intersect with LSB/systemd/etc.
> 
> "reload" is for service itself as admin would expect, "reconfigure" is
> for its controlling resource.
> 
> [...]

I like "reconfigure", but then the new parameter attribute should be
"reconfigurable", which could be confusing. "All my parameters are
reconfigurable!" I suppose we could call the attribute
"change_triggers_reconfigure".

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Doing reload right

2016-07-05 Thread Ken Gaillot
On 07/04/2016 02:01 AM, Ulrich Windl wrote:
> For the case of changing the contents of an external configuration file, the
> RA would have to provide some reloadable dummy parameter then (maybe like
> "config_generation=2").

That is a widely recommended approach for the current "reload"
implementation, but I don't think it's desirable. It still does not
distinguish changes in the Pacemaker resource configuration from changes
in the service configuration.

For example, if an RA has one parameter that is agent-reloadable and
another that is service-reloadable, and it gets a "reload" action, it
has no way of knowing which of the two (or both) changed. It would have
to always reload all agent-reloadable parameters, and trigger a service
reload. That seems inefficient to me. Also, only Pacemaker should
trigger agent reloads, and only the user should trigger service reloads,
so combining them doesn't make sense to me.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Proposed change for 1.1.16: ending python 2.6 compatibility

2016-07-05 Thread Ken Gaillot
As you may be aware, python 3 is a significant, backward-incompatible
restructuring of the python language. Most development of the python 2
series has ended, and support for python 2 will completely end in 2020.

Pacemaker currently uses python only in its test suites. At some point,
we may convert a few existing Pacemaker-provided resource agents and
fence agents to python as well.

Currently, Pacemaker's python code is compatible with python 2.6 and
2.7. We definitely need to start moving toward python 3 compatibility.
It is possible to support both 2.7 and 3 with the same code, but we need
to lose compatibility with 2.6. (Maintaining a separate branch of code
for 2.6 would not be worth the effort.)

So, I propose that Pacemaker stop supporting python 2.6 as of the next
version (1.1.16, expected sometime in the fall or winter). Not all our
python code is likely to be python3-compatible by that time, but we can
start moving in that direction.

If anyone has any reasons to do this differently, let me know. As this
only affects the test suites currently, and most Linux distributions
have stopped supporting python 2.6 already, I expect more "what took you
so long" responses than "slow down" ;-)
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Minimal metadata for fencing agent

2016-08-05 Thread Ken Gaillot
On 08/05/2016 06:16 AM, Maciej Kopczyński wrote:
> Thanks for your answer Thomas and sorry for messing up the layout of
> messages - I was trying to write from a mobile phone using gmail... I
> was able to put something up using what I found on the web and my own
> writing. My agent seems to do what it has to, except for sending
> proper metadata. I found the following information: "Output of
> fence_agent -o metadata should be validated by relax-ng schema
> (available at fence/agents/lib/metadata.rng)." Checked this location,
> but I am totally noob regarding to XML. There is a pretty extensive
> structure there, and what I need is to prepare a minimal agent to be
> used locally by me just to check if the whole thing makes sense at
> all.
> 
> Do you have any idea as to what is the minimal set of XML data that
> fence agent has to send to stdout? Or any way to work around this?
> Just for testing purposes.

The easiest thing to do is to look at an existing fence agent and mimic
what it does. The key parts are what parameters the agent accepts, and
what actions it supports.
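
As a rough illustration, a stripped-down metadata document looks like this
(agent and parameter names here are made up; validate yours against metadata.rng):

   <?xml version="1.0" ?>
   <resource-agent name="fence_hyperv_test" shortdesc="Test fence agent for Hyper-V">
     <longdesc>Minimal agent for local testing only.</longdesc>
     <parameters>
       <parameter name="port" unique="1" required="1">
         <getopt mixed="-n, --plug=[id]"/>
         <content type="string"/>
         <shortdesc lang="en">Name of the VM to fence</shortdesc>
       </parameter>
     </parameters>
     <actions>
       <action name="on"/>
       <action name="off"/>
       <action name="reboot"/>
       <action name="status"/>
       <action name="monitor"/>
       <action name="list"/>
       <action name="metadata"/>
     </actions>
   </resource-agent>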

> 
> Best regards,
> Maciek
> 
>> Hi,
>>
>> That is because pcs doesn't work well with external stonith agents, see
>> this github issue https://github.com/ClusterLabs/pcs/issues/81
>>
>> Regards,
>> Tomas
> 
>>> Thanks!
>>>
>>> I ran into more problems though. When configuring a stonith resource using
>>> pcs with stonith:external/libvirt I am getting "Unable to create resource
>>> (...), it is not installed on this system." I have installed the cluster_glue
>>> RPM package (I am running CentOS), the file is present in the system,
>>> should I enable it somehow for pacemaker?
>>>
>>> Thanks,
>>> Maciek
>>>
>>>
 Hello,

 Sorry if it is a trivial question, but I am facing a wall here. I am
 trying
 to configure fencing on cluster running Hyper-V. I need to modify source
 code for external/libvirt plugin, but I have no idea which package
 provides
 it, cannot Google any files, do you have any idea?

 Thanks in advance,
 Maciek
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Singleton resource not being migrated

2016-08-05 Thread Ken Gaillot
On 08/05/2016 03:48 AM, Andreas Kurz wrote:
> Hi,
> 
> On Fri, Aug 5, 2016 at 2:08 AM, Nikita Koshikov  > wrote:
> 
> Hello list,
> 
> Can you, please, help me in debugging 1 resource not being started
> after node failover ?
> 
> Here is configuration that I'm testing:
> 3 nodes(kvm VM) cluster, that have:
> 
> node 10: aic-controller-58055.test.domain.local
> node 6: aic-controller-50186.test.domain.local
> node 9: aic-controller-12993.test.domain.local
> primitive cmha cmha \
> params conffile="/etc/cmha/cmha.conf"
> daemon="/usr/bin/cmhad" pidfile="/var/run/cmha/cmha.pid" user=cmha \
> meta failure-timeout=30 resource-stickiness=1
> target-role=Started migration-threshold=3 \
> op monitor interval=10 on-fail=restart timeout=20 \
> op start interval=0 on-fail=restart timeout=60 \
> op stop interval=0 on-fail=block timeout=90
> 
> 
> What is the output of crm_mon -1frA once a node is down ... any failed
> actions?
>  
> 
> primitive sysinfo_aic-controller-12993.test.domain.local
> ocf:pacemaker:SysInfo \
> params disk_unit=M disks="/ /var/log" min_disk_free=512M \
> op monitor interval=15s
> primitive sysinfo_aic-controller-50186.test.domain.local
> ocf:pacemaker:SysInfo \
> params disk_unit=M disks="/ /var/log" min_disk_free=512M \
> op monitor interval=15s
> primitive sysinfo_aic-controller-58055.test.domain.local
> ocf:pacemaker:SysInfo \
> params disk_unit=M disks="/ /var/log" min_disk_free=512M \
> op monitor interval=15s
> 
> 
> You can use a clone for this sysinfo resource and a symmetric cluster
> for a more compact configuration  then you can skip all these
> location constraints.
> 
> 
> location cmha-on-aic-controller-12993.test.domain.local cmha 100:
> aic-controller-12993.test.domain.local
> location cmha-on-aic-controller-50186.test.domain.local cmha 100:
> aic-controller-50186.test.domain.local
> location cmha-on-aic-controller-58055.test.domain.local cmha 100:
> aic-controller-58055.test.domain.local
> location sysinfo-on-aic-controller-12993.test.domain.local
> sysinfo_aic-controller-12993.test.domain.local inf:
> aic-controller-12993.test.domain.local
> location sysinfo-on-aic-controller-50186.test.domain.local
> sysinfo_aic-controller-50186.test.domain.local inf:
> aic-controller-50186.test.domain.local
> location sysinfo-on-aic-controller-58055.test.domain.local
> sysinfo_aic-controller-58055.test.domain.local inf:
> aic-controller-58055.test.domain.local
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.14-70404b0 \
> cluster-infrastructure=corosync \
> cluster-recheck-interval=15s \
> 
> 
> Never tried such a low cluster-recheck-interval ... wouldn't do that. I
> saw setups with low intervals burning a lot of cpu cycles in bigger
> cluster setups and side-effects from aborted transitions. If you do this
> for "cleanup" the cluster state because you see resource-agent errors
> you should better fix the resource agent.

Strongly agree -- your recheck interval is lower than the various action
timeouts. The only reason recheck interval should ever be set less than
about 5 minutes is if you have time-based rules that you want to trigger
with a finer granularity.

Your issue does not appear to be coming from recheck interval, otherwise
it would go away after the recheck interval passed.

> Regards,
> Andreas
>  
> 
> no-quorum-policy=stop \
> stonith-enabled=false \
> start-failure-is-fatal=false \
> symmetric-cluster=false \
> node-health-strategy=migrate-on-red \
> last-lrm-refresh=1470334410
> 
> When 3 nodes online, everything seemed OK, this is output of
> scoreshow.sh:
> Resource  Score      Node                                    Stickiness  #Fail  Migration-Threshold
> cmha      -INFINITY  aic-controller-12993.test.domain.local  1           0
> cmha      101        aic-controller-50186.test.domain.local  1           0
> cmha      -INFINITY  ...

Everything is not OK; cmha has -INFINITY scores on two nodes, meaning it
won't be allowed to run on them. This is why it won't start after the
one allowed node goes down, and why cleanup gets it working again
(cleanup removes bans caused by resource failures).

It's likely the resource previously failed the maximum allowed times
(migration-threshold=3) on those two nodes.
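
You can confirm that from the fail counts and clear the bans with a cleanup,
for example:

   crm_mon -1rf   # failures show up under "Migration summary"
   crm_resource --cleanup --resource cmha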

The next step would be to figure out why the resource is failing. The
pacemaker logs will have the details of those failures.

Re: [ClusterLabs] corosync init script is not a LSB compliance ?

2016-06-30 Thread Ken Gaillot
On 06/30/2016 09:18 AM, Jan Friesse wrote:
> Hi Li,
> 
>> Hi all,
>> I compiled latest version of corosync and found its init script
>> is not LSB compliance. When I check stopped corosync service status
>> with cmd "service corosync status" (on centos 6 without systemd), it
>> returns "1" rather than "3". I found the "status" function in init
>> script return wrong code after checking pid. So, is this a compliance
>> problem?
> 
> At least for corosync, it's simply not implemented. I've created github
> issue (https://github.com/corosync/corosync/issues/138) and fix it later.
> 
> Thanks for report,
>   Honza
> 
>> BTW, there is also the same "mistake" in the pacemaker init script.

I'll put it on the to-do list for pacemaker as well



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Doing reload right

2016-06-30 Thread Ken Gaillot
Hello all,

I've been meaning to address the implementation of "reload" in Pacemaker
for a while now, and I think the next release will be a good time, as it
seems to be coming up more frequently.

In the current implementation, Pacemaker considers a resource parameter
"reloadable" if the resource agent supports the "reload" action, and the
agent's metadata marks the parameter with "unique=0". If (only) such
parameters get changed in the resource's pacemaker configuration,
pacemaker will call the agent's reload action rather than the
stop-then-start it usually does for parameter changes.

This is completely broken for two reasons:

1. It relies on "unique=0" to determine reloadability. "unique" was
originally intended (and is widely used by existing resource agents) as
a hint to UIs to indicate which parameters uniquely determine a resource
instance. That is, two resource instances should never have the same
value of a "unique" parameter. For this purpose, it makes perfect sense
that (for example) the path to a binary command would have unique=0 --
multiple resource instances could (and likely would) use the same
binary. However, such a parameter could never be reloadable.

2. Every known resource agent that implements a reload action does so
incorrectly. Pacemaker uses reload for changes in the resource's
*pacemaker* configuration, while all known RAs use reload for a
service's native reload capability of its own configuration file. As an
example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
action, which will have zero effect on any pacemaker-configured
parameters -- and on top of that, the RA uses "unique=0" in its correct
UI sense, and none of those parameters are actually reloadable.

My proposed solution is:

* Add a new "reloadable" attribute for resource agent metadata, to
indicate reloadable parameters. Pacemaker would use this instead of
"unique".

* Add a new "reload-options" RA action for the ability to reload
Pacemaker-configured options. Pacemaker would call this instead if "reload".

* Formalize that "reload" means reload the service's own configuration,
legitimizing the most common existing RA implementations. (Pacemaker
itself will not use this, but tools such as crm_resource might.)

* Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
use unique, reloadable, reload, and reload-options properly.

The downside is that this breaks backward compatibility. Any RA that
actually implements unique and reload so that reload works will lose
reload capability until it is updated to the new style.

While we usually go to great lengths to preserve backward compatibility,
I think it is OK to break it in this case, because most RAs that
implement reload do so wrongly: some implement it as a service reload, a
few advertise reload but don't actually implement it, and others map
reload to start, which might theoretically work in some cases (I'm not
familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
typically won't, as the previous service options are not reverted (for
example, I think Route would incorrectly leave the old route in the old
table).

So, I think breaking backward compatibility is actually a good thing
here, since the most reload can do with existing RAs is trigger bad
behavior.

The opposing view would be that we shouldn't punish any RA writer who
implemented this correctly. However, there's no solution that preserves
backward compatibility with both UI usage of unique and reload usage of
unique. Plus, the worst that would happen is that the RA would stop
being reloadable -- not as bad as the current possibilities from
mis-implemented reload.

My questions are:

Does anyone know of an RA that uses reload correctly? Dummy doesn't
count ;-)

Does anyone object to the (backward-incompatible) solution proposed here?
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Sticky resource not sticky after unplugging network cable

2016-07-01 Thread Ken Gaillot
On 07/01/2016 02:13 AM, Auer, Jens wrote:
> Hi,
> 
> I have an active/passive cluster configuration and I am trying to make a
> virtual IP resource sticky such that it does not move back to a node
> after a fail-over. In my setup, I have a location preference for the
> virtual IP to the "primary" node:
> pcs resource show --full
>  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
>   Meta Attrs: stickiniess=201
>   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
>   stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
>   monitor interval=30s (mda-ip-monitor-interval-30s)
>  Master: drbd1_sync
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=shared_fs
>Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>promote interval=0s timeout=90 (drbd1-promote-interval-0s)
>demote interval=0s timeout=90 (drbd1-demote-interval-0s)
>stop interval=0s timeout=100 (drbd1-stop-interval-0s)
>monitor interval=60s (drbd1-monitor-interval-60s)
>  Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
>   Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
>   stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
>   monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
>  Resource: PF-PEP (class=ocf provider=pfpep type=pfpep_clusterSwitch)
>   Operations: start interval=0s timeout=20 (PF-PEP-start-interval-0s)
>   stop interval=0s timeout=20 (PF-PEP-stop-interval-0s)
>   monitor interval=10 timeout=20 (PF-PEP-monitor-interval-10)
>  Clone: supervisor-clone
>   Resource: supervisor (class=ocf provider=pfpep type=pfpep_supervisor)
>Operations: start interval=0s timeout=20 (supervisor-start-interval-0s)
>stop interval=0s timeout=20 (supervisor-stop-interval-0s)
>monitor interval=10 timeout=20
> (supervisor-monitor-interval-10)
>  Clone: snmpAgent-clone
>   Resource: snmpAgent (class=ocf provider=pfpep type=pfpep_snmpAgent)
>Operations: start interval=0s timeout=20 (snmpAgent-start-interval-0s)
>stop interval=0s timeout=20 (snmpAgent-stop-interval-0s)
>monitor interval=10 timeout=20
> (snmpAgent-monitor-interval-10)
> 
> Location Constraints:
>   Resource: mda-ip
> Enabled on: MDA1PFP (score:50) (id:location-mda-ip-MDA1PFP-50)
> Ordering Constraints:
>   promote drbd1_sync then start shared_fs (kind:Mandatory)
> (id:order-drbd1_sync-shared_fs-mandatory)
>   start shared_fs then start PF-PEP (kind:Mandatory)
> (id:order-shared_fs-PF-PEP-mandatory)
>   start snmpAgent-clone then start supervisor-clone (kind:Optional)
> (id:order-snmpAgent-clone-supervisor-clone-Optional)
>   start shared_fs then start snmpAgent-clone (kind:Optional)
> (id:order-shared_fs-snmpAgent-clone-Optional)
> Colocation Constraints:
>   mda-ip with drbd1_sync (score:INFINITY) (with-rsc-role:Master)
> (id:colocation-mda-ip-drbd1_sync-INFINITY)
>   shared_fs with drbd1_sync (score:INFINITY) (with-rsc-role:Master)
> (id:colocation-shared_fs-drbd1_sync-INFINITY)
>   PF-PEP with mda-ip (score:INFINITY) (id:colocation-PF-PEP-mda-ip-INFINITY)
> 
> pcs resource defaults
> resource-stickiness: 100
> 
> I use the virtual IP as a master resource and colocate everything else
> with it. The resource prefers one node with a score of 50, and the
> stickiness is 100 so I expect that after switching to the passive node
> and activating the primary node again the resource stays on the passive
> node. This works fine if I manually stop the primary node with pcs
> cluster stop. However, when I try to force a fail-over by unplugging the
> network cables of the primary node, and then after waiting  plug in the
> cables again, the resource moves back to the primary node.
> 
> I tried larger stickiness values, and also to set a meta
> resource-stickiness property on the resource itself, but it did not
> change. How do configure this?

Your "mda-ip with drbd1_sync master" colocation constraint has a score
of INFINITY, so it takes precedence over stickiness. Once drbd1_sync is
promoted on a node, mda-ip will move to it regardless of stickiness.
Perhaps what you want is the location preference to refer to drbd1_sync
master instead of mda-ip.
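
For example, with pcs (using the constraint id shown in your config):

   pcs constraint remove location-mda-ip-MDA1PFP-50
   pcs constraint location drbd1_sync rule role=master score=50 '#uname' eq MDA1PFP

This is only a sketch; adjust the score as desired.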

> Best wishes,
>   Jens
> 

Re: [ClusterLabs] Master-Slaver resource Restarted after configuration change

2016-06-29 Thread Ken Gaillot
On 06/29/2016 01:35 PM, Ilia Sokolinski wrote:
> 
>>
>> I'm not sure there's a way to do this.
>>
>> If a (non-reloadable) parameter changes, the entire clone does need a
>> restart, so the cluster will want all instances to be stopped, before
>> proceeding to start them all again.
>>
>> Your desired behavior couldn't be the default, because not all services
>> would be able to function correctly with a running master using
>> different configuration options than running slaves. In fact, I think it
>> would be rare; consider a typical option for a TCP port -- changing the
>> port in only the slaves would break communication with the master and
>> potentially lead to data inconsistency.
>>
>> Can you give an example of an option that could be handled this way
>> without causing problems?
>>
>> Reload could be a way around this, but not in the way you suggest. If
>> your service really does need to restart after the option change, then
>> reload is not appropriate. However, if you can approach the problem on
>> the application side, and make it able to accept the change without
>> restarting, then you could implement it as a reload in the agent.
>>
> 
> Ken,
> 
> I see what you are saying.
> The parameter we are changing is the docker image version, so it is not 
> possible to Reload it without a restart.
> 
> Couple of questions:
> What is reloadable vs non-reloadable parameter? Is it the same as unique=“0” 
> vs unique=“1”?
> We currently set  unique=“0”.

Yes, the cluster considers any parameter with unique=0 as reloadable, if
the resource agent supports the reload action.
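
In the metadata that pairing looks roughly like this (an illustrative
fragment, not your actual agent):

   <parameter name="image_version" unique="0">
     <content type="string"/>
   </parameter>
   ...
   <actions>
     <action name="reload" timeout="20s"/>
   </actions>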

> When doing repeated experiments, I see that sometimes both Master and Slave 
> are Reload-ed, but sometimes one of them is Restart-ed.
> 
> Why is that?

Good question. I would expect all or no instances of the same clone to
be reloaded.

An otherwise reloadable change may get a restart if there is also a
nonreloadable parameter changing at the same time. Also, if the
reloadable resource is ordered after another resource that is being
restarted, it will get a restart.

As an aside, I'm not happy with the current implementation of reload.
Using "unique" to determine reloadability was not a good choice; it
should be a separate attribute. More importantly, there's a fundamental
misunderstanding between pacemaker's use of reload and how most resource
agent writers interpret it -- pacemaker calls it when a resource
parameter in the pacemaker configuration changes, but most RAs use it
for a service's native reload of its own configuration file. Those two
use cases need to be separated.

> I looked at the source code allocate.c:check_action_definition(), and it 
> seems that there is a meta parameter
> called “isolation” which affects on Reload vs Restart decision.
> 
> I can’t find any documentation about this “isolation” meta parameter.
> Do you know what is is intended for?

That is a great feature that, unfortunately, completely lacks
documentation and testing. It's a way to run cluster-managed services
inside a Docker container. Documentation/testing are on the to-do list,
but it's a long list ...

> Thanks a lot
> 
> Ilia

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Default Behavior

2016-06-29 Thread Ken Gaillot
On 06/29/2016 04:54 AM, Klaus Wenninger wrote:
> On 06/29/2016 11:00 AM, Pavlov, Vladimir wrote:
>> Thanks a lot.
>> We also thought to use Fencing (stonith).
>> But the production cluster runs in the cloud; node1 and node2 are virtual 
>> machines without any hardware fencing devices.
> But there are fence-agents that do fencing via the hypervisor (e.g.
> fence_xvm).
>> We looked in the direction of SBD, but as far as we understand its use 
>> is not justified without shared storage in a two-node cluster:
>> http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
> Using SBD with a watchdog (provided your virtual environment provides a
> watchdog device inside VMs) for
> self-fencing is probably better than no fencing at all.

You can also ask your cloud provider if they provide an API for
hard-rebooting instances. If so, there are some fence agents in the wild
for common cloud provider APIs, or you could write your own.

> Regards,
> Klaus
>> Are there any ways to do fencing?
>> Specifically for our situation, we have found another workaround - use DR 
>> instead of NAT in IPVS.
>> In the case of DR, even if both servers are active at the same time it does 
>> not matter which of them serve the connection from the client. Web servers 
>> responds to the client directly.
>> Is this a viable workaround?

I forget what happens if both ldirectord are up and can't communicate,
but it's not that simple.

>> Kind regards,
>>  
>> Vladimir Pavlov
>>
>> Message: 2
>> Date: Tue, 28 Jun 2016 18:53:38 +0300
>> From: "Pavlov, Vladimir" <vladimir.pav...@tns-global.ru>
>> To: "'Users@clusterlabs.org'" <Users@clusterlabs.org>
>> Subject: [ClusterLabs] Default Behavior
>> Message-ID:
>>  <b38b34ec5621e34dabce13e8b18936e6033f0b17c...@exserv.gallup.tns>
>> Content-Type: text/plain; charset="koi8-r"
>>
>> Hello!
>> We have Pacemaker cluster of two node Active/Backup (OS Centos 6.7), with 
>> resources IPaddr2 and ldirectord.
>> Cluster Properties:
>> cluster-infrastructure: cman
>> dc-version: 1.1.11-97629de
>> no-quorum-policy: ignore
>> stonith-enabled: false
>> The cluster has been configured for this documentation: 
>> http://clusterlabs.org/quickstart-redhat-6.html
>> Recently, there was a communication failure between cluster nodes and the 
>> behavior was like this:
>>
>> -During a network failure, each server has become the Master.
>>
>> -After the restoration of the network, one node killing services of 
>> Pacemaker on the second node.
>>
>> -The second node was not available for the cluster, but all 
>> resources remain active (Ldirectord,ipvs,ip address). That is, both nodes 
>> continue to be active.
>> We decided to create a test stand and play the situation, but with current 
>> version of Pacemaker in CentOS repos, cluster behaves differently:
>>
>> -During a network failure, each server became the Master.
>>
>> -After the restoration of the network, all resources were stopped.
>>
>> -Then the resources were started on only one node. - This behavior seems 
>> to be more logical.
>> Current Cluster Properties on test stand:
>> cluster-infrastructure: cman
>> dc-version: 1.1.14-8.el6-70404b0
>> have-watchdog: false
>> no-quorum-policy: ignore
>> stonith-enabled: false
>> Has the cluster's behavior changed in the new version, or was the incident 
>> not fully reproduced?
>> Thank you.
>>
>>
>> Kind regards,
>>
>> Vladimir Pavlov
>>
>>
>> --
>>
>> Message: 3
>> Date: Tue, 28 Jun 2016 12:07:36 -0500
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Default Behavior
>> Message-ID: <5772aed8.6060...@redhat.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On 06/28/2016 10:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> We have Pacemaker cluster of two node Active/Backup (OS Centos 6.7),
>>> with resources IPaddr2 and ldirectord.
>>>
>>> Cluster Properties:
>>>
>>> cluster-infrastructure: cman
>>>
>>> dc-version: 1.1.11-97629de
>>>
>>> no-quorum-policy: ignore
>>>
>>>

Re: [ClusterLabs] error: crm_timer_popped: Shutdown Escalation (I_STOP) just popped in state S_POLICY_ENGINE

2016-06-29 Thread Ken Gaillot
On 06/29/2016 09:38 AM, Kostiantyn Ponomarenko wrote:
> Hello,
> 
> I am seeing those error messages in the syslog when the machine goes
> down (one-node cluster):

For anyone who missed the IRC discussion:

This is probably the issue fixed by commit 6aae854 in the just-released
Pacemaker 1.1.15. Without the fix, this can happen when shutting down
the host while pacemaker is still running, so a workaround is to stop
pacemaker before shutting down.
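
In practice that workaround is just (sketch only; adjust to your shutdown
procedure):

systemctl stop pacemaker   # stop the cluster cleanly first
systemctl poweroff         # then shut down the host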

> Jun 29 14:10:29 A4-4U24-303-LS systemd[1]: Stopped Pacemaker High
> Availability Cluster Manager.
> Jun 29 14:10:29 A4-4U24-303-LS crmd[4856]: error:
> lrm_state_verify_stopped: 3 resources were active at shutdown.
> Jun 29 14:10:29 A4-4U24-303-LS crmd[4856]: error: crm_timer_popped:
> Shutdown Escalation (I_STOP) just popped in state S_POLICY_ENGINE!
> (120ms)
> Jun 29 14:10:23 A4-4U24-303-LS diskHelper(sm1dh)[18992]: WARNING: RA:
> [monitor] : got rc=3
> Jun 29 14:10:18 A4-4U24-303-LS diskHelper(sm0dh)[18982]: WARNING: RA:
> [monitor] : got rc=3
> Jun 29 14:10:15 A4-4U24-303-LS diskManager(diskManager)[18972]: WARNING:
> RA: [monitor] : got rc=3
> Jun 29 14:10:10 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   16]: In-flight rsc op sm1_stop_0on node-0
> (priority: 0, waiting: none)
> Jun 29 14:10:10 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   14]: In-flight rsc op sm0_stop_0on node-0
> (priority: 0, waiting: none)
> Jun 29 14:10:10 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   13]: In-flight rsc op pmdh_stop_0   on node-0
> (priority: 0, waiting: none)
> Jun 29 14:10:10 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   12]: In-flight rsc op dmdh_stop_0   on node-0
> (priority: 0, waiting: none)
> ..
> Jun 29 14:07:10 A4-4U24-303-LS crmd[4856]: error: cib_action_updated:
> Update 209 FAILED: Timer expired
> Jun 29 14:07:10 A4-4U24-303-LS crmd[4856]: error: cib_action_updated:
> Update 208 FAILED: Timer expired
> Jun 29 14:07:10 A4-4U24-303-LS crmd[4856]: error: cib_action_updated:
> Update 207 FAILED: Timer expired
> Jun 29 14:07:10 A4-4U24-303-LS crmd[4856]: error: cib_action_updated:
> Update 206 FAILED: Timer expired
> Jun 29 14:06:50 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   16]: In-flight rsc op sm1_stop_0on node-0
> (priority: 0, waiting: none)
> Jun 29 14:06:50 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   14]: In-flight rsc op sm0_stop_0on node-0
> (priority: 0, waiting: none)
> Jun 29 14:06:50 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   13]: In-flight rsc op pmdh_stop_0   on node-0
> (priority: 0, waiting: none)
> Jun 29 14:06:50 A4-4U24-303-LS crmd[4856]: error: print_synapse: [Action
>   12]: In-flight rsc op dmdh_stop_0   on node-0
> (priority: 0, waiting: none)
> 
> Thank you,
> Kostia

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Doing reload right

2016-07-01 Thread Ken Gaillot
On 07/01/2016 04:48 AM, Jan Pokorný wrote:
> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgail...@redhat.com> wrote on 30.06.2016 at 18:58 in 
>>>>> message
>> <57754f9f.8070...@redhat.com>:
>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>> for a while now, and I think the next release will be a good time, as it
>>> seems to be coming up more frequently.
>>>
>>> In the current implementation, Pacemaker considers a resource parameter
>>> "reloadable" if the resource agent supports the "reload" action, and the
>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>> parameters get changed in the resource's pacemaker configuration,
>>> pacemaker will call the agent's reload action rather than the
>>> stop-then-start it usually does for parameter changes.
>>>
>>> This is completely broken for two reasons:
>>
>> I agree ;-)
>>
>>>
>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>> originally intended (and is widely used by existing resource agents) as
>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>> instance. That is, two resource instances should never have the same
>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>> that (for example) the path to a binary command would have unique=0 --
>>> multiple resource instances could (and likely would) use the same
>>> binary. However, such a parameter could never be reloadable.
>>
>> I tought unique=0 were reloadable (unique=1 were not)...

Correct. By "could never be reloadable", I mean that if someone changes
the location of the daemon binary, there's no way the agent could change
that with anything other than a full restart. So using unique=0 to
indicate reloadable doesn't make sense.

> I see a doubly-distorted picture here:
> - actually "unique=1" on a RA parameter (together with this RA supporting
>   "reload") currently leads to reload-on-change
> - also the provided example shows why reload for "unique=0" is wrong,
>   but as the opposite applies as of current state, it's not an argument
>   why something is broken
> 
> See also:
> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23

Nope, unique=1 is used for the *restart* list -- the non-reloadable
parameters.

>>> 2. Every known resource agent that implements a reload action does so
>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>> *pacemaker* configuration, while all known RAs use reload for a
>>> service's native reload capability of its own configuration file. As an
>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>> action, which will have zero effect on any pacemaker-configured
>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>> UI sense, and none of those parameters are actually reloadable.
> 
> (per the last subclause, applicable also, after mentioned inversion, for
> "unique=1", such as a pid file path, which cannot be reloadable for
> apparent reason)
> 
>> Maybe LSB confusion...
> 
> That's not entirely fair vindication, as when you have to do some
> extra actions with parameters in LSB-aliased "start" action in the
> RA, you should do such reflections also for "reload".

I think the point is that "reload" for an LSB init script or systemd
unit always reloads the native service configuration, so it's natural
for administrators and developers to think of that when they see "reload".

>>> My proposed solution is:
>>>
>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>> indicate reloadable parameters. Pacemaker would use this instead of
>>> "unique".
>>
>> No objections if you change the XML metadata version number this time ;-)
> 
> Good point, but I guess everyone's a bit scared to open this Pandora
> box as there's so much technical debt connected to that (unifying FA/RA
> metadata if possible, adding new UI-oriented annotations, pacemaker's
> silent additions like "private" parameter).
> I'd imagine an established authority for OCF matters (and maintaing
> https://github.com/ClusterLabs/OCF-spec) and at least partly formalized
> process inspired by Python PEPs for coordinated development:
> https://www.python.org/

Re: [ClusterLabs] Singleton resource not being migrated

2016-08-16 Thread Ken Gaillot
On 08/05/2016 05:12 PM, Nikita Koshikov wrote:
> Thanks, Ken,
> 
> On Fri, Aug 5, 2016 at 7:21 AM, Ken Gaillot <kgail...@redhat.com
> <mailto:kgail...@redhat.com>> wrote:
> 
> On 08/05/2016 03:48 AM, Andreas Kurz wrote:
> > Hi,
> >
> > On Fri, Aug 5, 2016 at 2:08 AM, Nikita Koshikov <koshi...@gmail.com 
> <mailto:koshi...@gmail.com>
> > <mailto:koshi...@gmail.com <mailto:koshi...@gmail.com>>> wrote:
> >
> > Hello list,
> >
> > Can you, please, help me in debugging 1 resource not being started
> > after node failover ?
> >
> > Here is configuration that I'm testing:
> > 3 nodes(kvm VM) cluster, that have:
> >
> > node 10: aic-controller-58055.test.domain.local
> > node 6: aic-controller-50186.test.domain.local
> > node 9: aic-controller-12993.test.domain.local
> > primitive cmha cmha \
> > params conffile="/etc/cmha/cmha.conf"
> > daemon="/usr/bin/cmhad" pidfile="/var/run/cmha/cmha.pid"
> user=cmha \
> > meta failure-timeout=30 resource-stickiness=1
> > target-role=Started migration-threshold=3 \
> > op monitor interval=10 on-fail=restart timeout=20 \
> > op start interval=0 on-fail=restart timeout=60 \
> > op stop interval=0 on-fail=block timeout=90
> >
> >
> > What is the output of crm_mon -1frA once a node is down ... any failed
> > actions?
> >
> >
> > primitive sysinfo_aic-controller-12993.test.domain.local
> > ocf:pacemaker:SysInfo \
> > params disk_unit=M disks="/ /var/log" min_disk_free=512M \
> > op monitor interval=15s
> > primitive sysinfo_aic-controller-50186.test.domain.local
> > ocf:pacemaker:SysInfo \
> > params disk_unit=M disks="/ /var/log" min_disk_free=512M \
> > op monitor interval=15s
> > primitive sysinfo_aic-controller-58055.test.domain.local
> > ocf:pacemaker:SysInfo \
> > params disk_unit=M disks="/ /var/log" min_disk_free=512M \
> > op monitor interval=15s
> >
> >
> > You can use a clone for this sysinfo resource and a symmetric cluster
> > for a more compact configuration  then you can skip all these
> > location constraints.
> >
> >
> > location cmha-on-aic-controller-12993.test.domain.local cmha 100:
> > aic-controller-12993.test.domain.local
> > location cmha-on-aic-controller-50186.test.domain.local cmha 100:
> > aic-controller-50186.test.domain.local
> > location cmha-on-aic-controller-58055.test.domain.local cmha 100:
> > aic-controller-58055.test.domain.local
> > location sysinfo-on-aic-controller-12993.test.domain.local
> > sysinfo_aic-controller-12993.test.domain.local inf:
> > aic-controller-12993.test.domain.local
> > location sysinfo-on-aic-controller-50186.test.domain.local
> > sysinfo_aic-controller-50186.test.domain.local inf:
> > aic-controller-50186.test.domain.local
> > location sysinfo-on-aic-controller-58055.test.domain.local
> > sysinfo_aic-controller-58055.test.domain.local inf:
> > aic-controller-58055.test.domain.local
> > property cib-bootstrap-options: \
> > have-watchdog=false \
> > dc-version=1.1.14-70404b0 \
> > cluster-infrastructure=corosync \
> > cluster-recheck-interval=15s \
> >
> >
> > Never tried such a low cluster-recheck-interval ... wouldn't do
> that. I
> > saw setups with low intervals burning a lot of cpu cycles in bigger
> > cluster setups and side-effects from aborted transitions. If you
> do this
> > for "cleanup" the cluster state because you see resource-agent errors
> > you should better fix the resource agent.
> 
> Strongly agree -- your recheck interval is lower than the various action
> timeouts. The only reason recheck interval should ever be set less than
> about 5 minutes is if you have time-based rules that you want to trigger
> with a finer granularity.
> 
> Your issue does not appear to be coming from recheck interval, otherwise
> it would go away after the recheck in

Re: [ClusterLabs] Failover When Host is Up, Out of Order Logs

2017-02-01 Thread Ken Gaillot
On 01/31/2017 11:44 AM, Corey Moullas wrote:
> I have been getting extremely strange behavior from a Corosync/Pacemaker
> install on OVH Public Cloud servers.
> 
>  
> 
> After hours of Googling, I thought I would try posting here to see if
> somebody knows what to do.
> 
>  
> 
> I see this in my logs very frequently:
> 
> Jan 31 10:31:36 [581] fusion01-2 corosync warning [MAIN  ] Corosync main
> process was not scheduled for 24334.5645 ms (threshold is 2400. ms).
> Consider token timeout increase.
> 
> Jan 31 10:31:36 [581] fusion01-2 corosync notice  [TOTEM ] A processor
> failed, forming new configuration.
> 
>  
> 
> I have increased token time to 10s and this still occurs regularly even
> though both hosts are always up.

The "was not scheduled" message means the kernel is not giving corosync
enough CPU time to keep the token alive. The message indicates that it
didn't get scheduled for 24 seconds, which is why 10 seconds wouldn't
help -- but simply raising the timeout isn't a good idea at that point.
You need to figure out why you're starved for CPU. Public clouds don't
generally provide any guarantees, so it may be on their end.

> There are also times when the floating IP script is fired, but corosync
> does not seem to be aware. When you run crm status it will show the IP being
> bound to fusion01-1 when in fact the script fired and moved it to
> fusion01-2.
> 
>  
> 
> Finally, the timing of the logs seems very odd. This is what the end of
> my corosync log file looks like. Notice the times appear out of order in
> the logs. I’m ripping my hair out with these issues. Anybody have a clue
> what may be going on here?

Both pacemaker and corosync write to the same log (which is probably a
bad idea and will be changed one day, but that's not of concern here).
Each is using its own time source. If pacemaker is getting scheduled
more frequently than corosync, it's certainly possible log messages will
be out of order, since corosync isn't able to write out buffers as often.

> 
>  
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_process_request:   Completed cib_modify operation for
> section status: OK (rc=0, origin=fusion01-1/crmd/245, version=0.81.123)
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:Diff: --- 0.81.123 2
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:Diff: +++ 0.81.124 (null)
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:-- /cib/status/node_state[@id='1']/lrm[@id='1']
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:+  /cib:  @num_updates=124
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_process_request:   Completed cib_delete operation for
> section //node_state[@uname='fusion01-1']/lrm: OK (rc=0,
> origin=fusion01-1/crmd/246, version=0.81.124)
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:Diff: --- 0.81.124 2
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:Diff: +++ 0.81.125 (null)
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:+  /cib:  @num_updates=125
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:+  /cib/status/node_state[@id='1']: 
> @crm-debug-origin=do_lrm_query_internal
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++ /cib/status/node_state[@id='1']:  
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++
> 
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++  
> 
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++
>  operation_key="FloatIP_start_0" operation="start"
> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10"
> transition-key="5:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1"
> transition-magic="0:0;5:31:0:08c3f481-ccde-4f75-b1a7-acf8168cd0c1"
> on_node="fusion01-1" call-id="17" rc-code="0" op-status="0" interval="0"
> last-run="1485859189" last-rc-change="1485859189" e
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++
>   
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++  
> 
> 
> Jan 31 10:31:47 [21062] fusion01-2cib: info:
> cib_perform_op:++
>  operation="monitor" crm-debug-origin="build_active_RAs"
> crm_feature_set="3.0.10"
> transition-key="4:1:7:1fe20aa3-b305-4282-99a3-b1f8190d3c2c"
> transition-magic="0:0;4:1:7:1fe20aa3-b305-4282-99a3-b1f8190d3c2c"
> on_node="fusion01-1" call-id="9" rc-code="0" 

Re: [ClusterLabs] [Question] About a change of crm_failcount.

2017-02-02 Thread Ken Gaillot
On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> By the next correction, the user was not able to set a value except zero in 
> crm_failcount.
> 
>  - [Fix: tools: implement crm_failcount command-line options correctly]
>- 
> https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
> 
> However, pgsql RA sets INFINITY in a script.
> 
> ```
> (snip)
> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
> (snip)
> ocf_exit_reason "My data is newer than new master's one. New   master's 
> location : $master_baseline"
> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME 
> -v INFINITY
> return $OCF_ERR_GENERIC
> (snip)
> ```
> 
> There seems to be the influence only in pgsql somehow or other.
> 
> Can you revise it to set a value except zero in crm_failcount?
> We make modifications to use crm_attribute in pgsql RA if we cannot revise it.
> 
> Best Regards,
> Hideo Yamauchi.

Hmm, I didn't realize that was used. I changed it because it's not a
good idea to set fail-count without also changing last-failure and
having a failed op in the LRM history. I'll have to think about what the
best alternative is.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] A stop job is running for pacemaker high availability cluster manager

2017-02-02 Thread Ken Gaillot
On 02/02/2017 03:06 PM, Oscar Segarra wrote:
> Hi Ken, 
> 
> I have checked the /var/log/cluster/corosync.log and there no
> information about why system hangs stopping... 
> 
> ¿Can you be more specific about what logs to check?
> 
> Thanks a lot.

There, and /var/log/messages sometimes has relevant messages from
non-cluster components.

You'd want to look for messages like "Caught 'Terminated' signal" and
"Shutting down", as well as resources being stopped ("_stop_0"), then
various "Disconnect" and "Stopping" messages as individual daemons exit.

> 2017-02-02 21:10 GMT+01:00 Ken Gaillot <kgail...@redhat.com
> <mailto:kgail...@redhat.com>>:
> 
> On 02/02/2017 12:35 PM, Oscar Segarra wrote:
> > Hi,
> >
> > I have a two node cluster... when I try to shutdown the physical host I
> > get the following message in console: "a stop job is running for
> > pacemaker high availability cluster manager" and never stops...
> 
> That would be a message from systemd. You'll need to check the pacemaker
> status and/or logs to see why pacemaker can't shut down.
> 
> Without stonith enabled, pacemaker will be unable to recover if a
> resource fails to stop. That could lead to a hang.
> 
> > This is my configuration:
> >
> > [root@vdicnode01 ~]# pcs config
> > Cluster Name: vdic-cluster
> > Corosync Nodes:
> >  vdicnode01-priv vdicnode02-priv
> > Pacemaker Nodes:
> >  vdicnode01-priv vdicnode02-priv
> >
> > Resources:
> >  Resource: nfs-vdic-mgmt-vm-vip (class=ocf provider=heartbeat
> type=IPaddr)
> >   Attributes: ip=192.168.100.200 cidr_netmask=24
> >   Operations: start interval=0s timeout=20s
> > (nfs-vdic-mgmt-vm-vip-start-interval-0s)
> >   stop interval=0s timeout=20s
> > (nfs-vdic-mgmt-vm-vip-stop-interval-0s)
> >   monitor interval=10s
> > (nfs-vdic-mgmt-vm-vip-monitor-interval-10s)
> >  Clone: nfs_setup-clone
> >   Resource: nfs_setup (class=ocf provider=heartbeat type=ganesha_nfsd)
> >Attributes: ha_vol_mnt=/var/run/gluster/shared_storage
> >Operations: start interval=0s timeout=5s
> (nfs_setup-start-interval-0s)
> >stop interval=0s timeout=5s
> (nfs_setup-stop-interval-0s)
> >monitor interval=0 timeout=5s
> (nfs_setup-monitor-interval-0)
> >  Clone: nfs-mon-clone
> >   Resource: nfs-mon (class=ocf provider=heartbeat type=ganesha_mon)
> >Operations: start interval=0s timeout=40s
> (nfs-mon-start-interval-0s)
> >stop interval=0s timeout=40s (nfs-mon-stop-interval-0s)
> >monitor interval=10s timeout=10s
> > (nfs-mon-monitor-interval-10s)
> >  Clone: nfs-grace-clone
> >   Meta Attrs: notify=true
> >   Resource: nfs-grace (class=ocf provider=heartbeat
> type=ganesha_grace)
> >Meta Attrs: notify=true
> >Operations: start interval=0s timeout=40s
> (nfs-grace-start-interval-0s)
> >stop interval=0s timeout=40s
> (nfs-grace-stop-interval-0s)
> >monitor interval=5s timeout=10s
> > (nfs-grace-monitor-interval-5s)
> >  Resource: vm-vdicone01 (class=ocf provider=heartbeat
> type=VirtualDomain)
> >   Attributes: hypervisor=qemu:///system
> > config=/mnt/nfs-vdic-mgmt-vm/vdicone01.xml
> > migration_network_suffix=tcp:// migration_transport=ssh
> >   Meta Attrs: allow-migrate=true target-role=Stopped
> >   Utilization: cpu=1 hv_memory=512
> >   Operations: start interval=0s timeout=90
> (vm-vdicone01-start-interval-0s)
> >   stop interval=0s timeout=90
> (vm-vdicone01-stop-interval-0s)
> >   monitor interval=20s role=Stopped
> > (vm-vdicone01-monitor-interval-20s)
> >   monitor interval=30s (vm-vdicone01-monitor-interval-30s)
> >  Resource: vm-vdicsunstone01 (class=ocf provider=heartbeat
> > type=VirtualDomain)
> >   Attributes: hypervisor=qemu:///system
> > config=/mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml
> > migration_network_suffix=tcp:// migration_transport=ssh
> >   Meta Attrs: allow-migrate=true target-role=Stopped
> >   Utilization: cpu=1 hv_memory=1024
> >   Operations: start interval=0s timeout=90
> > (vm-vdicsunstone01-start-interval-0s)
> >   stop interval=0s timeout=90
>

Re: [ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

2017-02-01 Thread Ken Gaillot
s95kjg110065_res-migrate_to-interval-0s)  <<< New op name / value
>>
>>
>> Where does that original op name come from in the VirtualDomain resource
>> definition?  How can we get the initial meta value changed and shipped
> with
>> a valid operation name (i.e. migrate_to), and
>> maybe a more reasonable migrate_to timeout value... something
> significantly
>> higher than 1200ms , i.e. 1.2 seconds ?  Can I report this request as a
>> bugzilla on the RHEL side, or should this go to my internal IBM bugzilla
>> for KVM on System Z development?
>>
>> Anyway, thanks so much for identifying my issue.  I can reconfigure my
>> resources to make them tolerate longer migration execution times.
>>
>>
>> Scott Greenlese ... IBM KVM on System Z Solution Test
>>   INTERNET:  swgre...@us.ibm.com
>>
>>
>>
>>
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>,
>> users@clusterlabs.org
>> Date: 01/19/2017 10:26 AM
>> Subject: Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for
>> VirtualDomain resources
>>
>>
>>
>> On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot <kgail...@redhat.com> wrote on 18.01.2017 at 16:32 in
>> message
>>> <4b02d3fa-4693-473b-8bed-dc98f9e3f...@redhat.com>:
>>>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>>>> Ken and Co,
>>>>>
>>>>> Thanks for the useful information.
>>>>>
>>>
>>> [...]
>>>>>
>>>>> Is this internally coded within the class=ocf provider=heartbeat
>>>>> type=VirtualDomain resource agent?
>>>>
>>>> Aha, I just realized what the issue is: the operation name is
>>>> migrate_to, not migrate-to.
>>>>
>>>> For technical reasons, pacemaker can't validate operation names (at the
>>>> time that the configuration is edited, it does not necessarily have
>>>> access to the agent metadata).
>>>
>>> BUT the set of operations is finite, right? So if those were in some XML
>> schema, the names could be verified at least (not meaning that the
>> operation is actually supported).
>>> BTW: Would a "crm configure verify" detect this kijnd of problem?
>>>
>>> [...]
>>>
>>> Ulrich
>>
>> Yes, it's in the resource agent meta-data. While pacemaker itself uses a
>> small set of well-defined actions, the agent may define any arbitrarily
>> named actions it desires, and the user could configure one of these as a
>> recurring action in pacemaker.
>>
>> Pacemaker itself has to be liberal about where its configuration comes
>> from -- the configuration can be edited on a separate machine, which
>> doesn't have resource agents, and then uploaded to the cluster. So
>> Pacemaker can't do that validation at configuration time. (It could
>> theoretically do some checking after the fact when the configuration is
>> loaded, but this could be a lot of overhead, and there are
>> implementation issues at the moment.)
>>
>> Higher-level tools like crmsh and pcs, on the other hand, can make
>> simplifying assumptions. They can require access to the resource agents
>> so that they can do extra validation.
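
For reference, the misnamed operation could be replaced with pcs roughly like
this (vm_guest1 is a placeholder for the VirtualDomain resource name, and the
timeout is only an example):

pcs resource op remove vm_guest1 migrate-to interval=0s
pcs resource op add vm_guest1 migrate_to interval=0s timeout=300s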

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Resource Priority

2017-02-01 Thread Ken Gaillot
On 02/01/2017 09:07 AM, Ulrich Windl wrote:
> Chad Cravens wrote on 01.02.2017 at 15:52 in
> message:
>> Hello Cluster Fans!
>>
>> I've had a great time working with the clustering software. Implementing a
>> HUGE cluster solution now (100+ resources) and it's working great!
>>
>> I had a question regarding prioritizing resources. Say I have three nodes
>> (A,B,C) and 2 database resources (DB1, DB2). Database resources normally
>> run on A and B, and both failover to C.
>>
>> What I would like to do is prioritize resource DB1 over DB2 if both have to
>> failover to node C. For example, if DB2 has failed over and is running on
>> node C, and at a later time DB1 fails over to node C, that DB2 would stop
>> (no longer running at all on any node) and DB1 would run. Essentially DB1
>> is kicking DB2 off the cluster. I was wondering if there would be a clean
>> way to implement something like this using standard cluster configurations
>> parameters or if I'd have to create a custom RA script that would run
>> cluster commands to do this?
> 
> Hi!
> 
> What about this?: First use utilization constraints to avoid overloading your 
> nodes (change nodes and each resource). The use priorities for the resources. 
> Now when more resources want to run on a node (C) that the node can deal 
> with, the higher priority resources will succeed. Drawback: Stopping a 
> resource can cause some resource shuffling between nodes. We run a similar 
> configuration with SLES11 for a few years now. If resources are becoming 
> tight, test and development resources have to give way for production 
> resources...
> 
> Regards,
> Ulrich

Utilization attributes are a good idea, if overloading the machine is
your concern.

If the DBs simply can't run together, add a colocation constraint with a
negative score.

In either case, you can use the "priority" meta-attribute to say which
one is preferred when both can't run:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_resource_meta_attributes
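
A minimal pcs sketch of the anti-colocation approach, using the DB1/DB2 names
from the question (the priority values are arbitrary):

pcs constraint colocation add DB2 with DB1 -INFINITY   # never run them together
pcs resource meta DB1 priority=10                      # DB1 wins a contested node
pcs resource meta DB2 priority=5

With the priorities set, the cluster keeps the higher-priority resource active
when both cannot run, and the -INFINITY colocation keeps DB2 off whatever node
DB1 lands on.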

>>
>> Thanks in advance! Happy clustering :)
>>
>> -- 
>> Kindest Regards,
>> Chad Cravens
>> (843) 291-8340
>>
>> Chad Cravens
>> (843) 291-8340
>> chad.crav...@ossys.com 
>> http://www.ossys.com 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Using "mandatory" startup order but avoiding depending clones from restart after member of parent clone fails

2017-02-08 Thread Ken Gaillot
On 02/06/2017 05:25 PM, Alejandro Comisario wrote:
> guys, really happy to post my first doubt.
> 
> i'm kinda having a "conceptual" issue that's bringing me lots of trouble:
> i need to ensure that the start order of resources is mandatory, but
> that is causing me a huge issue: if just one of the members of
> a clone goes down and up (but not all members), all resources depending
> on it are restarted (which is bad). my workaround is to set the order as
> advisory, but that doesn't assure a strict startup order.
> 
> eg. clone_b runs on servers_B, and depends on clone_a that runs on servers_A.
> 
> I'll put an example on how i have everything defined between this two clones.
> 
> ### clone_A running on servers A (location rule)
> primitive p_mysql mysql-wss \
> op monitor timeout=55 interval=60 enabled=true on-fail=restart \
> op start timeout=475 interval=0 on-fail=restart \
> op stop timeout=175 interval=0 \
> params socket="/var/run/mysqld/mysqld.sock"
> pid="/var/run/mysqld/mysqld.pid" test_passwd="XXX" test_user=root \
> meta is-managed=true
> 
> clone p_mysql-clone p_mysql \
> meta target-role=Started interleave=false globally-unique=false
> 
> location mysql_location p_mysql-clone resource-discovery=never \
> rule -inf: galera ne 1
> 
> ### clone_B running on servers B (location rule)
> primitive p_keystone apache \
> params configfile="/etc/apache2/apache2.conf" \
> op monitor on-fail=restart interval=60s timeout=60s \
> op start on-fail=restart interval=0 \
> meta target-role=Started migration-threshold=2 failure-timeout=60s
> resource-stickiness=300
> 
> clone p_keystone-clone p_keystone \
> meta target-role=Started interleave=false globally-unique=false
> 
> location keystone_location p_keystone-clone resource-discovery=never \
> rule -inf: keystone ne 1
> 
> order p_clone-mysql-before-p_keystone INF: p_mysql-clone 
> p_keystone-clone:start
> 
> Again just to make my point: if p_mysql-clone loses even one member
> of the clone, then ONLY when that member gets back, all members of
> p_keystone-clone get restarted, and that's NOT what i need. so if i
> change the order from mandatory to advisory, i get what i want
> regarding the behaviour when instances of the clone come
> and go, but i lose the strictness of the startup order, which is
> critical for me.
> 
> How can i fix this problem ?
> .. can i ?

I don't think pacemaker can model your desired situation currently.

In OpenStack configs that I'm familiar with, the mysql server (usually
galera) is a master-slave clone, and the constraint used is "promote
mysql then start keystone". That way, if a slave goes away and comes
back, it has no effect.
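
For illustration only, in crm shell syntax that pattern looks roughly like the
following, assuming the ocf:heartbeat:galera agent in place of the mysql-wss
clone above (resource names and the gcomm address are placeholders):

primitive p_galera ocf:heartbeat:galera \
    params wsrep_cluster_address="gcomm://serverA1,serverA2,serverA3"
ms ms_galera p_galera \
    meta master-max=3 clone-max=3 notify=true interleave=false
order galera-promote-before-keystone inf: ms_galera:promote p_keystone-clone:start

With the order anchored on promotion, a slave instance restarting does not
touch the already-running keystone clone.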

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-06 Thread Ken Gaillot
On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>> RaSca <ra...@miamammausalinux.org> wrote on 03.02.2017 at 14:00 in
> message
> <0de64981-904f-5bdb-c98f-9c59ee47b...@miamammausalinux.org>:
> 
>> On 03/02/2017 11:06, Ferenc Wágner wrote:
>>> Ken Gaillot <kgail...@redhat.com> writes:
>>>
>>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>>>>
>>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>>>>> seems to be working ok including the STONITH.
>>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>>>> processes on one node.
>>>>>
>>>>> Result:
>>>>> The node is marked as "pending", all resources stay on it. If I
>>>>> manually kill a resource it is not noticed. On the other node a drbd
>>>>> "promote" command fails (drbd is still running as master on the first
>>>>> node).
>>>>
>>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly
>>>> enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
>>>> pacemaker".
>>>
>>> What exactly is "quickly enough"?
>>
>> What Ken is saying is that Pacemaker, as a service managed by systemd,
>> have in its service definition file
>> (/usr/lib/systemd/system/pacemaker.service) this option:
>>
>> Restart=on-failure
>>
>> Looking at [1] it is explained: systemd restarts immediately the process
>> if it ends for some unexpected reason (like a forced kill).
> 
> Isn't the question: Is crmd a process that is expected to die (and thus need
> restarting)? Or wouldn't one prefer to debug this situation. I fear that
> restarting it might just cover some fatal failure...

If crmd or corosync dies, the node will be fenced (if fencing is enabled
and working). If one of the crmd's persistent connections (such as to
the cib) fails, it will exit, so it ends up the same. But the other
daemons (such as pacemakerd or attrd) can die and respawn without any
risk to services.

The failure will be logged, but it will not be reported in cluster
status, so there is a chance of not noticing it.

> 
>>
>> [1] https://www.freedesktop.org/software/systemd/man/systemd.service.html 
>>
>> -- 
>> RaSca
>> ra...@miamammausalinux.org 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

2017-02-06 Thread Ken Gaillot
On 02/06/2017 09:00 AM, Scott Greenlese wrote:
> Further explanation for my concern about --disabled not taking effect
> until after the iface-bridge was configured ...
> 
> The reason I wanted to create the iface-bridge resource "disabled", was
> to allow me the opportunity to impose
> a location constraint / rule on the resource to prevent it from being
> started on certain cluster nodes,
> where the specified slave vlan did not exist.
> 
> In my case, pacemaker assigned the resource to a cluster node where the
> specified slave vlan did not exist, which in turn
> triggered a fenced (off) action against that node (apparently, because
> the device could not be stopped, per Ken's reply earlier).
> 
> Again, my cluster is configured as "symmetric" , so I would have to "opt
> out" my new resource from
> certain cluster nodes via location constraint.
> 
> So, if this really is how --disable is designed to work, is there any
> way to impose a location constraint rule BEFORE
> the iface-bridge resource gets assigned. configured and started on a
> cluster node in a symmetrical cluster?

I would expect --disabled to behave like that already; I'm not sure
what's happening there.

But, you can add a resource and any constraints that apply to it
simultaneously. How to do this depends on whether you want to do it
interactively or scripted, and whether you prefer the low-level tools,
crm shell, or pcs.

If you want to script it via pcs, you can do pcs cluster cib $SOME_FILE,
then run your resource and constraint commands as pcs -f $SOME_FILE ...,
then pcs cluster cib-push $SOME_FILE --config.
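
A sketch of that workflow, reusing the bridge example from this thread (the
"avoids" constraint is just an illustration):

pcs cluster cib bridge_cfg.xml
pcs -f bridge_cfg.xml resource create br0_r1 ocf:heartbeat:iface-bridge \
    bridge_name=br0 bridge_slaves=vlan1292 op monitor interval=10s timeout=20s --disabled
pcs -f bridge_cfg.xml constraint location br0_r1 avoids zs93kjpcs1
pcs cluster cib-push bridge_cfg.xml --config

That way the location constraint is already in place when the resource first
becomes eligible to run.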

> 
> Thanks,
> 
> Scott Greenlese ... IBM KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
> 
> 
> 
> 
> From: Scott Greenlese/Poughkeepsie/IBM@IBMUS
> To: kgail...@redhat.com, Cluster Labs - All topics related to
> open-source clustering welcomed <users@clusterlabs.org>
> Date: 02/03/2017 03:23 PM
> Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
> causes cluster node fence action.
> 
> 
> 
> 
> 
> Ken,
> 
> Thanks for the explanation.
> 
> One other thing, relating to the iface-bridge resource creation. I
> specified --disabled flag:
> 
>> [root@zs95kj VD]# date;pcs resource create br0_r1
>> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
>> monitor timeout="20s" interval="10s" --*disabled*
> 
> Does the bridge device have to be successfully configured by pacemaker
> before disabling the resource? It seems
> that that was the behavior, since it failed the resource and fenced the
> node instead of disabling the resource.
> Just checking with you to be sure.
> 
> Thanks again..
> 
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
> 
> 
> 
> 
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Date: 02/02/2017 03:29 PM
> Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
> causes cluster node fence action.
> 
> 
> 
> 
> On 02/02/2017 02:14 PM, Scott Greenlese wrote:
>> Hi folks,
>>
>> I'm testing iface-bridge resource support on a Linux KVM on System Z
>> pacemaker cluster.
>>
>> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
>> corosync-2.3.4-7.el7_2.ibm.1.s390x
>>
>> I created an iface-bridge resource, but specified a non-existent
>> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>>
>> [root@zs95kj VD]# date;pcs resource create br0_r1
>> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
>> monitor timeout="20s" interval="10s" --disabled
>> Wed Feb 1 17:49:16 EST 2017
>> [root@zs95kj VD]#
>>
>> [root@zs95kj VD]# pcs resource show |grep br0
>> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
>> [root@zs95kj VD]#
>>
>> As you can see, the resource was created, but failed to start on the
>> target node zs93kppcs1.
>>
>> To my surprise, the target node zs93kppcs1 was unceremoniously fenced.
>>
>> pacemaker.log shows a fenc

Re: [ClusterLabs] [Question] About a change of crm_failcount.

2017-02-03 Thread Ken Gaillot
On 02/02/2017 12:33 PM, Ken Gaillot wrote:
> On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote:
>> Hi All,
>>
>> By the next correction, the user was not able to set a value except zero in 
>> crm_failcount.
>>
>>  - [Fix: tools: implement crm_failcount command-line options correctly]
>>- 
>> https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
>>
>> However, pgsql RA sets INFINITY in a script.
>>
>> ```
>> (snip)
>> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>> (snip)
>> ocf_exit_reason "My data is newer than new master's one. New   master's 
>> location : $master_baseline"
>> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME 
>> -v INFINITY
>> return $OCF_ERR_GENERIC
>> (snip)
>> ```
>>
>> There seems to be the influence only in pgsql somehow or other.
>>
>> Can you revise it to set a value except zero in crm_failcount?
>> We make modifications to use crm_attribute in pgsql RA if we cannot revise 
>> it.
>>
>> Best Regards,
>> Hideo Yamauchi.
> 
> Hmm, I didn't realize that was used. I changed it because it's not a
> good idea to set fail-count without also changing last-failure and
> having a failed op in the LRM history. I'll have to think about what the
> best alternative is.

Having a resource agent modify its own fail count is not a good idea,
and could lead to unpredictable behavior. I didn't realize the pgsql
agent did that.

I don't want to re-enable the functionality, because I don't want to
encourage more agents doing this.

There are two alternatives the pgsql agent can choose from:

1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
Pacemaker gets one of these errors from an agent, it will ban the
resource from that node (until the failure is cleared).

2. Use crm_resource --ban instead. This would ban the resource from that
node until the user removes the ban with crm_resource --clear (or by
deleting the ban consraint from the configuration).

I'd recommend #1 since it does not require any pacemaker-specific tools.

We can make sure resource-agents has a fix for this before we release a
new version of Pacemaker. We'll have to publicize as much as possible to
pgsql users that they should upgrade resource-agents before or at the
same time as pacemaker. I see the alternative PAF agent has the same
usage, so it will need to be updated, too.
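
A minimal sketch of what alternative #1 could look like inside the agent (this
is not the actual pgsql/PAF patch, just an illustration of the idea):

ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
return $OCF_ERR_ARGS   # "hard" error: the resource is banned from this node until cleanup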

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Question] About log collection of crm_report.

2017-01-23 Thread Ken Gaillot
On 01/23/2017 04:17 PM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> When I carry out Pacemaker1.1.15 and Pacemaker1.1.16 in RHEL7.3, log in 
> conjunction with pacemaker is not collected in the file which I collected in 
> sosreport.
>  
> 
> This seems to be caused by the next correction and pacemaker.py script of 
> RHEL7.3.
> 
>  - 
> https://github.com/ClusterLabs/pacemaker/commit/1bcad6a1eced1a3b6c314b05ac1d353adda260f6
>  - 
> https://github.com/ClusterLabs/pacemaker/commit/582e886dd8475f701746999c0093cd9735aca1ed#diff-284d516fab648676f5d93bc5ce8b0fbf
> 
> 
> ---
> (/usr/lib/python2.7/site-packages/sos/plugins/pacemaker.py)
> (snip)
> if not self.get_option("crm_scrub"):
> crm_scrub = ""
> self._log_warn("scrubbing of crm passwords has been disabled:")
> self._log_warn("data collected by crm_report may contain"
>" sensitive values.")
> self.add_cmd_output('crm_report --sos-mode %s -S -d '
> ' --dest %s --from "%s"' %
> (crm_scrub, crm_dest, crm_from),
> chroot=self.tmp_in_sysroot())
> (snip)
> ---
> 
> 
> When a user carries out crm_report in sosreport, what is the reason that set 
> search_logs to 0?
> 
> We think that the one where search_logs works with 1 in sosreport is right.
> 
> 
> Best Regards,
> Hideo Yamauchi.

Hi Hideo,

The --sos-mode option is intended for RHEL integration, so it is only
guaranteed to work with the combination of pacemaker and sosreport
packages delivered with a particular version of RHEL (and its derivatives).

That allows us to make assumptions about what sosreport features are
available. It might be better to detect those features, but we haven't
seen enough usage of sosreport + pacemaker outside RHEL to make that
worth the effort.

In this case, the version of sosreport that will be in RHEL 7.4 will
collect pacemaker.log and corosync.log on its own, so the crm_report in
pacemaker 1.1.16 doesn't need to collect the logs itself.

It might work if you build the latest sosreport:
https://github.com/sosreport/sos

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-01-30 Thread Ken Gaillot
On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
> Hi,
> 
> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems 
> to be working ok including the STONITH.
> For test purposes I issued a "pkill -f pace" killing all pacemaker processes 
> on one node.
> 
> Result:
> The node is marked as "pending", all resources stay on it. If I manually kill 
> a resource it is not noticed. On the other node a drbd "promote" command 
> fails (drbd is still running as master on the first node).
> 
> Killing the corosync process works as expected -> STONITH.
> 
> Could someone shed some light on this behavior? 
> 
> Thanks,
> 
> Stefan

I suspect that, when you kill pacemakerd, systemd respawns it quickly
enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
pacemaker".

Did you schedule monitor operations on your resources? If not, pacemaker
will not know if they go down.
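
For example, role-specific monitors for a DRBD master/slave resource in crm
shell syntax look roughly like this (resource name, DRBD resource and
intervals are placeholders):

primitive p_drbd ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave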

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Question] About log collection of crm_report.

2017-01-25 Thread Ken Gaillot
On 01/24/2017 04:41 PM, renayama19661...@ybb.ne.jp wrote:
> Hi Ken,
> 
> Thank you for comment.
> 
> For example, our user does not use pacemaker.log and corosync.log.
> 
> Via a syslog, the user makes setting to output all log to /var/log/ha-log.
> 
> -
> (/etc/corosync/corosync.conf)
> logging {
> syslog_facility: local1
> debug: off
> }
> 
> (/etc/sysconfig/pacemaker)
> PCMK_logfile=none
> PCMK_logfacility=local1
> PCMK_logpriority=info
> PCMK_fail_fast=yes
> 
> (/etc/rsyslog.conf)
> # Log anything (except mail) of level info or higher.
> # Don't log private authentication messages!
> *.info;mail.none;authpriv.none;cron.none;local1.none
> /var/log/messages
> (snip)
> # Save boot messages also to boot.log
> local7.*/var/log/boot.log
> local1.info /var/log/ha-log
> -
> 
> In present crm_report, in the case of the user who output log in a different 
> file, the log is not collected in sosreport.
> 
> Is this not a problem?
> Possibly is all /var/log going to collect it in future in sosreport?
> 
> Of course I know that "/var/log/ha-log" is collected definitely when I carry 
> out crm_report alone.
> I want to know why collection of log of this crm_report was stopped in 
> sosreport.
> 
> For REDHAT, will it be to be enough for collection of sosreport contents?
> If it is such a thing, we can understand.
> 
> - And I test crm_report at the present, but seem to have some problems.
> - I intend to report the problem by Bugzilla again.
> 
> Best Regards,
> Hideo Yamauchi.

Hi Hideo,

You are right, that is a problem. I've opened a Red Hat bug for sosreport:

https://bugzilla.redhat.com/show_bug.cgi?id=1416535

> - Original Message -
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: users@clusterlabs.org
>> Cc: 
>> Date: 2017/1/24, Tue 08:15
>> Subject: Re: [ClusterLabs] [Question] About log collection of crm_report.
>>
>> On 01/23/2017 04:17 PM, renayama19661...@ybb.ne.jp wrote:
>>>  Hi All,
>>>
>>>  When I carry out Pacemaker1.1.15 and Pacemaker1.1.16 in RHEL7.3, log in 
>> conjunction with pacemaker is not collected in the file which I collected in 
>> sosreport.
>>>   
>>>
>>>  This seems to be caused by the next correction and pacemaker.py script of 
>> RHEL7.3.
>>>
>>>   - 
>> https://github.com/ClusterLabs/pacemaker/commit/1bcad6a1eced1a3b6c314b05ac1d353adda260f6
>>>   - 
>> https://github.com/ClusterLabs/pacemaker/commit/582e886dd8475f701746999c0093cd9735aca1ed#diff-284d516fab648676f5d93bc5ce8b0fbf
>>>
>>>
>>>  ---
>>>  (/usr/lib/python2.7/site-packages/sos/plugins/pacemaker.py)
>>>  (snip)
>>>  if not self.get_option("crm_scrub"):
>>>  crm_scrub = ""
>>>  self._log_warn("scrubbing of crm passwords has been 
>> disabled:")
>>>  self._log_warn("data collected by crm_report may 
>> contain"
>>> " sensitive values.")
>>>  self.add_cmd_output('crm_report --sos-mode %s -S -d '
>>>  ' --dest %s --from "%s"' %
>>>  (crm_scrub, crm_dest, crm_from),
>>>  chroot=self.tmp_in_sysroot())
>>>  (snip)
>>>  ---
>>>
>>>
>>>  When a user carries out crm_report in sosreport, what is the reason that 
>> set search_logs to 0?
>>>
>>>  We think that the one where search_logs works with 1 in sosreport is right.
>>>
>>>
>>>  Best Regards,
>>>  Hideo Yamauchi.
>>
>> Hi Hideo,
>>
>> The --sos-mode option is intended for RHEL integration, so it is only
>> guaranteed to work with the combination of pacemaker and sosreport
>> packages delivered with a particular version of RHEL (and its derivatives).
>>
>> That allows us to make assumptions about what sosreport features are
>> available. It might be better to detect those features, but we haven't
>> seen enough usage of sosreport + pacemaker outside RHEL to make that
>> worth the effort.
>>
>> In this case, the version of sosreport that will be in RHEL 7.4 will
>> collect pacemaker.log and corosync.log on its own, so the crm_report in
>> pacemaker 1.1.16 doesn't need to collect the logs itself.
>>
>> It might work if you build the latest sosreport:
>> https://github.com/sosreport/sos
>>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Need help in setting up HA cluster for applications/services other than Apache tomcat.

2017-02-20 Thread Ken Gaillot
On 02/18/2017 10:55 AM, Chad Cravens wrote:
> Hello Vijay:
> 
> it seems you may want to consider developing custom Resource Agents.
> Take a look at the following guide:
> http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
> 
> I have created several, it is pretty straightforward and has always
> worked as expected. I would say one of the most important parts of
> creating a custom RA script is to make sure you have a good method for
> determining the state of a resource with monitor()
> 
> Good luck!

Agreed, a custom resource agent is the most flexible approach. More
details in addition to above link:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

If your services already have LSB init scripts or systemd unit files,
Pacemaker can use them directly instead -- just configure the resource
as lsb:myscriptname or systemd:myunitname. Pacemaker will look for those
files in the usual system locations for such things. That doesn't have
as much flexibility as a custom agent, but if you already have them,
it's the easiest approach.
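
For example, assuming a service that already ships a systemd unit called
myapp.service (the name is hypothetical), it could be managed with:

pcs resource create my_app systemd:myapp op monitor interval=30s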

> On Fri, Feb 17, 2017 at 8:22 AM, vijay singh rathore
> >
> wrote:
> 
> Hi Team,
> 
> Good Morning everyone, hope you all are doing great.
> 
> First of all I would like to apologise, if I have created
> inconvenience for team members by sending this mail.  
> 
> I have a question and i have tried almost all possible forums and
> googled a lot before reaching to this group for help.
> 
> I want to create HA cluster for applications/services which are not
> in tomcat or related to Apache or MySQL. Let's say they are written
> in different languages such as java, node js, c++, and deployed in
> certain path i.e. /home/xyz
> 
> How can i add these applications for high availability in HA cluster
> using pcs/pacemaker/corosync.
> 
> If I have to create resource for these applications how to create
> and if i have to use some other way, how can i implement it.
> 
> Requesting you to please provide me some suggestions or reference
> documents or links or anything which can help me in completing this
> task and to test fail over for these applications.
> 
> Thanks a lot in advance, have a great day and time ahead.
> 
> Best Regards
> Vijay Singh
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> 
> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 
> -- 
> Kindest Regards,
> Chad Cravens
> (843) 291-8340
> 
> http://www.ossys.com 
> http://www.linkedin.com/company/open-source-systems-llc
>   
> https://www.facebook.com/OpenSrcSys
>    https://twitter.com/OpenSrcSys
>   http://www.youtube.com/OpenSrcSys
>    http://www.ossys.com/feed
>    cont...@ossys.com 
> Chad Cravens
> (843) 291-8340
> chad.crav...@ossys.com 
> http://www.ossys.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Never join a list without a problem...

2017-02-24 Thread Ken Gaillot
On 02/24/2017 08:36 AM, Jeffrey Westgate wrote:
> Greetings all.
> 
> I have inherited a pair of Scientific Linux 6 boxes used as front-end load 
> balancers for our DNS cluster. (Yes, I inherited that, too.)
> 
> It was time to update them so we pulled snapshots (they are VMWare VMs, very 
> small, 1 cpu, 2G RAM, 10G disk), did a "yum update -y" watched everything 
> update, then rebooted.  Pacemaker kept the system from booting.
> Reverted to the snapshot, ran a "yum update -y --exclude=pacemaker\* " and 
> everything is hunky-dory.
> 
> # yum list pacemaker\*
> Installed Packages
> pacemaker.x86_64 1.1.10-14.el6
> @sl
> pacemaker-cli.x86_64 1.1.10-14.el6
> @sl
> pacemaker-cluster-libs.x86_641.1.10-14.el6
> @sl
> pacemaker-libs.x86_641.1.10-14.el6
> @sl
> Available Packages
> pacemaker.x86_64 1.1.14-8.el6_8.2 
> sl-security
> pacemaker-cli.x86_64 1.1.14-8.el6_8.2 
> sl-security
> pacemaker-cluster-libs.x86_641.1.14-8.el6_8.2 
> sl-security
> pacemaker-libs.x86_641.1.14-8.el6_8.2 
> sl-security
> 
> I searched clusterlabs.org looking for issues with updates, and came up empty.
> 
> # cat /etc/redhat-release
> Scientific Linux release 6.5 (Carbon)
> 
> ... is there something post-install/pre reboot that I need to do?
> 
> 
> --
> Jeff Westgate
> UNIX/Linux System Administrator
> Arkansas Dept. of Information Systems

Welcome! I joined the list with a problem, too, and now I'm technical
lead for the project, so be prepared ... ;-)

I don't know of any issues that would cause problems in that upgrade,
much less prevent a boot. Try disabling pacemaker at boot, doing the
upgrade, and then starting pacemaker, and pastebin any relevant messages
from /var/log/cluster/corosync.log.

If you're on SL 6, you should be using CMAN as the underlying cluster
layer. If you're using the corosync 1 pacemaker plugin, that's not well
tested on that platform.

Some general tips:

* You can run crm_verify (with either --live-check on a running cluster,
or -x /var/lib/pacemaker/cib/cib.xml on a stopped one) before and after
the upgrade to make sure you don't have any unaddressed configuration
issues.

* You can also run cibadmin --upgrade before and after the upgrade, to
make sure your configuration is using the latest schema.

It shouldn't prevent a boot if they're not done, but that may help
uncover any issues.
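
Putting those tips together, a rough sequence on SL 6 might look like this
(sketch only; adapt to your own change process):

chkconfig pacemaker off     # keep pacemaker out of the boot sequence for now
crm_verify --live-check     # check the configuration before the upgrade
service pacemaker stop
yum update -y               # include the pacemaker packages this time
service pacemaker start
crm_verify --live-check     # ... and again afterwards
cibadmin --upgrade          # move the CIB to the latest schema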

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] question about equal resource distribution

2017-02-17 Thread Ken Gaillot
On 02/17/2017 08:43 AM, Ilia Sokolinski wrote:
> Thank you!
> 
> What quantity does pacemaker tries to equalize - number of running resources 
> per node or total stickiness per node?
> 
> Suppose I have a bunch of web server groups each with IPaddr and apache 
> resources, and a fewer number of database groups each with IPaddr, postgres 
> and LVM resources.
> 
> In that case, does it mean that 3 web server groups are weighted the same as 
> 2 database groups in terms of distribution?
> 
> Ilia

By default, pacemaker simply chooses the node with the fewest resources
when placing a resource (subject to your constraints, of course).
However you can have much more control if you want:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683960632560
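
If you do want placement to account for RAM/CPU rather than just resource
counts, the utilization approach looks roughly like this with pcs (node and
resource names and the numbers are only examples):

pcs property set placement-strategy=balanced
pcs node utilization node1 cpu=8 memory=16384
pcs resource utilization apache1 cpu=1 memory=2048
pcs resource utilization postgres1 cpu=2 memory=8192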

> 
> 
> 
>> On Feb 17, 2017, at 2:58 AM, Kristoffer Grönlund  
>> wrote:
>>
>> Ilia Sokolinski  writes:
>>
>>> Suppose I have a N node cluster where N > 2 running m*N resources. 
>>> Resources don’t have preferred nodes, but since resources take RAM and CPU 
>>> it is important to distribute them equally among the nodes.
>>> Will pacemaker do the equal distribution, e.g. m resources per node?
>>> If a node fails, will pacemaker redistribute the resources equally too, 
>>> e.g. m * N/(N-1) per node?
>>>
>>> I don’t see any settings controlling this behavior in the documentation, 
>>> but perhaps, pacemaker tries to be “fair” by default.
>>>
>>
>> Yes, pacemaker tries to allocate resources evenly by default, and will
>> move resources when nodes fail in order to maintain that.
>>
>> There are several different mechanisms that influence this behaviour:
>>
>> * Any placement constraints in general influence where resources are
>>  allocated.
>>
>> * You can set resource-stickiness to a non-zero value which determines
>>  to which degree Pacemaker prefers to leave resources running where
>>  they are. The score is in relation to other placement scores, like
>>  constraint scores etc. This can be set for individual resources or
>>  globally. [1]
>>
>> * If you have an asymmetrical cluster, resources have to be manually
>>  allocated to nodes via constraints, see [2]
>>
>> [1]: 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options
>> [2]: 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_asymmetrical_opt_in_clusters
>>
>> Cheers,
>> Kristoffer
>>
>>> Thanks 
>>>
>>> Ilia Sokolinski

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-13 Thread Ken Gaillot
On 02/08/2017 02:49 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
> 
>> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>>
>>> Ken Gaillot <kgail...@redhat.com> writes:
>>>
>>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>>
>>>>> Isn't the question: Is crmd a process that is expected to die (and
>>>>> thus need restarting)? Or wouldn't one prefer to debug this
>>>>> situation. I fear that restarting it might just cover some fatal
>>>>> failure...
>>>>
>>>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>>>> and working). If one of the crmd's persistent connections (such as to
>>>> the cib) fails, it will exit, so it ends up the same.
>>>
>>> But isn't it due to crmd not responding to network packets? So if the
>>> timeout is long enough, and crmd is started fast enough, will the
>>> node really be fenced?
>>
>> If crmd dies, it leaves its corosync process group, and I'm pretty sure
>> the other nodes will fence it for that reason, regardless of the duration.
> 
> See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html
> for a case when a Pacemaker cluster survived a crmd failure and restart.
> Re-reading the thread, I'm still unsure what saved our ass from
> resources being started in parallel and losing massive data.  I'd fully
> expect fencing in such cases...

Looking at that again, crmd leaving the process group isn't enough to be
fenced -- that should abort the transition and update the node state in
the CIB, but it's up to the (new) DC to determine that fencing is needed.

If crmd respawns quickly enough to join the election for the new DC
(which seemed to be the case here), it should just need to be re-probed.



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

2017-02-09 Thread Ken Gaillot
On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
> Hi,
> 
> i have a two node cluster with a vm as a resource. Currently i'm just testing 
> and playing. My vm boots and shuts down again in 15min gaps.
> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped 
> (90ms)" found in the logs. I googled, and it is said that this
> is due to time-based rule 
> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
> But i don't have any time-based rules.
> This is the config for my vm:
> 
> primitive prim_vm_mausdb VirtualDomain \
> params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
> params hypervisor="qemu:///system" \
> params migration_transport=ssh \
> op start interval=0 timeout=90 \
> op stop interval=0 timeout=95 \
> op monitor interval=30 timeout=30 \
> op migrate_from interval=0 timeout=100 \
> op migrate_to interval=0 timeout=120 \
> meta allow-migrate=true \
> meta target-role=Started \
> utilization cpu=2 hv_memory=4099
> 
> The only constraint concerning the vm i had was a location (which i didn't 
> create).

What is the constraint? If its ID starts with "cli-", it was created by
a command-line tool (such as crm_resource, crm shell or pcs, generally
for a "move" or "ban" command).

> Ok, this timer is available, i can set it to zero to disable it.

The timer is used for multiple purposes; I wouldn't recommend disabling
it. Also, this doesn't fix the problem; the problem will still occur
whenever the cluster recalculates, just not on a regular time schedule.

> But why does it influence my vm in such a manner ?
> 
> Excerp from the log:
> 
> ...
> Feb  9 16:19:38 ha-idg-1 VirtualDomain(prim_vm_mausdb)[13148]: INFO: Domain 
> mausdb_vm already stopped.
> Feb  9 16:19:38 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_stop_0: ok (node=ha-idg-1, call=401, rc=0, cib-update=340, 
> confirmed=true)
> Feb  9 16:19:38 ha-idg-1 kernel: [852506.947196] device vnet0 entered 
> promiscuous mode
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.008770] br0: port 2(vnet0) entering 
> forwarding state
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.008775] br0: port 2(vnet0) entering 
> forwarding state
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.172120] qemu-kvm: sending ioctl 5326 
> to a partition!
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.172133] qemu-kvm: sending ioctl 
> 80200204 to a partition!
> Feb  9 16:19:41 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_start_0: ok (node=ha-idg-1, call=402, rc=0, cib-update=341, 
> confirmed=true)
> Feb  9 16:19:41 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_monitor_3: ok (node=ha-idg-1, call=403, rc=0, 
> cib-update=342, confirmed=false)
> Feb  9 16:19:48 ha-idg-1 kernel: [852517.049015] vnet0: no IPv6 routers 
> present
> ...
> Feb  9 16:34:41 ha-idg-1 VirtualDomain(prim_vm_mausdb)[18272]: INFO: Issuing 
> graceful shutdown request for domain mausdb_vm.
> Feb  9 16:35:06 ha-idg-1 kernel: [853434.550089] br0: port 2(vnet0) entering 
> forwarding state
> Feb  9 16:35:06 ha-idg-1 kernel: [853434.550160] device vnet0 left 
> promiscuous mode
> Feb  9 16:35:06 ha-idg-1 kernel: [853434.550165] br0: port 2(vnet0) entering 
> disabled state
> Feb  9 16:35:06 ha-idg-1 ifdown: vnet0
> Feb  9 16:35:06 ha-idg-1 ifdown: Interface not available and no configuration 
> found.
> Feb  9 16:35:07 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_stop_0: ok (node=ha-idg-1, call=405, rc=0, cib-update=343, 
> confirmed=true)
> ...
> 
> I deleted the location and until that vm is running fine for already 35min.

The logs don't go far back enough to have an idea why the VM was
stopped. Also, logs from the other node might be relevant, if it was the
DC (controller) at the time.

> System is SLES 11 SP4 64bit, vm is SLES 10 SP4 64bit.
> 
> Thanks.
> 
> Bernd

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Q (SLES11 SP4): Delay after node came up (info: throttle_send_command: New throttle mode: 0000 (was ffffffff))

2017-02-09 Thread Ken Gaillot
On 01/16/2017 04:25 AM, Ulrich Windl wrote:
> Hi!
> 
> I have a question: The following happened in out 3-node cluster (n1, n2, n3):
> n3 was DC, n2 was offlined, n2 came online again, n1 rebooted (went 
> offline/online), then n2 reboted (offline /online)
> 
> I observed a significant delay after all three nodes were online before 
> resources were started. Actualy the start seemed to be triggered by some crm 
> restart action on n3.
> 
> Logs on n3 (DC) look like this:
> cib: info: cib_process_request:  Completed cib_modify operation for 
> section status: OK (rc=0, origin=local/crmd/359, version=1.99.1)
> crmd:   notice: handle_request:   Current ping state: S_TRANSITION_ENGINE
> (...many more...)
> stonith-ng: info: plugin_handle_membership: Membership 3328: 
> quorum retained
> crmd: info: plugin_handle_membership: Membership 3328: quorum 
> retained
> [...]
> stonith-ng: info: plugin_handle_membership: Membership 3328: 
> quorum retained
> [...]
> cib: info: cib_process_request:  Completed cib_modify operation for 
> section status: OK (rc=0, origin=local/crmd/365, version=1.99.3)
> crmd: info: crmd_cs_dispatch: Setting expected votes to 3
> crmd: info: plugin_handle_membership: Membership 3328: quorum 
> retained
> [...]
> crmd: info: crmd_cs_dispatch: Setting expected votes to 3
> crmd: info: do_state_transition:  State transition 
> S_TRANSITION_ENGINE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
> origin=peer_update_callback ]
> crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer 
> to any unconfirmed nodes
> (???what's that?)

This is normal when a node joins the cluster. The DC's cluster layer
detects any joins, and the DC's crmd responds to that by offering
membership to the new node(s).

> crmd: info: join_make_offer:  Making join offers based on membership 3328
> crmd: info: join_make_offer:  join-2: Sending offer to n2
> crmd: info: crm_update_peer_join: join_make_offer: Node n2[739512325] 
> - join-2 phase 0 -> 1
> crmd: info: join_make_offer:  Skipping n1: already known 4
> crmd: info: join_make_offer:  Skipping n3: already known 4

Above we can see that n1 and n3 already have confirmed membership, and
the newly joined n2 gets offered membership.

> crmd:   notice: abort_transition_graph:   Transition aborted: Peer Halt 
> (source=do_te_invoke:168, 0)

This is one of the common log messages I think can be improved. "Peer
Halt" in this case does not mean the peer halted, but rather that the
transition was halted due to a peer event (in this case the join).

> cib: info: cib_process_request:  Completed cib_modify operation for 
> section crm_config: OK (rc=0, origin=local/crmd/375, version=1.99.5)
> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node 
> n2[739512325] - join-2 phase 1 -> 2
> crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: 
> Node n2[739512325] - expected state is now member (was down)
> crmd: info: abort_transition_graph:   Transition aborted: Peer Halt 
> (source=do_te_invoke:168, 0)
> crmd: info: do_state_transition:  State transition S_INTEGRATION -> 
> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL 
> origin=check_join_state ]
> crmd: info: crmd_join_phase_log:  join-2: n2=integrated
> crmd: info: crmd_join_phase_log:  join-2: n1=confirmed
> crmd: info: crmd_join_phase_log:  join-2: n3=confirmed
> crmd:   notice: do_dc_join_finalize:  join-2: Syncing the CIB from n2 to 
> the rest of the cluster
> [...]
> cib: info: cib_process_replace:  Replaced 1.99.5 with 1.99.5 from n2
> cib: info: cib_process_request:  Completed cib_replace operation for 
> section 'all': OK (rc=0, origin=n2/crmd/376, version=1.99.5)
> crmd: info: crm_update_peer_join: finalize_join_for: Node 
> n2[739512325] - join-2 phase 2 -> 3
> crmd: info: do_log:   FSA: Input I_WAIT_FOR_EVENT from do_te_invoke() 
> received in state S_FINALIZE_JOIN
> crmd: info: abort_transition_graph:   Transition aborted: Peer Halt 
> (source=do_te_invoke:168, 0)
> [...]
> cib: info: cib_file_write_with_digest:   Wrote version 1.99.0 of the 
> CIB to disk (digest: 6e71ae6f4a1d2619cc64c91d40f55a32)
> (??? We already had 1.99.5)

Only .0's are written to disk -- the .x's contain updates to dynamic
information (like the status section) and are in-memory only.

> cib: info: cib_process_request:  Completed cib_modify operation for 
> section status: OK (rc=0, origin=n2/attrd/3, version=1.99.6)
> crmd: info: crm_update_peer_join: do_dc_join_ack: Node n2[739512325] 
> - join-2 phase 3 -> 4
> crmd: info: do_dc_join_ack:   join-2: Updating node state to member for n2
> [...]
> crmd:   notice: handle_request:   Current ping state: S_FINALIZE_JOIN
> crmd: info: do_log:   FSA: Input I_WAIT_FOR_EVENT from 

Re: [ClusterLabs] Problems with corosync and pacemaker with error scenarios

2017-02-09 Thread Ken Gaillot
On 01/16/2017 11:18 AM, Gerhard Wiesinger wrote:
> Hello Ken,
> 
> thank you for the answers.
> 
> On 16.01.2017 16:43, Ken Gaillot wrote:
>> On 01/16/2017 08:56 AM, Gerhard Wiesinger wrote:
>>> Hello,
>>>
>>> I'm new to corosync and pacemaker and I want to setup a nginx cluster
>>> with quorum.
>>>
>>> Requirements:
>>> - 3 Linux maschines
>>> - On 2 maschines floating IP should be handled and nginx as a load
>>> balancing proxy
>>> - 3rd maschine is for quorum only, no services must run there
>>>
>>> Installed on all 3 nodes corosync/pacemaker, firewall ports openend are:
>>> 5404, 5405, 5406 for udp in both directions
>> If you're using firewalld, the easiest configuration is:
>>
>>firewall-cmd --permanent --add-service=high-availability
>>
>> If not, depending on what you're running, you may also want to open  TCP
>> ports 2224 (pcsd), 3121 (Pacemaker Remote), and 21064 (DLM).
> 
> I'm using shorewall on the lb01/lb02 nodes and firewalld on kvm01.
> 
> pcs status
> Cluster name: lbcluster
> Stack: corosync
> Current DC: lb01 (version 1.1.16-1.fc25-94ff4df) - partition with quorum
> Last updated: Mon Jan 16 16:46:52 2017
> Last change: Mon Jan 16 15:07:59 2017 by root via cibadmin on lb01
> 
> 3 nodes configured
> 40 resources configured
> 
> Online: [ kvm01 lb01 lb02 ]
> 
> Full list of resources:
> ...
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: inactive/disabled
> 
> BTW: I'm not running pcsd, as far as I know it is for UI configuration
> only So ports ports 2224 (pcsd), 3121 (Pacemaker Remote), and 21064
> (DLM) are closed. Shouldn't be a problem, right?

pcs uses pcsd for most of its commands, so if you want to use pcs, it
should be enabled and allowed between nodes.
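
A minimal sketch of what that usually involves on each node (assuming
firewalld; on lb01/lb02 with shorewall, open 2224/tcp there instead):

    systemctl enable --now pcsd
    firewall-cmd --permanent --add-service=high-availability   # includes 2224/tcp
    firewall-cmd --reload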

You don't have Pacemaker Remote nodes, so you can leave that port
closed. DLM is only necessary for certain resource types (such as clvmd).

>>> OS: Fedora 25
>>>
>>> Configuration of corosync (only the bindnetaddr is different on every
>>> maschine) and pacemaker below.
>> FYI you don't need a different bindnetaddr. You can (and generally
>> should) use the *network* address, which is the same on all hosts.
> 
> Only lb01 and lb02 are on the same network, kvm01 is on a different
> location and network therefore.

I'm not familiar with corosync nodes on the same ring using different
networks, but I suppose it's OK since you're using udpu, with ring0_addr
specified for each node.

>>> Configuration works so far but error test scenarios don't work like
>>> expected:
>>> 1.) I had cases in testing without qourum and quorum again where the
>>> cluster kept in Stopped state
>>>I had to restart the whole stack to get it online again (killall -9
>>> corosync;systemctl restart corosync;systemctl restart pacemaker)
>>>Any ideas?
>> It will be next to impossible to say without logs. It's definitely not
>> expected behavior. Stopping is the correct response to losing quorum;
>> perhaps quorum is not being properly restored for some reason. What is
>> your test methodology?
> 
> I had it when I rebooted just one node.
> 
> Testing scenarios are:
> *) Rebooting
> *) Starting/stopping corosync
> *) network down simulation on lb01/lb02
> *) putting an interface down with ifconfig eth1:1 down (simulation of
> losing an IP address)
> *) see also below
> 
> Tested now again with all nodes up (I've configured 13 IP addresses; for
> the sake of getting a faster overview I posted only the config for 2 IP
> addresses):
> No automatic recovery happens.
> e.g. ifconfig eth1:1 down
>  Resource Group: ClusterNetworking
>  ClusterIP_01   (ocf::heartbeat:IPaddr2):   FAILED lb02
>  ClusterIPRoute_01  (ocf::heartbeat:Route): FAILED lb02
>  ClusterIPRule_01   (ocf::heartbeat:Iprule):Started lb02
>  ClusterIP_02   (ocf::heartbeat:IPaddr2):   FAILED lb02
>  ClusterIPRoute_02  (ocf::heartbeat:Route): FAILED lb02 (blocked)
>  ClusterIPRule_02   (ocf::heartbeat:Iprule):Stopped
>  ClusterIP_03   (ocf::heartbeat:IPaddr2):   Stopped
>  ClusterIPRoute_03  (ocf::heartbeat:Route): Stopped
>  ClusterIPRule_03   (ocf::heartbeat:Iprule):Stopped
> ...
>  ClusterIP_13   (ocf::heartbeat:IPaddr2):   Stopped
>  ClusterIPRoute_13  (ocf::heartbeat:Route): Stopped
>  ClusterIPRule_13   (ocf::heartbeat:Iprule):Stopped
>  webserver  (ocf::heartbeat:nginx): Stopped
> 
> 
> Failed Actions:
> * ClusterIP_01_monitor_1 on lb0

Re: [ClusterLabs] Trouble setting up selfcompiled Apache in a pacemaker cluster on Oracle Linux 6.8

2017-02-09 Thread Ken Gaillot
On 01/16/2017 10:16 AM, Souvignier, Daniel wrote:
> Hi List,
> 
>  
> 
> I’ve got trouble getting Apache to work in a Pacemaker cluster I set up
> between two Oracle Linux 6.8 hosts. The cluster itself works just fine,
> but Apache won’t come up. Thing is here, this Apache is different from a
> basic setup because it is selfcompiled and therefore living in
> /usr/local/apache2. Also it is the latest version available (2.4.25),
> which could also be causing problems. To be able to debug, I went into
> the file /usr/lib/ocf/resources.d/heartbeat/apache and „verbosed“ it by
> simply adding set –x. This way, I can extract the scripts output from
> the logfile /var/log/cluster/corosync.log, which I appended to this
> email (hopefully it won’t get filtered).
> 
>  
> 
> The command I used to invoke the apache script mentioned above was:
> 
> pcs resource create httpd ocf:heartbeat:apache
> configfile=/usr/local/apache2/conf/httpd.conf
> httpd=/usr/local/apache2/bin/httpd
> statusurl=http://localhost/server-status
> envfiles=/usr/local/apache2/bin/envvars op monitor interval=60s
> 
>  
> 
> Before you ask: the paths are correct and mod_status is also configured
> correctly (works fine when starting Apache manually). I should also add
> that the two nodes which form this cluster are virtual (Vmware vSphere)
> hosts and living in the same network (so no firewalling between them,
> there is a dedicated firewall just before the network). I assume that it
> has something to do with the handling of the pid file, but I couldn’t
> seem to fix it until now. I pointed Apache to create the pid file in
> /var/run/httpd.pid, but that didn’t work either. Suggestions on how to
> solve this? Thanks in advance!

Do you have SELinux enabled? If so, check /var/log/audit/audit.log for
denials.

It looks like the output you attached is from the cluster's initial
probe (one-time monitor operation), and not the start operation.

>  
> 
> Kind regards,
> 
> Daniel Souvignier
> 
>  
> 
> P.S.: If you need the parameters I compiled Apache with, I can tell you,
> but I don’t think that it is relevant here.
> 
>  
> 
> --
> 
> Daniel Souvignier
> 
>  
> 
> IT Center
> 
> Gruppe: Linux-basierte Anwendungen
> 
> Abteilung: Systeme und Betrieb
> 
> RWTH Aachen University
> 
> Seffenter Weg 23
> 
> 52074 Aachen
> 
> Tel.: +49 241 80-29267
> 
> souvign...@itc.rwth-aachen.de 
> 
> www.itc.rwth-aachen.de 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker cluster not working after switching from 1.0 to 1.1

2017-02-09 Thread Ken Gaillot
On 01/16/2017 01:16 PM, Rick Kint wrote:
> 
>> Date: Mon, 16 Jan 2017 09:15:44 -0600
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Pacemaker cluster not working after
>> switching from 1.0 to 1.1 (resend as plain text)
>> Message-ID: <f51b9abd-df28-ec7b-6424-3c221a829...@redhat.com>
>> Content-Type: text/plain; charset=utf-8
>>
>> A preliminary question -- what cluster layer are you running?
>>
>> Pacemaker 1.0 worked with heartbeat or corosync 1, while Ubuntu 14.04
>> ships with corosync 2 by default, IIRC. There were major incompatible
>> changes between corosync 1 and 2, so it's important to get that right
>> before looking at pacemaker.
>>
>> A general note, when making such a big jump in the pacemaker version,
>> I'd recommend running "cibadmin --upgrade" both before exporting 
>> the
>> configuration from 1.0, and again after deploying it on 1.1. This will
>> apply any transformations needed in the CIB syntax. Pacemaker will do
>> this on the fly, but doing it manually lets you see any issues early, as
>> well as being more efficient.
> 
> TL;DR
> - Thanks.
> - Cluster mostly works so I don't think it's a corosync issue.
> - Configuration XML is actually created with crm shell.
> - Is there a summary of changes from 1.0 to 1.1?
> 
> 
> Thanks for the quick reply.
> 
> 
> Corosync is v2.3.3. We've already been through the issues getting corosync 
> working. 
> 
> The cluster works in many ways:
> - Pacemaker sees both nodes.
> - Pacemaker starts all the resources.- Pacemaker promotes an instance of the 
> stateful Encryptor resource to Master/active.
> - If the node running the active Encryptor goes down, the standby Encryptor 
> is promoted and the DC changes.
> - Manual failover works (fiddling with the master-score attribute).
> 
> The problem is that a failure in one of the dependencies doesn't cause 
> promotion anymore.
> 
> 
> 
> 
> Thanks for the cibadmin command, I missed that when reading the docs.
> 
> I omitted some detail. I didn't export the XML from the old cluster to the 
> new cluster. We create the configuration with the crm shell, not with XML. 
> The sequence of events is
> 
> 
> - install corosync, pacemaker, etc.- apply local config file changes.
> - start corosync and pacemaker on both nodes in cluster.
> - verify that cluster is formed (crm_mon shows both nodes online, but no 
> resources).
> - create cluster by running script which passes a here document to the crm 
> shell.
> - verify that cluster is formed
> 
> 
> The crm shell version is "1.2.5+hg1034-1ubuntu4". I've checked the XML 
> against the "Pacemaker Configuration Explained" doc and it looks OK to my 
> admittedly non-knowledgeable eye.
> 
> I tried the cibadmin command in hopes that this might tell me something, but 
> it made no changes. "cib_verify --live-check" doesn't complain either.
> I copied the XML from a Pacemaker 1.0.X system to a Pacemaker 1.1.X system 
> and ran "cibadmin --upgrade" on it. Nothing changed there either. 
> 
> 
> 
> Is there a quick summary of changes from 1.0 to 1.1 somewhere? The "Pacemaker 
> 1.1 Configuration Explained" doc has a section entitled "What is new in 1.0" 
> but nothing for 1.1. I wouldn't be surprised if there is something obvious 
> that I'm missing and it would help if I could limit my search space.

No, there's just the change log, which is quite detailed.

There was no defining change from 1.0 to 1.1. Originally, it was planned
that 1.1 would be a "development" branch with new features, and 1.0
would be a "production" branch with bugfixes only. It proved too much
work to maintain two separate branches, so the 1.0 line was ended, and
1.1 became the sole production branch.

> I've done quite a bit of experimentation: changed the syntax of the 
> colocation constraints, added ordering constraints, and fiddled with 
> timeouts. When I was doing the port to Ubuntu, I tested resource agent exit 
> status but I'll go back and check that again. Any other suggestions?
> 
> 
> BTW, I've fixed some issues with the Pacemaker init script running on Ubuntu. 
> Should these go to Clusterlabs or the Debian/Ubuntu maintainer?

It depends on whether they're using the init script provided upstream,
or their own (which I suspect is more likely).

> CONFIGURATION
> 
> 
> Here's the configuration again, hopefully with indentation preserved this 
> time:
> 
> 
> 
>   
> id="cib-bootstrap-options-stonith-enabled"/>
> id=&q

Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

2017-02-10 Thread Ken Gaillot
On 02/10/2017 06:49 AM, Lentes, Bernd wrote:
> 
> 
> - On Feb 10, 2017, at 1:10 AM, Ken Gaillot kgail...@redhat.com wrote:
> 
>> On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
>>> Hi,
>>>
>>> i have a two node cluster with a vm as a resource. Currently i'm just 
>>> testing
>>> and playing. My vm boots and shuts down again in 15min gaps.
>>> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped
>>> (90ms)" found in the logs. I googled, and it is said that this
>>> is due to time-based rule
>>> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
>>> But i don't have any time-based rules.
>>> This is the config for my vm:
>>>
>>> primitive prim_vm_mausdb VirtualDomain \
>>> params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>>> params hypervisor="qemu:///system" \
>>> params migration_transport=ssh \
>>> op start interval=0 timeout=90 \
>>> op stop interval=0 timeout=95 \
>>> op monitor interval=30 timeout=30 \
>>> op migrate_from interval=0 timeout=100 \
>>> op migrate_to interval=0 timeout=120 \
>>> meta allow-migrate=true \
>>> meta target-role=Started \
>>> utilization cpu=2 hv_memory=4099
>>>
>>> The only constraint concerning the vm i had was a location (which i didn't
>>> create).
>>
>> What is the constraint? If its ID starts with "cli-", it was created by
>> a command-line tool (such as crm_resource, crm shell or pcs, generally
>> for a "move" or "ban" command).
>>
> I deleted the one i mentioned, but now i have two again. I didn't create them.
> Does the crm create constraints itself ?
> 
> location cli-ban-prim_vm_mausdb-on-ha-idg-2 prim_vm_mausdb role=Started -inf: 
> ha-idg-2
> location cli-prefer-prim_vm_mausdb prim_vm_mausdb role=Started inf: ha-idg-2

The command-line tool you use creates them.

If you're using crm_resource, they're created by crm_resource
--move/--ban. If you're using pcs, they're created by pcs resource
move/ban. Etc.

> One location constraint inf, one -inf for the same resource on the same node.
> Isn't that senseless ?

Yes, but that's what you told it to do :-)

The command-line tools move or ban resources by setting constraints to
achieve that effect. Those constraints are permanent until you remove them.

How to clear them again depends on which tool you use ... crm_resource
--clear, pcs resource clear, etc.
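
For example, with crm_resource (pcs has equivalent move/ban/clear commands):

    crm_resource --move  --resource prim_vm_mausdb --node ha-idg-2   # creates cli-prefer-...
    crm_resource --ban   --resource prim_vm_mausdb --node ha-idg-2   # creates cli-ban-...
    crm_resource --clear --resource prim_vm_mausdb                   # removes them again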

> 
> "crm resorce scores" show -inf for that resource on that node:
> native_color: prim_vm_mausdb allocation score on ha-idg-1: 100
> native_color: prim_vm_mausdb allocation score on ha-idg-2: -INFINITY
> 
> Is -inf stronger ?
> Is it true that only the values for "native_color" are notable ?
> 
> A principle question: When i have trouble to start/stop/migrate resources,
> is it senseful to do a "crm resource cleanup" before trying again ?
> (Beneath finding the reason for the trouble).

It's best to figure out what the problem is first, make sure that's
taken care of, then clean up. The cluster might or might not do anything
when you clean up, depending on what stickiness you have, your failure
handling settings, etc.

> Sorry for asking basic stuff. I read a lot before, but in practice it's totally 
> different.
> Although i just have a vm as a resource, and i'm only testing, i'm sometimes 
> astonished about the 
> complexity of a simple two node cluster: scores, failcounts, constraints, 
> default values for a lot of variables ...
> you have to keep an eye on a lot of stuff.
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] 答复: Re: clone resource not get restarted on fail

2017-02-14 Thread Ken Gaillot
On 02/13/2017 07:08 PM, he.hailo...@zte.com.cn wrote:
> Hi,
> 
> 
> > crm configure show
> 
> + crm configure show
> 
> node $id="336855579" paas-controller-1
> 
> node $id="336855580" paas-controller-2
> 
> node $id="336855581" paas-controller-3
> 
> primitive apigateway ocf:heartbeat:apigateway \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive apigateway_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="20.20.2.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive router ocf:heartbeat:router \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive router_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive sdclient ocf:heartbeat:sdclient \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive sdclient_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.8" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> clone apigateway_rep apigateway
> 
> clone router_rep router
> 
> clone sdclient_rep sdclient
> 
> location apigateway_loc apigateway_vip \
> 
> rule $id="apigateway_loc-rule" +inf: apigateway_workable eq 1
> 
> location router_loc router_vip \
> 
> rule $id="router_loc-rule" +inf: router_workable eq 1
> 
> location sdclient_loc sdclient_vip \
> 
> rule $id="sdclient_loc-rule" +inf: sdclient_workable eq 1
> 
> property $id="cib-bootstrap-options" \
> 
> dc-version="1.1.10-42f2063" \
> 
> cluster-infrastructure="corosync" \
> 
> stonith-enabled="false" \
> 
> no-quorum-policy="ignore" \
> 
> start-failure-is-fatal="false" \
> 
> last-lrm-refresh="1486981647"
> 
> op_defaults $id="op_defaults-options" \
> 
> on-fail="restart"
> 
> -
> 
> 
> and B.T.W, I am using "crm_attribute -N $HOSTNAME -q -l reboot --name
> <prefix>_workable -v <1 or 0>" in the monitor to update the
> transient attributes, which control the vip location.

Is there a reason not to use a colocation constraint instead? If X_vip
is colocated with X, it will be moved if X fails.
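
Roughly like this, in crm shell syntax (a sketch using the resource names
from your configuration):

    colocation sdclient_vip_with_clone inf: sdclient_vip sdclient_rep
    colocation router_vip_with_clone inf: router_vip router_rep
    colocation apigateway_vip_with_clone inf: apigateway_vip apigateway_rep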

I don't see any reason in your configuration why the services wouldn't
be restarted. It's possible the cluster tried to restart the service,
but the stop action failed. Since you have stonith disabled, the cluster
can't recover from a failed stop action.

Is there a reason you disabled quorum? With 3 nodes, if they get split
into groups of 1 node and 2 nodes, quorum is what keeps the groups from
both starting all resources.

> and also found, the vip resource won't get moved if the related clone
> resource failed to restart.
> 
> 
> Original Message
> *From:* <kgail...@redhat.com>;
> *To:* <users@clusterlabs.org>;
> *Date:* 2017-02-13 23:04
> *Subject:* *Re: [ClusterLabs] clone resource not get restarted on fail*
> 
> 
> On 02/13/2017 07:57 AM, he.hailo...@zte.com.cn wrote:
> > Pacemaker 1.1.10
> > 
> > Corosync 2.3.3
> > 
> > 
> > this is a 3 nodes cluster configured with 3 clone resources, each
> > attached wih a vip resource of IPAddr2:
> > 
> > 
> > >crm status
> > 
> > 
> > Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> > 
> >  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> > 
> >  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> > 
> >  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> > 
> >  Clone Set: sdclient_rep [sdclient]
> > 
> >  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> >  Clone Set: router_rep [router]
> > 
> >  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> >  Clone Set: apigateway_rep [apigateway]
> > 
> >  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> > 
> > It is observed that sometimes the clone resource is stuck to monitor
> > when the service fails:
> > 
> > 
> >  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> > 
> >  sdclient_vip   (ocf::heartbeat:IPaddr2):   

Re: [ClusterLabs] clone resource not get restarted on fail

2017-02-13 Thread Ken Gaillot
On 02/13/2017 07:57 AM, he.hailo...@zte.com.cn wrote:
> Pacemaker 1.1.10
> 
> Corosync 2.3.3
> 
> 
> this is a 3 nodes cluster configured with 3 clone resources, each
> attached wih a vip resource of IPAddr2:
> 
> 
> >crm status
> 
> 
> Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
>  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> 
>  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
>  Clone Set: router_rep [router]
> 
>  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
>  Clone Set: apigateway_rep [apigateway]
> 
>  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
> It is observed that sometimes the clone resource is stuck to monitor
> when the service fails:
> 
> 
>  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Stopped: [ paas-controller-3 ]
> 
>  Clone Set: router_rep [router]
> 
>  router (ocf::heartbeat:router):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Clone Set: apigateway_rep [apigateway]
> 
>  apigateway (ocf::heartbeat:apigateway):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
> 
> in the example above. the sdclient_rep get restarted on node 3, while
> the other two hang at monitoring on node 3, here are the ocf logs:
> 
> 
> abnormal (apigateway_rep):
> 
> 2017-02-13 18:27:53 [23586]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:53 [23586]===print_log test_monitor run_func main===
> health check succeed.
> 
> 2017-02-13 18:27:55 [24010]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:55 [24010]===print_log test_monitor run_func main===
> Failed: docker daemon is not running.
> 
> 2017-02-13 18:27:57 [24095]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:57 [24095]===print_log test_monitor run_func main===
> Failed: docker daemon is not running.
> 
> 2017-02-13 18:27:59 [24159]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:59 [24159]===print_log test_monitor run_func main===
> Failed: docker daemon is not running.
> 
> 
> normal (sdclient_rep):
> 
> 2017-02-13 18:27:52 [23507]===print_log sdclient_monitor run_func
> main=== health check succeed.
> 
> 2017-02-13 18:27:54 [23630]===print_log sdclient_monitor run_func
> main=== Starting health check.
> 
> 2017-02-13 18:27:54 [23630]===print_log sdclient_monitor run_func
> main=== Failed: docker daemon is not running.
> 
> 2017-02-13 18:27:55 [23710]===print_log sdclient_stop run_func main===
> Starting stop the container.
> 
> 2017-02-13 18:27:55 [23710]===print_log sdclient_stop run_func main===
> docker daemon lost, pretend stop succeed.
> 
> 2017-02-13 18:27:55 [23763]===print_log sdclient_start run_func main===
> Starting run the container.
> 
> 2017-02-13 18:27:55 [23763]===print_log sdclient_start run_func main===
> docker daemon lost, try again in 5 secs.
> 
> 2017-02-13 18:28:00 [23763]===print_log sdclient_start run_func main===
> docker daemon lost, try again in 5 secs.
> 
> 2017-02-13 18:28:05 [23763]===print_log sdclient_start run_func main===
> docker daemon lost, try again in 5 secs.
> 
> 
> If I disable 2 clone resource, the switch over test for one clone
> resource works as expected: fail the service -> monitor fails -> stop
> -> start
> 
> 
> Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Stopped: [ paas-controller-3 ]
> 
> 
> what's the reason behind 

Can you show the configuration of the three clones, their operations,
and any constraints?

Normally, the response is controlled by the monitor operation's on-fail
attribute (which defaults to restart).


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.

2017-02-09 Thread Ken Gaillot
On 02/09/2017 05:46 AM, Jehan-Guillaume de Rorthais wrote:
> On Thu, 9 Feb 2017 19:24:22 +0900 (JST)
> renayama19661...@ybb.ne.jp wrote:
> 
>> Hi Ken,
>>
>>
>>> 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
>>> Pacemaker gets one of these errors from an agent, it will ban the
>>> resource from that node (until the failure is cleared).  
>>
>> The first suggestion does not work well.
>>
>> Even if this returns OCF_ERR_ARGS and OCF_ERR_PERM, it seems to be to be
>> pre_promote(notify) handling of RA. Pacemaker does not record the notify(pre
>> promote) error in CIB.
>>
>>  * https://github.com/ClusterLabs/pacemaker/blob/master/crmd/lrm.c#L2411
>>
>> Because it is not recorded in CIB, there cannot be the thing that pengine
>> works as "hard error".

Ah, I didn't think of that.

> Indeed. That's why PAF use private attribute to give informations between
> actions. We detect the failure during the notify as well, but raise the error
> during the promotion itself. See how I dealt with this in PAF:
> 
> https://github.com/ioguix/PAF/commit/6123025ff7cd9929b56c9af2faaefdf392886e68

That's a nice use of private attributes.

> As private attributes does not work on older stacks, you could rely on local
> temp file as well in $HA_RSCTMP.
> 
>>> 2. Use crm_resource --ban instead. This would ban the resource from that
>>> node until the user removes the ban with crm_resource --clear (or by
>>> deleting the ban consraint from the configuration).  
>>
>> The second suggestion works well.
>> I intend to adopt the second suggestion.
>>
>> As other methods, you think crm_resource -F to be available, but what do you
>> think? I think that last-failure does not have a problem either to let you
>> handle pseudotrouble if it is crm_resource -F.
>>
>> I think whether crm_resource -F is available, but adopt crm_resource -B
>> because RA wants to completely stop pgsql resource.
>>
>> ``` @pgsql RA
>>
>> pgsql_pre_promote() {
>> (snip)
>> if [ "$cmp_location" != "$my_master_baseline" ]; then
>> ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
>> exec_with_retry 0 $CRM_RESOURCE -B -r $OCF_RESOURCE_INSTANCE -N $NODENAME -Q
>> return $OCF_ERR_GENERIC
>> fi
>> (snip)
>> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>> CRM_RESOURCE="${HA_SBIN_DIR}/crm_resource"
>> ```
>>
>> I test movement a little more and send a patch.
> 
> I suppose crm_resource -F will just raise the failcount, break the current
> transition and the CRM will recompute another transition paying attention to
> your "failed" resource (will it try to recover it? retry the previous
> transition again?).
> 
> I would bet on crm_resource -B.

Correct, crm_resource -F only simulates OCF_ERR_GENERIC, which is a soft
error. It might be a nice extension to be able to specify the error
code, but in this case, I think crm_resource -B (or the private
attribute approach, if you're OK with limiting it to corosync 2 and
pacemaker 1.1.13+) is better.
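
For comparison, the two calls look like this (a sketch; resource and node
names are made up):

    crm_resource -F -r pgsql -N node1      # simulate a failure (OCF_ERR_GENERIC)
    crm_resource -B -r pgsql -N node1 -Q   # ban the resource from the node, quietly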

>> - Original Message -
>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>> To: users@clusterlabs.org; kgail...@redhat.com
>>> Cc: 
>>> Date: 2017/2/6, Mon 17:44
>>> Subject: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.
>>>   
>>>>>>  Ken Gaillot <kgail...@redhat.com> schrieb am 02.02.2017 um   
>>> 19:33 in Nachricht
>>> <91a83571-9930-94fd-e635-962830671...@redhat.com>:  
>>>>  On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote:  
>>>>>  Hi All,
>>>>>
>>>>>  By the next correction, the user was not able to set a value except   
>>> zero in   
>>>>  crm_failcount.  
>>>>>
>>>>>   - [Fix: tools: implement crm_failcount command-line options correctly]
>>>>> -   
>>>>   
>>> https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40
>>>
>>>>  a994498cafd#diff-6e58482648938fd488a920b9902daac4  
>>>>>
>>>>>  However, pgsql RA sets INFINITY in a script.
>>>>>
>>>>>  ```
>>>>>  (snip)
>>>>>  CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>>>>>  (snip)
>>>>>  ocf_exit_reason "My data is newer than new master's one.  

Re: [ClusterLabs] Failed reload

2017-02-09 Thread Ken Gaillot
On 02/08/2017 02:15 AM, Ferenc Wágner wrote:
> Hi,
> 
> There was an interesting discussion on this list about "Doing reload
> right" last July (which I still haven't digested entirely).  Now I've
> got a related question about the current and intented behavior: what
> happens if a reload operation fails?  I found some suggestions in
> http://ocf.community.tummy.narkive.com/RngPlNfz/adding-reload-to-the-ocf-specification,
> from 11 years back, and the question wasn't clear cut at all.  Now I'm
> contemplating adding best-effort reloads to an RA, but not sure what
> behavior I can expect and depend on in the long run.  I'd be grateful
> for your insights.

Seeing this, it occurs to me that commit messages should have links to
relevant mailing list threads; it makes the reason for various choices
so much clearer :-)

As with any operation, a reload failure should be handled according to
its exit code and its on-fail attribute. (Well, any operation except
notify, as became apparent in another recent thread.)

By default, that means a restart (stop then start).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Using "mandatory" startup order but avoiding depending clones from restart after member of parent clone fails

2017-02-09 Thread Ken Gaillot
On 02/09/2017 08:50 AM, Alejandro Comisario wrote:
> Ken, thanks for yor reply.
> 
> Since in our setup we use an active/active mysql clone, i think that
> order is the only way to ensure what i want.
> So, simple question: making order "Advisory", and taking into
> consideration that "maybe" keystone starts before mysql, making it fail
> because of database connection.
> 
> If i set on the keystone clone (and all the dependant clones)
> on-fail="restart" for start and monitor actions (of course setting the
> cib option start-failure-is-fatal=false ) to make sure that if it fails,
> it will restart till everything is ok.
> 
> would that make sense to "workaround" that ?
> 
> best.

Yes, that would work. The default is to fail up to 1,000,000 times, then
it will stop retrying on that node. Of course, you can clean up the
failure to start over (or set a failure-timeout to do that automatically).
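
A sketch of the relevant knobs in crm shell syntax (the values are only
examples):

    property start-failure-is-fatal=false
    # limit retries per node and expire the failcount automatically
    rsc_defaults migration-threshold=10 failure-timeout=60s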

> On Thu, Feb 9, 2017 at 12:18 AM, Ken Gaillot <kgail...@redhat.com
> <mailto:kgail...@redhat.com>> wrote:
> 
> On 02/06/2017 05:25 PM, Alejandro Comisario wrote:
> > guys, really happy to post my first doubt.
> >
> > i'm kinda having a "conceptual" issue that's bringing me lots of
> > issues.
> > i need to ensure that the order of starting resources is mandatory, but
> > that is causing me a huge issue: if just one of the members of
> > a clone goes down and up (but not all members), all resources depending
> > on it are restarted (which is bad). my workaround is to set order as
> > advisory, but that doesn't assure strict startup order.
> >
> > eg. clone_b runs on servers_B, and depends on clone_a that runs on
> servers_A.
> >
> > I'll put an example on how i have everything defined between this
> two clones.
> >
> > ### clone_A running on servers A (location rule)
> > primitive p_mysql mysql-wss \
> > op monitor timeout=55 interval=60 enabled=true on-fail=restart \
> > op start timeout=475 interval=0 on-fail=restart \
> > op stop timeout=175 interval=0 \
> > params socket="/var/run/mysqld/mysqld.sock"
> > pid="/var/run/mysqld/mysqld.pid" test_passwd="XXX" test_user=root \
> > meta is-managed=true
> >
> > clone p_mysql-clone p_mysql \
> > meta target-role=Started interleave=false globally-unique=false
> >
> > location mysql_location p_mysql-clone resource-discovery=never \
> > rule -inf: galera ne 1
> >
> > ### clone_B running on servers B (location rule)
> > primitive p_keystone apache \
> > params configfile="/etc/apache2/apache2.conf" \
> > op monitor on-fail=restart interval=60s timeout=60s \
> > op start on-fail=restart interval=0 \
> > meta target-role=Started migration-threshold=2 failure-timeout=60s
> > resource-stickiness=300
> >
> > clone p_keystone-clone p_keystone \
> > meta target-role=Started interleave=false globally-unique=false
> >
> > location keystone_location p_keystone-clone resource-discovery=never \
> > rule -inf: keystone ne 1
> >
> > order p_clone-mysql-before-p_keystone INF: p_mysql-clone
> p_keystone-clone:start
> >
> > Again just to make my point: if p_mysql-clone loses even one member
> > of the clone, then ONLY when that member gets back, all members of
> > p_keystone-clone get restarted, and that's NOT what i need. so if i
> > change the order from mandatory to advisory, i get what i want
> > regarding the behaviour when instances of the clone come
> > and go, but i lose the strictness of the startup order, which is
> > critical for me.
> >
> > How can i fix this problem ?
> > .. can i ?
> 
> I don't think pacemaker can model your desired situation currently.
> 
> In OpenStack configs that I'm familiar with, the mysql server (usually
> galera) is a master-slave clone, and the constraint used is "promote
> mysql then start keystone". That way, if a slave goes away and comes
> back, it has no effect.
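
For reference, the "promote ... then start ..." constraint quoted above would
look roughly like this in crm shell syntax (ms_galera is a hypothetical
master/slave resource name):

    order galera-before-keystone INF: ms_galera:promote p_keystone-clone:start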

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources

2017-02-16 Thread Ken Gaillot
On 02/16/2017 02:26 AM, Félix Díaz de Rada wrote:
> 
> Hi all,
> 
> We are currently setting up a MySQL cluster (Master-Slave) over this
> platform:
> - Two nodes, on RHEL 7.0
> - pacemaker-1.1.10-29.el7.x86_64
> - corosync-2.3.3-2.el7.x86_64
> - pcs-0.9.115-32.el7.x86_64
> There is a IP address resource to be used as a "virtual IP".
> 
> This is configuration of cluster:
> 
> Cluster Name: webmobbdprep
> Corosync Nodes:
>  webmob1bdprep-ges webmob2bdprep-ges
> Pacemaker Nodes:
>  webmob1bdprep-ges webmob2bdprep-ges
> 
> Resources:
>  Group: G_MySQL_M
>   Meta Attrs: priority=100
>   Resource: MySQL_M (class=ocf provider=heartbeat type=mysql_m)
>Attributes:
> binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
> config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
> log=/data/webmob_prep/webmob_prep.err
> pid=/data/webmob_prep/webmob_rep.pid
> socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
> test_table=replica.pacemaker_test test_user=root
>Meta Attrs: resource-stickiness=1000
>Operations: promote interval=0s timeout=120 (MySQL_M-promote-timeout-120)
>demote interval=0s timeout=120 (MySQL_M-demote-timeout-120)
>start interval=0s timeout=120s on-fail=restart
> (MySQL_M-start-timeout-120s-on-fail-restart)
>stop interval=0s timeout=120s (MySQL_M-stop-timeout-120s)
>monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1
> (MySQL_M-monitor-interval-60s-timeout-30s)
>   Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=172.18.64.44 nic=ens160:1 cidr_netmask=32
>Meta Attrs: target-role=Started migration-threshold=3
> failure-timeout=60s
>Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
>stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
>monitor interval=60s (ClusterIP-monitor-interval-60s)
>  Resource: MySQL_S (class=ocf provider=heartbeat type=mysql_s)
>   Attributes:
> binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
> config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
> log=/data/webmob_prep/webmob_prep.err
> pid=/data/webmob_prep/webmob_rep.pid
> socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
> test_table=replica.pacemaker_test test_user=root
>   Meta Attrs: resource-stickiness=0
>   Operations: promote interval=0s timeout=120 (MySQL_S-promote-timeout-120)
>   demote interval=0s timeout=120 (MySQL_S-demote-timeout-120)
>   start interval=0s timeout=120s on-fail=restart
> (MySQL_S-start-timeout-120s-on-fail-restart)
>   stop interval=0s timeout=120s (MySQL_S-stop-timeout-120s)
>   monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1
> (MySQL_S-monitor-interval-60s-timeout-30s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   start MySQL_M then start ClusterIP (Mandatory)
> (id:order-MySQL_M-ClusterIP-mandatory)
>   start G_MySQL_M then start MySQL_S (Mandatory)
> (id:order-G_MySQL_M-MySQL_S-mandatory)
> Colocation Constraints:
>   G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  dc-version: 1.1.10-29.el7-368c726
>  last-lrm-refresh: 1487148812
>  no-quorum-policy: ignore
>  stonith-enabled: false
> 
> Pacemaker works as expected under most situations, but there is one
> scenario that is really not understandable to us. I will try to describe it:
> 
> a - Master resource (and Cluster IP address) are active on node 1 and
> Slave resource is active on node 2.
> b - We force movement of Master resource to node 2.
> c - Pacemaker stops all resources: Master, Slave and Cluster IP.
> d - Master resource and Cluster IP are started on node 2 (this is OK),
> but Slave also tries to start (??). It fails (logically, because Master
> resource has been started on the same node), it logs an "unknown error"
> and its state is marked as "failed". This is a capture of 'pcs status'
> at that point:
> 
> OFFLINE: [ webmob1bdprep-ges ]
> Online: [ webmob2bdprep-ges ]
> 
> Full list of resources:
> 
> Resource Group: G_MySQL_M
> MySQL_M (ocf::heartbeat:mysql_m): Started webmob2bdprep-ges
> ClusterIP (ocf::heartbeat:IPaddr2): Started webmob2bdprep-ges
> MySQL_S (ocf::heartbeat:mysql_s): FAILED webmob2bdprep-ges
> 
> Failed actions:
> MySQL_M_monitor_6 on webmob2bdprep-ges 'master' (8): call=62,
> status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms,
> exec=0ms
> MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78,
> status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms,
> exec=0ms
> 
> PCSD Status:
> webmob1bdprep-ges: Offline
> webmob2bdprep-ges: Online
> 
> e - Pacemaker moves Slave resource to node 1 and starts it. Now we have
> both resources started again, Master on node 2 and Slave on node 1.
> f - One 

Re: [ClusterLabs] I question whether STONITH is working.

2017-02-15 Thread Ken Gaillot
On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote:
> I have 2 Fedora VMs (node1, and node2) running on a Windows 10 machine
> using Virtualbox.
> 
> I began with this.
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/
> 
> 
> When it came to fencing, I refered to this.
> http://www.linux-ha.org/wiki/SBD_Fencing
> 
> To the file /etc/sysconfig/sbd I added these lines.
> SBD_OPTS="-W"
> SBD_DEVICE="/dev/sdb1"
> I added 'modprobe softdog' to rc.local
> 
> After getting sbd working, I resumed with Clusters from Scratch, chapter
> 8.3.
> I executed these commands *only* one node1.  Am I suppose to run any of
> these commands on other nodes? 'Clusters from Scratch' does not specify.

Configuration commands only need to be run once. The cluster
synchronizes all changes across the cluster.

> pcs cluster cib stonith_cfg
> pcs -f stonith_cfg stonith create sbd-fence fence_sbd
> devices="/dev/sdb1" port="node2"

The above command creates a fence device configured to kill node2 -- but
it doesn't tell the cluster which nodes the device can be used to kill.
Thus, even if you try to fence node1, it will use this device, and node2
will be shot.

The pcmk_host_list parameter specifies which nodes the device can kill.
If not specified, the device will be used to kill any node. So, just add
pcmk_host_list=node2 here.

You'll need to configure a separate device to fence node1.
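
Something along these lines, building on the command you already used (a
sketch only -- I haven't verified fence_sbd's parameters):

    pcs -f stonith_cfg stonith create sbd-fence-node2 fence_sbd \
        devices="/dev/sdb1" port="node2" pcmk_host_list="node2"
    pcs -f stonith_cfg stonith create sbd-fence-node1 fence_sbd \
        devices="/dev/sdb1" port="node1" pcmk_host_list="node1"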

I haven't used fence_sbd, so I don't know if there's a way to configure
it as one device that can kill both nodes.

> pcs -f stonith_cfg property set stonith-enabled=true
> pcs cluster cib-push stonith_cfg
> 
> I then tried this command from node1.
> stonith_admin --reboot node2
> 
> Node2 did not reboot or even shutdown. the command 'sbd -d /dev/sdb1
> list' showed node2 as off, but I was still logged into it (cluster
> status on node2 showed not running).
> 
> I rebooted and ran this command on node 2 and started cluster.
> sbd -d /dev/sdb1 message node2 clear
> 
> If I ran this command on node2, node2 rebooted.
> stonith_admin --reboot node1
> 
> What have I missed or done wrong?
> 
> 
> Thank you,
> 
> Durwin F. De La Rue
> Management Sciences, Inc.
> 6022 Constitution Ave. NE
> Albuquerque, NM  87110
> Phone (505) 255-8611


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for VirtualDomain resources

2017-01-19 Thread Ken Gaillot
On 01/19/2017 01:36 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote on 18.01.2017 at 16:32 in
>>>> message
> <4b02d3fa-4693-473b-8bed-dc98f9e3f...@redhat.com>:
>> On 01/17/2017 04:45 PM, Scott Greenlese wrote:
>>> Ken and Co,
>>>
>>> Thanks for the useful information.
>>>
> 
> [...]
>>>
>>> Is this internally coded within the class=ocf provider=heartbeat
>>> type=VirtualDomain resource agent?
>>
>> Aha, I just realized what the issue is: the operation name is
>> migrate_to, not migrate-to.
>>
>> For technical reasons, pacemaker can't validate operation names (at the
>> time that the configuration is edited, it does not necessarily have
>> access to the agent metadata).
> 
> BUT the set of operations is finite, right? So if those were in some XML 
> schema, the names could be verified at least (not meaning that the operation 
> is actually supported).
> BTW: Would a "crm configure verify" detect this kind of problem?
> 
> [...]
> 
> Ulrich

Yes, it's in the resource agent meta-data. While pacemaker itself uses a
small set of well-defined actions, the agent may define any arbitrarily
named actions it desires, and the user could configure one of these as a
recurring action in pacemaker.

Pacemaker itself has to be liberal about where its configuration comes
from -- the configuration can be edited on a separate machine, which
doesn't have resource agents, and then uploaded to the cluster. So
Pacemaker can't do that validation at configuration time. (It could
theoretically do some checking after the fact when the configuration is
loaded, but this could be a lot of overhead, and there are
implementation issues at the moment.)

Higher-level tools like crmsh and pcs, on the other hand, can make
simplifying assumptions. They can require access to the resource agents
so that they can do extra validation.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how do I disable/negate resource option?

2017-01-19 Thread Ken Gaillot
On 01/19/2017 06:30 AM, lejeczek wrote:
> hi all
> 
> how can it be done? Is it possible?
> many thanks,
> L.

Check the man page / documentation for whatever tool you're using (crm,
pcs, etc.). Each one has its own syntax.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ken Gaillot
On 01/17/2017 08:52 AM, Ulrich Windl wrote:
 Oscar Segarra  schrieb am 17.01.2017 um 10:15 in
> Nachricht
> :
>> Hi,
>>
>> Yes, I will try to explain myself better.
>>
>> *Initially*
>> On node1 (vdicnode01-priv)
>>> virsh list
>> ==
>> vdicdb01 started
>>
>> On node2 (vdicnode02-priv)
>>> virsh list
>> ==
>> vdicdb02 started
>>
>> --> Now, I execute the migrate command (outside the cluster <-- not using
>> pcs resource move)
>> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
>> tcp://vdicnode02-priv
> 
> One of the rules of successful clustering is: If resources are managed by the 
> cluster, they are managed by the cluster only! ;-)
> 
> I guess one node is trying to restart the VM once it vanished, and the other 
> node might try to shut down the VM while it's being migrated.
> Or any other undesired combination...


As Ulrich says here, you can't use virsh to manage VMs once they are
managed by the cluster. Instead, configure your cluster to support live
migration:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources

and then use pcs resource move (which is just location constraints under
the hood) to move VMs.
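
For example, with pcs that might look roughly like this (using your
vm-vdicdb01 resource as an example; <target-node> is whatever node name
appears in pcs status):

  pcs resource meta vm-vdicdb01 allow-migrate=true
  pcs resource move vm-vdicdb01 <target-node>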

What's happening in your example is:

* Your VM cluster resource has a monitor operation ensuring that it is
running properly on the desired node.

* It is also possible to configure a monitor to ensure that the resource
is not running on nodes where it's not supposed to be (a monitor with
role="Stopped"). You don't have one of these (which is fine, and common).

* When you move the VM, the cluster detects that it is not running on
the node you told it to keep it running on. Because there is no
"Stopped" monitor, the cluster doesn't immediately realize that a new
rogue instance is running on another node. So, the cluster thinks the VM
crashed on the original node, and recovers it by starting it again.

If your goal is to take a VM out of cluster management without stopping
it, you can "unmanage" the resource.
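
For example (again using vm-vdicdb01 as the example name):

  pcs resource unmanage vm-vdicdb01  # cluster will no longer start/stop/recover it
  # ... manage the guest manually with virsh as needed ...
  pcs resource manage vm-vdicdb01    # hand control back to the cluster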


>> *Finally*
>> On node1 (vdicnode01-priv)
>>> virsh list
>> ==
>> *vdicdb01 started*
>>
>> On node2 (vdicnode02-priv)
>>> virsh list
>> ==
>> vdicdb02 started
>> vdicdb01 started
>>
>> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is only
>> started on node vdicnode01-priv.
>>
>> Thanks a lot.
>>
>>
>>
>> 2017-01-17 10:03 GMT+01:00 emmanuel segura :
>>
>>> sorry,
>>>
>>> But do you mean, when you say, you migrated the vm outside of the
>>> cluster? one server out side of you cluster?
>>>
>>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra :
 Hi,

 I have configured a two node cluster where we run 4 KVM guests on.

 The hosts are:
 vdicnode01
 vdicnode02

 And I have created a dedicated network card for cluster management. I
>>> have
 created required entries in /etc/hosts:
 vdicnode01-priv
 vdicnode02-priv

 The four guests have collocation rules in order to make them distribute
 proportionally between my two nodes.

 The problem I have is that if I migrate a guest outside the cluster (I
 mean using virsh migrate --live), then instead of moving the guest back
 to its original node (following the colocation sets), the cluster starts
 the guest again and suddenly I have the same guest running on both nodes,
 causing xfs corruption in the guest.

 Is there any configuration applicable to avoid this unwanted behavior?

 Thanks a lot

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ken Gaillot
On 01/17/2017 10:05 AM, Oscar Segarra wrote:
> Hi, 
> 
> * It is also possible to configure a monitor to ensure that the resource
> is not running on nodes where it's not supposed to be (a monitor with
> role="Stopped"). You don't have one of these (which is fine, and common).
> 
> Can you provide more information/documentation about role="Stopped"

Since you're using pcs, you can either configure monitors when you
create the resource with pcs resource create, or you can add/remove
monitors later with pcs resource op add/remove.

For example:

pcs resource op add my-resource-name monitor interval=10s role="Stopped"

With a normal monitor op (role="Started" or omitted), the cluster will
run the resource agent's monitor command on any node that's supposed to
be running the resource. With the above example, it will additionally
run a monitor on all other nodes, so that if it finds the resource
running somewhere it's not supposed to be, it can stop it.

Note that each monitor op must have a unique interval. So if your
existing monitor runs every 10s, you need to pick a different value for
the new monitor.

> And, please, can you explain how VirtualDomain resource agents manages
> the scenario I've presented?
> 
> /What happens If I stop pacemaker and corosync services in all nodes and
> I start them again... ¿will I have all guests running twice?/
> 
> Thanks a lot

If you stop cluster services, by default the cluster will first stop all
resources. You can set maintenance mode, or unmanage one or more
resources, to prevent the stops.
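
For example, before a planned restart of the cluster stack you could do
something like:

  pcs property set maintenance-mode=true
  # ... restart corosync/pacemaker on the nodes ...
  pcs property set maintenance-mode=false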

When cluster services first start on a node, the cluster "probes" the
status of all resources on that node, by running a one-time monitor. So
it will detect anything running at that time, and start or stop services
as needed to meet the configured requirements.

> 2017-01-17 16:38 GMT+01:00 Ken Gaillot <kgail...@redhat.com
> <mailto:kgail...@redhat.com>>:
> 
> On 01/17/2017 08:52 AM, Ulrich Windl wrote:
> >>>> Oscar Segarra <oscar.sega...@gmail.com 
> <mailto:oscar.sega...@gmail.com>> schrieb am
> 17.01.2017 um 10:15 in
> > Nachricht
> > <cajq8tag8vhx5j1xqpqmrq-9omfnxkhqs54mbzz491_6df9a...@mail.gmail.com
> 
> <mailto:cajq8tag8vhx5j1xqpqmrq-9omfnxkhqs54mbzz491_6df9a...@mail.gmail.com>>:
> >> Hi,
> >>
> >> Yes, I will try to explain myself better.
> >>
> >> *Initially*
> >> On node1 (vdicnode01-priv)
> >>> virsh list
> >> ==
> >> vdicdb01 started
> >>
> >> On node2 (vdicnode02-priv)
> >>> virsh list
> >> ==
> >> vdicdb02 started
> >>
> >> --> Now, I execute the migrate command (outside the cluster <-- not 
> using
> >> pcs resource move)
> >> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
> >> tcp://vdicnode02-priv
> >
> > One of the rules of successful clustering is: If resources are managed 
> by the cluster, they are managed by the cluster only! ;-)
> >
> > I guess one node is trying to restart the VM once it vanished, and the 
> other node might try to shut down the VM while it's being migrated.
> > Or any other undesired combination...
> 
> 
> As Ulrich says here, you can't use virsh to manage VMs once they are
> managed by the cluster. Instead, configure your cluster to support live
> migration:
> 
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources
> 
> <http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources>
> 
> and then use pcs resource move (which is just location constraints under
> the hood) to move VMs.
> 
> What's happening in your example is:
> 
> * Your VM cluster resource has a monitor operation ensuring that it is
> running properly on the desired node.
> 
> * It is also possible to configure a monitor to ensure that the resource
> is not running on nodes where it's not supposed to be (a monitor with
> role="Stopped"). You don't have one of these (which is fine, and
> common).
> 
> * When you move the VM, the cluster detects that it is not running on
> the node you told it to keep it running on. Because there is no
> "Stopped" monitor, the cluster doesn't immediately realize that a new
> rogue instance is running on another node. So, the cluster thinks the VM
> crashed on the original node,

Re: [ClusterLabs] Pacemaker cluster not working after switching from 1.0 to 1.1 (resend as plain text)

2017-01-16 Thread Ken Gaillot
A preliminary question -- what cluster layer are you running?

Pacemaker 1.0 worked with heartbeat or corosync 1, while Ubuntu 14.04
ships with corosync 2 by default, IIRC. There were major incompatible
changes between corosync 1 and 2, so it's important to get that right
before looking at pacemaker.

A general note, when making such a big jump in the pacemaker version,
I'd recommend running "cibadmin --upgrade" both before exporting the
configuration from 1.0, and again after deploying it on 1.1. This will
apply any transformations needed in the CIB syntax. Pacemaker will do
this on the fly, but doing it manually lets you see any issues early, as
well as being more efficient.
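
Roughly, the idea is something like this (a sketch, not an exact
procedure):

  # on the old 1.0 cluster, before exporting
  cibadmin --upgrade
  cibadmin --query > cib-backup.xml
  # on the new 1.1 cluster, after loading the configuration
  cibadmin --upgrade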

On 01/16/2017 12:24 AM, Rick Kint wrote:
> Sorry about the garbled email. Trying again with plain text.
> 
> 
> 
> A working cluster running on Pacemaker 1.0.12 on RHEL5 has been copied with 
> minimal modifications to Pacemaker 1.1.10 on Ubuntu 14.04. The version string 
> is "1.1.10+git20130802-1ubuntu2.3".
> 
> We run simple active/standby two-node clusters. 
> 
> There are four resources on each node:
> - a stateful resource (Encryptor) representing a process in either active or 
> standby mode.
> -- this process does not maintain persistent data.
> - a clone resource (CredProxy) representing a helper process.
> - two clone resources (Ingress, Egress) representing network interfaces.
> 
> Colocation constraints require that all three clone resources must be in 
> Started role in order for the stateful Encryptor resource to be in Master 
> role.
> 
> The full configuration is at the end of this message.
> 
> The Encryptor resource should fail over on these events:
> - active node (i.e. node containing active Encryptor process) goes down
> - active Encryptor process goes down and cannot be restarted
> - auxiliary CredProxy process on active node goes down and cannot be restarted
> - either interface on active node goes down
> 
> All of these events trigger failover on the old platform (Pacemaker 1.0 on 
> RHEL5).
> 
> However, on the new platform (Pacemaker 1.1 on Ubuntu) neither interface 
> failure nor auxiliary process failure trigger failover. Pacemaker goes into a 
> loop where it starts and stops the active Encryptor resource and never 
> promotes the standby Encryptor resource. Cleaning up the failed resource 
> manually and issuing "crm_resource --cleanup" clears the jam and the standby 
> Encryptor resource is promoted. So does taking the former active node offline 
> completely.
> 
> The pe-input-X.bz2 files show this sequence:
> 
> (EncryptBase:1 is active, EncryptBase:0 is standby)
> 
> T: pacemaker recognizes that Ingress has failed
> transition: recover Ingress on active node
> 
> T+1: transition: recover Ingress on active node
> 
> T+2: transition: recover Ingress on active node
> 
> T+3: transitions: promote EncryptBase:0, demote EncryptBase:1, stop Ingress 
> on active node (no-op)
> 
> T+4: EncryptBase:1 demoted (both clones are now in slave mode), Ingress 
> stopped
> transitions: promote  EncryptBase:0, stop EncryptBase:1
> 
> T+5: EncryptBase:1 stopped, EncryptBase:0 still in slave role
> transitions: promote EncryptBase:0, start EncryptBase:1
> 
> T+6: EncryptBase:1 started (slave role)
> transitions: promote EncryptBase:0, stop EncryptBase:1
> 
> The last two steps repeat. Although pengine has decided that EncryptBase:0 
> should be promoted, Pacemaker keeps stopping and starting EncryptBase:1 (the 
> one on the node with the failed interface) without ever promoting 
> EncryptBase:0.
> 
> More precisely, crmd never issues the command that would cause promotion. For 
> a normal promotion, I see a sequence like this:
> 
> 2017-01-12T20:04:39.887154+00:00 encryptor4 pengine[2201]:   notice: 
> LogActions: Promote EncryptBase:0  (Slave -> Master encryptor4)
> 2017-01-12T20:04:39.888018+00:00 encryptor4 pengine[2201]:   notice: 
> process_pe_message: Calculated Transition 3: 
> /var/lib/pacemaker/pengine/pe-input-3.bz2
> 2017-01-12T20:04:39.888428+00:00 encryptor4 crmd[2202]:   notice: 
> te_rsc_command: Initiating action 9: promote EncryptBase_promote_0 on 
> encryptor4 (local)
> 2017-01-12T20:04:39.903827+00:00 encryptor4 Encryptor_ResourceAgent: INFO: 
> Promoting Encryptor.
> 2017-01-12T20:04:44.959804+00:00 encryptor4 crmd[2202]:   notice: 
> process_lrm_event: LRM operation EncryptBase_promote_0 (call=42, rc=0, 
> cib-update=43, confirmed=true) ok
> 
> in which crmd initiates an action for promotion and the RA logs a message 
> indicating that it was called with the arg "promote".
> 
> In contrast, the looping sections look like this:
> 
> (EncryptBase:1 on encryptor5 is the active/Master instance, EncryptBase:0 on 
> encryptor4 is the standby/Slave instance)
> 
> 2017-01-12T20:12:36.548980+00:00 encryptor4 pengine[2201]:   notice: 
> LogActions: Promote EncryptBase:0(Slave -> Master encryptor4)
> 2017-01-12T20:12:36.549005+00:00 encryptor4 pengine[2201]:   notice: 
> LogActions: Stop

Re: [ClusterLabs] Problems with corosync and pacemaker with error scenarios

2017-01-16 Thread Ken Gaillot
On 01/16/2017 08:56 AM, Gerhard Wiesinger wrote:
> Hello,
> 
> I'm new to corosync and pacemaker and I want to setup a nginx cluster
> with quorum.
> 
> Requirements:
> - 3 Linux maschines
> - On 2 maschines floating IP should be handled and nginx as a load
> balancing proxy
> - 3rd maschine is for quorum only, no services must run there
> 
> Installed on all 3 nodes corosync/pacemaker, firewall ports openend are:
> 5404, 5405, 5406 for udp in both directions

If you're using firewalld, the easiest configuration is:

  firewall-cmd --permanent --add-service=high-availability

If not, depending on what you're running, you may also want to open  TCP
ports 2224 (pcsd), 3121 (Pacemaker Remote), and 21064 (DLM).
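
For example, with plain iptables (assuming the default INPUT chain and no
other restrictions in place) the equivalent would be something like:

  iptables -A INPUT -p udp --dport 5404:5406 -j ACCEPT
  iptables -A INPUT -p tcp -m multiport --dports 2224,3121,21064 -j ACCEPT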

> OS: Fedora 25
> 
> Configuration of corosync (only the bindnetaddr is different on every
> maschine) and pacemaker below.

FYI you don't need a different bindnetaddr. You can (and generally
should) use the *network* address, which is the same on all hosts.
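
For example, assuming your nodes are on 1.2.3.0/24, every node's
corosync.conf could use the same interface section (adjust the network
address and netmask to your actual subnet):

  interface {
      ringnumber: 0
      bindnetaddr: 1.2.3.0
      mcastport: 5405
  }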

> Configuration works so far but error test scenarios don't work like
> expected:
> 1.) I had cases in testing without quorum and with quorum again where the
> cluster stayed in the Stopped state
>   I had to restart the whole stack to get it online again (killall -9
> corosync;systemctl restart corosync;systemctl restart pacemaker)
>   Any ideas?

It will be next to impossible to say without logs. It's definitely not
expected behavior. Stopping is the correct response to losing quorum;
perhaps quorum is not being properly restored for some reason. What is
your test methodology?

> 2.) Restarting pacemaker on inactive node also restarts resources on the
> other active node:
> a.) Everything up & ok
> b.) lb01 handles all resources
> c.) on lb02, which handles no resources: systemctl restart pacemaker:
>   All resources will also be restart with a short outage on lb01 (state
> is Stopped, Started[ lb01 lb02 ] and then Started lb02)
>   How can this be avoided?

This is not expected behavior, except with clones, which I don't see you
using.

> 3.) Stopping and starting corosync doesn't awake the node up again:
>   systemctl stop corosync;sleep 10;systemctl restart corosync
>   Online: [ kvm01 lb01 ]
>   OFFLINE: [ lb02 ]
>   Stays in that state until pacemaker is restarted: systemctl restart
> pacemaker
>   Bug?

No, pacemaker should always restart if corosync restarts. That is
specified in the systemd units, so I'm not sure why pacemaker didn't
automatically restart in your case.

> 4.) "systemctl restart corosync" hangs sometimes (waiting 2 min)
>   needs a
>   killall -9 corosync;systemctl restart corosync;systemctl restart
> pacemaker
>   sequence to get it up gain
> 
> 5.) Simulation of split brain: Disabling/reenabling local firewall
> (ports 5404, 5405, 5406) on node lb01 and lb02 for the following ports

FYI for an accurate simulation, be sure to block both incoming and
outgoing traffic on the corosync ports.

> doesn't bring corosync up again after reenabling lb02 firewall
> partition WITHOUT quorum
> Online: [ kvm01 ]
> OFFLINE: [ lb01 lb02 ]
>   NOK: restart on lb02: systemctl restart corosync;systemctl restart
> pacemaker
>   OK:  restart on lb02 and kvm01 (quorum host): systemctl restart
> corosync;systemctl restart pacemaker
>   I also see that non enabled hosts (quorum hosts) are also tried to be
> started on kvm01
>   Started[ kvm01 lb02 ]
>   Started lb02
>   Any ideas?
> 
> I've also written a new ocf:heartbeat:Iprule to modify "ip rule"
> accordingly.
> 
> Versions are:
> corosync: 2.4.2
> pacemaker: 1.1.16
> Kernel: 4.9.3-200.fc25.x86_64
> 
> Thnx.
> 
> Ciao,
> Gerhard
> 
> Corosync config:
> 
> 
> totem {
>     version: 2
>     cluster_name: lbcluster
>     crypto_cipher: aes256
>     crypto_hash: sha512
>     interface {
>         ringnumber: 0
>         bindnetaddr: 1.2.3.35
>         mcastport: 5405
>     }
>     transport: udpu
> }
> logging {
>     fileline: off
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }
> nodelist {
>     node {
>         ring0_addr: lb01
>         nodeid: 1
>     }
>     node {
>         ring0_addr: lb02
>         nodeid: 2
>     }
>     node {
>         ring0_addr: kvm01
>         nodeid: 3
>     }
> }
> quorum {
>     # Enable and configure quorum subsystem (default: off)
>     # see also corosync.conf.5 and votequorum.5
>     #provider: corosync_votequorum
>     provider: corosync_votequorum
>     # Only for 2 node setup!
>     #  two_node: 1
> }
> 

Re: [ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

2017-01-18 Thread Ken Gaillot
log_finished:
> finished - rsc:zs95kjg110061_res action:migrate_to call_id:941
> pid:135045 exit-code:1 exec-time:20003ms queue-time:0ms
> Jan 17 13:55:14 [27552] zs95kj lrmd: ( lrmd.c:1292 ) trace:
> lrmd_rsc_execute: Nothing further to do for zs95kjg110061_res
> Jan 17 13:55:14 [27555] zs95kj crmd: ( utils.c:1942 ) debug:
> create_operation_update: do_update_resource: Updating resource
> zs95kjg110061_res after migrate_to op Timed Out (interval=0)
> Jan 17 13:55:14 [27555] zs95kj crmd: ( lrm.c:2397 ) error:
> process_lrm_event: Operation zs95kjg110061_res_migrate_to_0: Timed Out
> (node=zs95kjpcs1, call=941, timeout=2ms)
> Jan 17 13:55:14 [27555] zs95kj crmd: ( lrm.c:196 ) debug:
> update_history_cache: Updating history for 'zs95kjg110061_res' with
> migrate_to op
> 
> 
> Any ideas?
> 
> 
> 
> Scott Greenlese ... IBM KVM on System z - Solution Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
> 
> 
> Inactive hide details for Ken Gaillot ---01/17/2017 11:41:53 AM---On
> 01/17/2017 10:19 AM, Scott Greenlese wrote: > Hi..Ken Gaillot
> ---01/17/2017 11:41:53 AM---On 01/17/2017 10:19 AM, Scott Greenlese
> wrote: > Hi..
> 
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Date: 01/17/2017 11:41 AM
> Subject: Re: [ClusterLabs] Live Guest Migration timeouts for
> VirtualDomain resources
> 
> 
> 
> 
> 
> On 01/17/2017 10:19 AM, Scott Greenlese wrote:
>> Hi..
>>
>> I've been testing live guest migration (LGM) with VirtualDomain
>> resources, which are guests running on Linux KVM / System Z
>> managed by pacemaker.
>>
>> I'm looking for documentation that explains how to configure my
>> VirtualDomain resources such that they will not timeout
>> prematurely when there is a heavy I/O workload running on the guest.
>>
>> If I perform the LGM with an unmanaged guest (resource disabled), it
>> takes anywhere from 2 - 5 minutes to complete the LGM.
>> Example:
>>
>> # Migrate guest, specify a timeout value of 600s
>>
>> [root@zs95kj VD]# date;virsh --keepalive-interval 10 migrate --live
>> --persistent --undefinesource*--timeout 600* --verbose zs95kjg110061
>> qemu+ssh://zs90kppcs1/system
>> Mon Jan 16 16:35:32 EST 2017
>>
>> Migration: [100 %]
>>
>> [root@zs95kj VD]# date
>> Mon Jan 16 16:40:01 EST 2017
>> [root@zs95kj VD]#
>>
>> Start: 16:35:32
>> End: 16:40:01
>> Total: *4 min 29 sec*
>>
>>
>> In comparison, when the guest is managed by pacemaker, and enabled for
>> LGM ... I get this:
>>
>> [root@zs95kj VD]# date;pcs resource show zs95kjg110061_res
>> Mon Jan 16 15:13:33 EST 2017
>> Resource: zs95kjg110061_res (class=ocf provider=heartbeat
>> type=VirtualDomain)
>> Attributes: config=/guestxml/nfs1/zs95kjg110061.xml
>> hypervisor=qemu:///system migration_transport=ssh
>> Meta Attrs: allow-migrate=true remote-node=zs95kjg110061
>> remote-addr=10.20.110.61
>> Operations: start interval=0s timeout=480
>> (zs95kjg110061_res-start-interval-0s)
>> stop interval=0s timeout=120 (zs95kjg110061_res-stop-interval-0s)
>> monitor interval=30s (zs95kjg110061_res-monitor-interval-30s)
>> migrate-from interval=0s timeout=1200
>> (zs95kjg110061_res-migrate-from-interval-0s)
>> *migrate-to* interval=0s *timeout=1200*
>> (zs95kjg110061_res-migrate-to-interval-0s)
>>
>> NOTE: I didn't specify any migrate-to value for timeout, so it defaulted
>> to 1200. Is this seconds? If so, that's 20 minutes,
>> ample time to complete a 5 minute migration.
> 
> Not sure where the default of 1200 comes from, but I believe the default
> is milliseconds if no unit is specified. Normally you'd specify
> something like "timeout=1200s".
> 
>> [root@zs95kj VD]# date;pcs resource show |grep zs95kjg110061_res
>> Mon Jan 16 14:27:01 EST 2017
>> zs95kjg110061_res (ocf::heartbeat:VirtualDomain): Started zs90kppcs1
>> [root@zs95kj VD]#
>>
>>
>> [root@zs95kj VD]# date;*pcs resource move zs95kjg110061_res zs95kjpcs1*
>> Mon Jan 16 14:45:39 EST 2017
>> You have new mail in /var/spool/mail/root
>>
>>
>> Jan 16 14:45:37 zs90kp VirtualDomain(zs95kjg110061_res)[21050]: INFO:
>> zs95kjg110061: *Starting live migration to zs95kjpcs1 (using: virsh
>> --connect=qemu:///system --quiet migrate --live zs95kjg110061
>> qemu+ssh://zs95kjpcs1/system ).*
>> Jan 16 14:45:57 zs90kp lrmd[12798]: warning:
>> zs95kjg110061_res_migrate_to_0 process (PID 21050) timed out
>> Jan 16 14:45:57 zs90kp lrmd[12798

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ken Gaillot
On 01/18/2017 03:49 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
> 
>> * When you move the VM, the cluster detects that it is not running on
>> the node you told it to keep it running on. Because there is no
>> "Stopped" monitor, the cluster doesn't immediately realize that a new
>> rogue instance is running on another node. So, the cluster thinks the VM
>> crashed on the original node, and recovers it by starting it again.
> 
> Ken, do you mean that if a periodic "stopped" monitor is configured, it
> is forced to run immediately (out of schedule) when the regular periodic
> monitor unexpectedly returns with stopped status?  That is, before the
> cluster takes the recovery action?  Conceptually, that would be similar
> to the probe run on node startup.  If not, then maybe it would be a
> useful resource option to have (I mean running cluster-wide probes on an
> unexpected monitor failure, before recovery).  An optional safety check.

No, there is nothing like that currently. The regular and "Stopped"
monitors run independently. Because they must have different intervals,
that does mean that the two sides of the issue may be detected at
different times.

It is an interesting idea to have an option to reprobe on operation
failure. I think it may be overkill; the only failure situation it would
be good for is one like this, where a resource was moved out of cluster
control. The vast majority of failure scenarios wouldn't be helped. If
that sort of thing happens a lot in your cluster, you really need to
figure out how to stop doing that. :)

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how do I disable/negate resource option?

2017-01-20 Thread Ken Gaillot
On 01/19/2017 09:52 AM, lejeczek wrote:
> 
> 
> On 19/01/17 15:30, Ken Gaillot wrote:
>> On 01/19/2017 06:30 AM, lejeczek wrote:
>>> hi all
>>>
>>> how can it be done? Is it possible?
>>> many thanks,
>>> L.
>> Check the man page / documentation for whatever tool you're using (crm,
>> pcs, etc.). Each one has its own syntax.
> 
> I'd think this would be some built-in logic for a given tool, e.g. pcs,
> which I'm using (as I am on 7.x), but no. I don't think an RA option(s)
> can be ignored, or even better disabled.
> I'm looking at specific RA(s) descriptions and nothing like "disable"
> can I find there. Which is a shame, I really thought by design it'd be
> possible. Take CTDB: it's a mess, probably because the version in
> rhel.7.x must be newer than what the CTDB RA has kept up with, so it fails.
> I only started checking HA two days ago, and in my case it's a bit of a
> discouraging experience with that CTDB. It seems like the only road is to
> go inside the RA definition files. Is it really, or am I missing something
> fundamental, trivial?

Maybe I'm misunderstanding what you mean by "disable".

If you just want to change the value used for the option, you can do:

  pcs resource update RESOURCE_NAME OPTION_NAME=NEW_VALUE

If you mean that the resource agent is buggy, and you want it not to use
a particular option at all, then that can only be addressed within the
agent. If you're comfortable with shell scripting, you can copy it and
make any necessary changes yourself. Also feel free to open a bug at:

  https://github.com/ClusterLabs/resource-agents/issues

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Mysql slave did not start replication after failure, and read-only IP also remained active on the much outdated slave

2016-08-22 Thread Ken Gaillot
On 08/22/2016 07:24 AM, Attila Megyeri wrote:
> Hi Andrei,
> 
> I waited several hours, and nothing happened. 

And actually, we can see from the configuration you provided that
cluster-recheck-interval is 2 minutes.

I don't see anything about stonith; is it enabled and tested? This looks
like a situation where stonith would come into play. I know that power
fencing can be rough on a MySQL database, but perhaps intelligent
switches with network fencing would be appropriate.

The "Corosync main process was not scheduled" message is the start of
the trouble. It means the system was overloaded and corosync didn't get
any CPU time, so it couldn't maintain cluster communication.

Probably the most useful thing would be to upgrade to a recent version
of corosync+pacemaker+resource-agents. Recent corosync versions run with
realtime priority, which makes this much less likely.

Other than that, figure out what the load issue was, and try to prevent
it from recurring.

I'm not familiar enough with the RA to comment on its behavior. If you
think it's suspect, check the logs during the incident for messages from
the RA.

> I assume that the RA does not treat this case properly. Mysql was running, 
> but the "show slave status" command returned something that the RA was not 
> prepared to parse, and instead of reporting a non-readable attribute, it 
> returned some generic error, that did not stop the server. 
> 
> Rgds,
> Attila
> 
> 
> -Original Message-
> From: Andrei Borzenkov [mailto:arvidj...@gmail.com] 
> Sent: Monday, August 22, 2016 11:42 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> 
> Subject: Re: [ClusterLabs] Mysql slave did not start replication after 
> failure, and read-only IP also remained active on the much outdated slave
> 
> On Mon, Aug 22, 2016 at 12:18 PM, Attila Megyeri
>  wrote:
>> Dear community,
>>
>>
>>
>> A few days ago we had an issue in our Mysql M/S replication cluster.
>>
>> We have a one R/W Master, and a one RO Slave setup. RO VIP is supposed to be
>> running on the slave if it is not too much behind the master, and if any
>> error occurs, RO VIP is moved to the master.
>>
>>
>>
>> Something happened with the slave Mysql (some disk issue, still
>> investigating), but the problem is, that the slave VIP remained on the slave
>> device, even though the slave process was not running, and the server was
>> much outdated.
>>
>>
>>
>> During the issue the following log entries appeared (just an extract as it
>> would be too long):
>>
>>
>>
>>
>>
>> Aug 20 02:04:07 ctdb1 corosync[1056]:   [MAIN  ] Corosync main process was
>> not scheduled for 14088.5488 ms (threshold is 4000. ms). Consider token
>> timeout increase.
>>
>> Aug 20 02:04:07 ctdb1 corosync[1056]:   [TOTEM ] A processor failed, forming
>> new configuration.
>>
>> Aug 20 02:04:34 ctdb1 corosync[1056]:   [MAIN  ] Corosync main process was
>> not scheduled for 27065.2559 ms (threshold is 4000. ms). Consider token
>> timeout increase.
>>
>> Aug 20 02:04:34 ctdb1 corosync[1056]:   [TOTEM ] A new membership (xxx:6720)
>> was formed. Members left: 168362243 168362281 168362282 168362301 168362302
>> 168362311 168362312 1
>>
>> Aug 20 02:04:34 ctdb1 corosync[1056]:   [TOTEM ] A new membership (xxx:6724)
>> was formed. Members
>>
>> ..
>>
>> Aug 20 02:13:28 ctdb1 corosync[1056]:   [MAIN  ] Completed service
>> synchronization, ready to provide service.
>>
>> ..
>>
>> Aug 20 02:13:29 ctdb1 attrd[1584]:   notice: attrd_trigger_update: Sending
>> flush op to all hosts for: readable (1)
>>
>> …
>>
>> Aug 20 02:13:32 ctdb1 mysql(db-mysql)[10492]: INFO: post-demote notification
>> for ctdb1
>>
>> Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-master)[10490]: INFO: IP status = ok,
>> IP_CIP=
>>
>> Aug 20 02:13:32 ctdb1 crmd[1586]:   notice: process_lrm_event: LRM operation
>> db-ip-master_stop_0 (call=371, rc=0, cib-update=179, confirmed=true) ok
>>
>> Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-slave)[10620]: INFO: Adding inet address
>> xxx/24 with broadcast address  to device eth0
>>
>> Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-slave)[10620]: INFO: Bringing device
>> eth0 up
>>
>> Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-slave)[10620]: INFO:
>> /usr/lib/heartbeat/send_arp -i 200 -r 5 -p
>> /usr/var/run/resource-agents/send_arp-xxx eth0 xxx auto not_used not_used
>>
>> Aug 20 02:13:32 ctdb1 crmd[1586]:   notice: process_lrm_event: LRM operation
>> db-ip-slave_start_0 (call=377, rc=0, cib-update=180, confirmed=true) ok
>>
>> Aug 20 02:13:32 ctdb1 crmd[1586]:   notice: process_lrm_event: LRM operation
>> db-ip-slave_monitor_2 (call=380, rc=0, cib-update=181, confirmed=false)
>> ok
>>
>> Aug 20 02:13:32 ctdb1 crmd[1586]:   notice: process_lrm_event: LRM operation
>> db-mysql_notify_0 (call=374, rc=0, cib-update=0, confirmed=true) ok
>>
>> Aug 20 02:13:32 ctdb1 attrd[1584]:   notice: attrd_trigger_update: Sending
>> flush op to all hosts for: master-db-mysql (1)
>>

Re: [ClusterLabs] R: Re: Antw: Re: Ordering Sets of Resources

2017-03-01 Thread Ken Gaillot
On 03/01/2017 03:22 PM, iva...@libero.it wrote:
> You are right, but I had to use the option symmetrical=false because I
> need to be able to stop, when all resources are running, even a single
> primitive with no impact on the other resources.
> 
> I have also used symmetrical=false with kind=Optional.
> Stopping an individual resource does not stop the other resources, but if
> during the startup or shutdown of the resources a list of primitives
> without any order is used, the resources will start or stop without
> strictly respecting the constraint.
> 
> Regards
> Ivan

If I understand, you want to be able to specify resources A B C such
that they always start in that order, but stopping can be in any
combination:
* just A
* just B
* just C
* just A and B (in which case B stops then A)
* just A and C (in which case C stops then A)
* just B and C (in which case C stops then B)
* or all (in which case C stops, then B, then A)

There may be a fancy way to do it with sets, but my first thought is:

* Keep the start constraint you have

* Use individual ordering constraints between each resource pair with
kind=Optional and action=stop
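
For the A B C example above, that would be something like this with pcs
(untested; symmetrical=false keeps the constraints stop-only):

  pcs constraint order stop C then stop B kind=Optional symmetrical=false
  pcs constraint order stop C then stop A kind=Optional symmetrical=false
  pcs constraint order stop B then stop A kind=Optional symmetrical=false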

>> Messaggio originale
>> Da: "Ken Gaillot" <kgail...@redhat.com>
>> Data: 01/03/2017 15.57
>> A: "Ulrich Windl"<ulrich.wi...@rz.uni-regensburg.de>, <users@clusterlabs.org>
>> Ogg: Re: [ClusterLabs] Antw: Re:  Ordering Sets of Resources
>>
>> On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot <kgail...@redhat.com> schrieb am 26.02.2017 um 20:04 in 
> Nachricht
>>> <dbf562ff-a830-fc3c-84dc-487b892fc...@redhat.com>:
>>>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>>>> Hi all,
>>>>> i have configured a two node cluster on redhat 7.
>>>>>
>>>>> Because I need to manage resources stopping and starting singularly when
>>>>> they are running I have configured cluster using order set constraints.
>>>>>
>>>>> Here the example
>>>>>
>>>>> Ordering Constraints:
>>>>>   Resource Sets:
>>>>> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>>>> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
>>>>> require-all=true setoptions symmetrical=false
>>>>> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>>>> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>>>> sequential=true require-all=true setoptions symmetrical=false 
> kind=Mandatory
>>>>>
>>>>> The constraint works as expected on start, but when stopping, the resources
>>>>> don't respect the order.
>>>>> Any help is appreciated
>>>>>
>>>>> Thank and regards
>>>>> Ivan
>>>>
>>>> symmetrical=false means the order only applies for starting
>>>
>>> From the name (symmetrical) alone it could also mean that it only applies 
> for stopping ;-)
>>> (Another example where better names would be nice)
>>
>> Well, more specifically, it only applies to the action specified in the
>> constraint. I hadn't noticed before that the second constraint here has
>> action=stop, so yes, that one would only apply for stopping.
>>
>> In the above example, the two constraints are identical to a single
>> constraint with symmetrical=true, since the second constraint is just
>> the reverse of the first.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Ordering Sets of Resources

2017-02-26 Thread Ken Gaillot
On 02/25/2017 03:35 PM, iva...@libero.it wrote:
> Hi all,
> i have configured a two node cluster on redhat 7.
> 
> Because I need to manage resources stopping and starting singularly when
> they are running I have configured cluster using order set constraints.
> 
> Here the example
> 
> Ordering Constraints:
>   Resource Sets:
> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
> require-all=true setoptions symmetrical=false
> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
> sequential=true require-all=true setoptions symmetrical=false kind=Mandatory
> 
> The constraint works as expected on start, but when stopping, the resources
> don't respect the order.
> Any help is appreciated
> 
> Thank and regards
> Ivan

symmetrical=false means the order only applies for starting


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] R: Re: Ordering Sets of Resources

2017-02-27 Thread Ken Gaillot
On 02/26/2017 02:45 PM, iva...@libero.it wrote:
> Hi,
> yes, I want, with active resources, to be able to turn them off individually.
> With symmetrical=true, when I stop a resource, for example MYIP_4, MYSMTP
> will also stop.
> 
> Ragards
> Ivan

I thought that was the goal, to ensure that things are stopped in order.

If your goal is to start and stop them in order *if* they're both
starting or stopping, but not *require* it, then you want kind=Optional
instead of Mandatory.

> 
> 
>> ----Messaggio originale
>> Da: "Ken Gaillot" <kgail...@redhat.com>
>> Data: 26/02/2017 20.04
>> A: <users@clusterlabs.org>
>> Ogg: Re: [ClusterLabs] Ordering Sets of Resources
>>
>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>> Hi all,
>>> i have configured a two node cluster on redhat 7.
>>>
>>> Because I need to manage resources stopping and starting singularly when
>>> they are running I have configured cluster using order set constraints.
>>>
>>> Here the example
>>>
>>> Ordering Constraints:
>>>   Resource Sets:
>>> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
>>> require-all=true setoptions symmetrical=false
>>> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>> sequential=true require-all=true setoptions symmetrical=false 
> kind=Mandatory
>>>
>>> The constraint works as expected on start, but when stopping, the resources
>>> don't respect the order.
>>> Any help is appreciated
>>>
>>> Thank and regards
>>> Ivan
>>
>> symmetrical=false means the order only applies for starting

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Never join a list without a problem...

2017-02-27 Thread Ken Gaillot
On 02/27/2017 01:48 PM, Jeffrey Westgate wrote:
> I think I may be on to something.  It seems that every time my boxes start 
> showing increased host load, the preceding change that takes place is:
> 
>  crmd: info: throttle_send_command:   New throttle mode: 0100 (was 
> )
> 
> I'm attaching the last 50-odd lines from the corosync.log.  It just happens 
> that  - at the moment - our host load on this box is coming back down.  No 
> host load issue (0.00 load) immediately preceding this part of the log.
> 
> I know the log shows them in reverse order, but it shows them as the same log 
> item, and printed at the same time.  I'm assuming the throttle change takes 
> place and that increases the load, not the other way around
> 
> So - what is the throttle mode?
> 
> --
> Jeff Westgate
> DIS UNIX/Linux System Administrator

Actually it is the other way around. When Pacemaker detects high load on
a node, it "throttles" by reducing the number of operations it will
execute concurrently (to avoid making a bad situation worse).

So, what caused the load to go up is still a mystery.

There have been some cases where corosync started using 100% CPU, but
since you mentioned that processes aren't taking any more CPU, it
doesn't sound like the same issue.

> --
> Message: 3
> Date: Mon, 27 Feb 2017 13:26:30 +
> From: Jeffrey Westgate 
> To: "users@clusterlabs.org" 
> Subject: Re: [ClusterLabs] Never join a list without a problem...
> 
> Thanks, Ken.
> 
> Our late guru was the admin who set all this up, and it's been rock solid 
> until recent oddities started cropping up.  They still function fine - 
> they've just developed some... quirks.
> 
> I found the solution before I got your reply, which was essentially what we 
> did; update all but pacemaker, reboot, stop pacemaker, update pacemaker, 
> reboot.  That process was necessary because they've been running sooo long, 
> pacemaker would not stop.  it would try, then seemingly stall after several 
> minutes.
> 
> We're good now, up-to-date-wise, and stuck only with the initial issue we 
> were hoping to eliminate by updating/patching EVERYthing.  And we honestly 
> don't know what may be causing it.
> 
> We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer, 
> and we cannot set a clock by it - while the machine is 95% idle (or more 
> according to 'top'), the host load shoots up to 50 or 60%.  It takes about 20 
> minutes to peak, and another 30 to 45 minutes to come back down to baseline, 
> which is mostly 0.00.  (attached hostload.pdf)  This happens to both 
> machines, randomly, and is concerning, as we'd like to find what's causing it 
> and resolve it.
> 
> We were hoping "uptime kernel bug", but patching has not helped.  There seems 
> to be no increase in the number of processes running, and the processes 
> running do not take any more cpu time.  They are DNS forwarding resolvers, 
> but there is no correlation between dns requests and load increase - 
> sometimes (like this morning) it rises around 1 AM when the dns load is 
> minimal.
> 
> The oddity is - these are the only two boxes with this issue, and we have a 
> couple dozen at the same OS and level.  Only these two, with this role and 
> this particular package set have the issue.
> 
> --
> Jeff

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] corosync/pacemaker on ~100 nodes cluser

2016-08-23 Thread Ken Gaillot
On 08/23/2016 11:46 AM, Klaus Wenninger wrote:
> On 08/23/2016 06:26 PM, Radoslaw Garbacz wrote:
>> Hi,
>>
>> I would like to ask for settings (and hardware requirements) to have
>> corosync/pacemaker running on about 100 nodes cluster.
> Actually I had thought that 16 would be the limit for full
> pacemaker-cluster-nodes.
> For larger deployments pacemaker-remote should be the way to go. Were
> you speaking of a cluster with remote-nodes?
> 
> Regards,
> Klaus
>>
>> For now some nodes get totally frozen (high CPU, high network usage),
>> so that even login is not possible. By manipulating
>> corosync/pacemaker/kernel parameters I managed to run it on ~40 nodes
>> cluster, but I am not sure which parameters are critical, how to make
>> it more responsive and how to make the number of nodes even bigger.

16 is a practical limit without special hardware and tuning, so that's
often what companies that offer support for clusters will accept.

I know people have gone well higher than 16 with a lot of optimization,
but I think somewhere between 32 and 64 corosync can't keep up with the
messages. Your 40 nodes sounds about right. I'd be curious to hear what
you had to do (with hardware, OS tuning, and corosync tuning) to get
that far.

As Klaus mentioned, Pacemaker Remote is the preferred way to go beyond
that currently:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Remote/index.html

>> Thanks,
>>
>> -- 
>> Best Regards,
>>
>> Radoslaw Garbacz
>> XtremeData Incorporation

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemakerd quits after few seconds with some errors

2016-08-22 Thread Ken Gaillot
On 08/22/2016 12:17 PM, Gabriele Bulfon wrote:
> Hi,
> 
> I built corosync/pacemaker for our XStreamOS/illumos : corosync starts
> fine and logs correctly, pacemakerd quits after some seconds with the
> attached log.
> Any idea where is the issue?

Pacemaker is not able to communicate with corosync for some reason.

Aug 22 19:13:02 [1324] xstorage1 corosync notice  [MAIN  ] Corosync
Cluster Engine ('UNKNOWN'): started and ready to provide service.

'UNKNOWN' should show the corosync version. I'm wondering if maybe you
have an older corosync without configuring the pacemaker plugin. It
would be much better to use corosync 2 instead, if you can.

> 
> Thanks,
> Gabriele
> 
> 
> *Sonicle S.r.l. *: http://www.sonicle.com 
> *Music: *http://www.gabrielebulfon.com 
> *Quantum Mechanics : *http://www.cdbaby.com/cd/gabrielebulfon

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_apc delay?

2016-09-02 Thread Ken Gaillot
On 09/02/2016 08:14 AM, Dan Swartzendruber wrote:
> 
> So, I was testing my ZFS dual-head JBOD 2-node cluster.  Manual
> failovers worked just fine.  I then went to try an acid-test by logging
> in to node A and doing 'systemctl stop network'.  Sure enough, pacemaker
> told the APC fencing agent to power-cycle node A.  The ZFS pool moved to
> node B as expected.  As soon as node A was back up, I migrated the
> pool/IP back to node A.  I *thought* all was okay, until a bit later, I
> did 'zpool status', and saw checksum errors on both sides of several of
> the vdevs.  After much digging and poking, the only theory I could come
> up with was that maybe the fencing operation was considered complete too
> quickly?  I googled for examples using this, and the best tutorial I
> found showed using a power-wait=5, whereas the default seems to be
> power-wait=0?  (this is CentOS 7, btw...)  I changed it to use 5 instead

That's a reasonable theory -- that's why power_wait is available. It
would be nice if there were a page collecting users' experience with the
ideal power_wait for various devices. Even better if fence-agents used
those values as the defaults.
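
For reference, with pcs the delay can be set on an existing fence device
with something like this (using a hypothetical device name):

  pcs stonith update my-apc-fence power_wait=5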

> of 0, and did a several fencing operations while a guest VM (vsphere via
> NFS) was writing to the pool.  So far, no evidence of corruption.  BTW,
> the way I was creating and managing the cluster was with the lcmc java
> gui.  Possibly the power-wait default of 0 comes from there, I can't
> really tell.  Any thoughts or ideas appreciated :)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] "VirtualDomain is active on 2 nodes" due to transient network failure

2016-09-02 Thread Ken Gaillot
On 09/01/2016 09:39 AM, Scott Greenlese wrote:
> Andreas,
> 
> You wrote:
> 
> /"Would be good to see your full cluster configuration (corosync.conf
> and cib) - but first guess is: no fencing at all  and what is your
> "no-quorum-policy" in Pacemaker?/
> 
> /Regards,/
> /Andreas"/
> 
> Thanks for your interest. I actually do have a stonith device configured
> which maps all 5 cluster nodes in the cluster:
> 
> [root@zs95kj ~]# date;pcs stonith show fence_S90HMC1
> Thu Sep 1 10:11:25 EDT 2016
> Resource: fence_S90HMC1 (class=stonith type=fence_ibmz)
> Attributes: ipaddr=9.12.35.134 login=stonith passwd=lnx4ltic
> pcmk_host_map=zs95KLpcs1:S95/KVL;zs93KLpcs1:S93/KVL;zs93kjpcs1:S93/KVJ;zs95kjpcs1:S95/KVJ;zs90kppcs1:S90/PACEMAKER
> pcmk_host_list="zs95KLpcs1 zs93KLpcs1 zs93kjpcs1 zs95kjpcs1 zs90kppcs1"
> pcmk_list_timeout=300 pcmk_off_timeout=600 pcmk_reboot_action=off
> pcmk_reboot_timeout=600
> Operations: monitor interval=60s (fence_S90HMC1-monitor-interval-60s)
> 
> This fencing device works, too well actually. It seems extremely
> sensitive to node "failures", and I'm not sure how to tune that. Stonith
> reboot actoin is 'off', and the general stonith action (cluster config)
> is also 'off'. In fact, often if I reboot a cluster node (i.e. reboot
> command) that is an active member in the cluster... stonith will power
> off that node while it's on its wait back up. (perhaps requires a
> separate issue thread on this forum?).

That depends on what a reboot does in your OS ... if it shuts down the
cluster services cleanly, you shouldn't get a fence, but if it kills
anything still running, then the cluster will see the node as failed,
and fencing is appropriate. It's considered good practice to stop
pacemaker+corosync before rebooting a node intentionally (for even more
safety, you can put the node into standby first).

> 
> My no-quorum-policy is: no-quorum-policy: stop
> 
> I don't think I should have lost quorum, only two of the five cluster
> nodes lost their corosync ring connection.

Those two nodes lost quorum, so they should have stopped all their
resources. And the three remaining nodes should have fenced them.

I'd check the logs around the time of the incident. Do the two affected
nodes detect the loss of quorum? Do they attempt to stop their
resources? Do those stops succeed? Do the other three nodes detect the
loss of the two nodes? Does the DC attempt to fence them? Do the fence
attempts succeed?

> Here's the full configuration:
> 
> 
> [root@zs95kj ~]# cat /etc/corosync/corosync.conf
> totem {
>     version: 2
>     secauth: off
>     cluster_name: test_cluster_2
>     transport: udpu
> }
> 
> nodelist {
>     node {
>         ring0_addr: zs93kjpcs1
>         nodeid: 1
>     }
> 
>     node {
>         ring0_addr: zs95kjpcs1
>         nodeid: 2
>     }
> 
>     node {
>         ring0_addr: zs95KLpcs1
>         nodeid: 3
>     }
> 
>     node {
>         ring0_addr: zs90kppcs1
>         nodeid: 4
>     }
> 
>     node {
>         ring0_addr: zs93KLpcs1
>         nodeid: 5
>     }
> }
> 
> quorum {
>     provider: corosync_votequorum
> }
> 
> logging {
>     #Log to a specified file
>     to_logfile: yes
>     logfile: /var/log/corosync/corosync.log
>     #Log timestamp as well
>     timestamp: on
> 
>     #Facility in syslog
>     syslog_facility: daemon
> 
>     logger_subsys {
>         #Enable debug for this logger.
>         debug: off
> 
>         #This specifies the subsystem identity (name) for which logging is specified
>         subsys: QUORUM
>     }
> 
>     #Log to syslog
>     to_syslog: yes
> 
>     #Whether or not turning on the debug information in the log
>     debug: on
> }
> [root@zs95kj ~]#
> 
> 
> 
> The full CIB (see attachment)
> 
> [root@zs95kj ~]# pcs cluster cib > /tmp/scotts_cib_Sep1_2016.out
> 
> /(See attached file: scotts_cib_Sep1_2016.out)/
> 
> 
> A few excerpts from the CIB:
> 
> [root@zs95kj ~]# pcs cluster cib |less
>  num_updates="19" admin_epoch="0" cib-last-written="Wed Aug 31 15:59:31
> 2016" update-origin="zs93kjpcs1" update-client="crm_resource"
> update-user="root" have-quorum="1" dc-uuid="2">
> 
> 
> 
>  value="false"/>
>  value="1.1.13-10.el7_2.ibm.1-44eb2dd"/>
>  name="cluster-infrastructure" value="corosync"/>
>  value="test_cluster_2"/>
>  name="no-quorum-policy" value="stop"/>
>  name="last-lrm-refresh" value="1472595716"/>
>  value="off"/>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  type="VirtualDomain">
> 
>  value="/guestxml/nfs1/zs95kjg109062.xml"/>
>  name="hypervisor" value="qemu:///system"/>
>  name="migration_transport" value="ssh"/>
> 
> 
>  name="allow-migrate" value="true"/>
> 
> 
>  timeout="90"/>
>  timeout="90"/>
>  name="monitor"/>
>  name="migrate-from" timeout="1200"/>
> 
> 
> 
>  value="2048"/>
> 
> 
> 
> ( I OMITTED THE OTHER, SIMILAR 199 VIRTUALDOMAIN PRIMITIVE ENTRIES FOR
> THE SAKE OF SPACE, BUT IF THEY ARE OF
> INTEREST, I CAN ADD THEM)
> 
> .
> .
> .
> 
> 
> 
> 
>  operation="eq" value="container"/>
> 
> 
> 
> (I DEFINED THIS LOCATION CONSTRAINT RULE TO PREVENT OPAQUE GUEST VIRTUAL
> DOMAIN RESOUCES FROM BEING
> ASSIGNED TO REMOTE NODE VIRTUAL DOMAIN RESOURCES. I ALSO OMITTED THE
> NUMEROUS, SIMILAR ENTRIES BELOW).
> 

Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/05/2016 09:38 AM, Marek Grac wrote:
> Hi,
> 
> On Mon, Sep 5, 2016 at 3:46 PM, Dan Swartzendruber  > wrote:
> 
> ...
> Marek, thanks.  I have tested repeatedly (8 or so times with disk
> writes in progress) with 5-7 seconds and have had no corruption.  My
> only issue with using power_wait here (possibly I am
> misunderstanding this) is that the default action is 'reboot' which
> I *think* is 'power off, then power on'.  e.g. two operations to the
> fencing device.  The only place I need a delay though, is after the
> power off operation - doing so after power on is just wasted time
> that the resource is offline before the other node takes it over. 
> Am I misunderstanding this?  Thanks!
> 
> 
> You are right. Default sequence for reboot is:
> 
> get status, power off, delay(power-wait), get status [repeat until OFF],
> power on, delay(power-wait), get status [repeat until ON].
> 
> The power-wait was introduced because some devices respond with strange
> values when they are asked too soon after power change. It was not
> intended to be used in a way that you propose. Possible solutions:

I thought power-wait was intended for this situation, where the node's
power supply can survive a brief outage, so a delay is needed to ensure
it drains. In any case, I know people are using it for that.

Are there any drawbacks to using power-wait for this purpose, even if
that wasn't its original intent? Is it just that the "on" will get the
delay as well?

> *) Configure fence device to not use reboot but OFF, ON
> Very same to the situation when there are multiple power circuits; you
> have to switch them all OFF and afterwards turn them ON.

FYI, no special configuration is needed for this with recent pacemaker
versions. If multiple devices are listed in a topology level, pacemaker
will automatically convert reboot requests into all-off-then-all-on.
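
For example, with two hypothetical PDU devices covering a node:

  pcs stonith level add 1 node1 apc-pdu-a,apc-pdu-b

A reboot of node1 would then be executed as off on both devices, followed
by on on both.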

> *) Add a new option power-wait-off that will be used only in OFF case
> (and will override power-wait). It should be quite easy to do. Just,
> send us PR.
> 
> m,

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] What cib_stats line means in logfile

2016-09-06 Thread Ken Gaillot
On 09/05/2016 03:59 PM, Jan Pokorný wrote:
> On 05/09/16 21:26 +0200, Jan Pokorný wrote:
>> On 25/08/16 17:55 +0200, Sébastien Emeriau wrote:
>>> When i check my corosync.log i see this line :
>>>
>>> info: cib_stats: Processed 1 operations (1.00us average, 0%
>>> utilization) in the last 10min
>>>
>>> What does it mean (cpu load or just information) ?
>>
>> These are just periodically (10 minutes by default, if any
>> operations observed at all) emitted diagnostic summaries that
>> were once considered useful, which was later reconsidered
>> leading to their complete removal:
>>
>> https://github.com/ClusterLabs/pacemaker/commit/73e8c89#diff-37b681fa792dfc09ec67bb0d64eb55feL306
>>
>> Honestly, using as old Pacemaker as 1.1.8 (released 4 years ago)
> 
> actually, it must have been even older than that (I'm afraid to ask).
> 
>> would be a bigger concern for me.  Plenty of important fixes
>> (as well as enhancements) have been added since then...
> 
> P.S. Checked my mailbox, aggregating plentiful sources such as this
> list and various GitHub notifications, and found 1 other trace of
> such an oudated version within this year + 2 another last year(!).

My guess is Debian -- the pacemaker package stagnated in Debian for a
long time, so the stock Debian packages were at 1.1.7 as late as wheezy
(initially released in 2013, but it's LTS until 2018). Then, pacemaker
was dropped entirely from jessie.

Recent versions are once again actively maintained in Debian
backports/unstable, so the situation should improve from here on out,
but I bet a lot of Debian boxes still run wheezy or earlier.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_apc delay?

2016-09-06 Thread Ken Gaillot
On 09/06/2016 10:20 AM, Dan Swartzendruber wrote:
> On 2016-09-06 10:59, Ken Gaillot wrote:
> 
> [snip]
> 
>> I thought power-wait was intended for this situation, where the node's
>> power supply can survive a brief outage, so a delay is needed to ensure
>> it drains. In any case, I know people are using it for that.
>>
>> Are there any drawbacks to using power-wait for this purpose, even if
>> that wasn't its original intent? Is it just that the "on" will get the
>> delay as well?
> 
> I can't speak to the first part of your question, but for me the second
> part is a definite YES.  The issue is that I want a long enough delay to
> be sure the host is D E A D and not writing to the pool anymore; but
> that delay is now multiplied by 2, and if it gets "too long", vsphere
> guests can start getting disk I/O errors...

Ah, Marek's suggestions are the best way out, then. Fence agents are
usually simple shell scripts, so adding a power-wait-off option
shouldn't be difficult.

>>> *) Configure fence device to not use reboot but OFF, ON
>>> This is the same as the situation where there are multiple power circuits;
>>> you have to switch them all OFF and afterwards turn them ON.
>>
>> FYI, no special configuration is needed for this with recent pacemaker
>> versions. If multiple devices are listed in a topology level, pacemaker
>> will automatically convert reboot requests into all-off-then-all-on.
> 
> My understanding was that applied to 1.1.14?  My CentOS 7 host has
> pacemaker 1.1.13 :(

Correct -- but most OS distributions, including CentOS, backport
specific bugfixes and features from later versions. In this case, as
long as you've applied updates (pacemaker-1.1.13-10 or later), you've
got it.
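
For reference, registering two power devices at the same topology level
looks something like this with pcs (node and device names are hypothetical;
with pcs 0.9.x the devices are given as a comma-separated list):

    # reboot requests for node1 become: off on both outlets, then on on both
    pcs stonith level add 1 node1 apc-pdu1,apc-pdu2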


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ip clustering strange behaviour

2016-08-31 Thread Ken Gaillot
On 08/30/2016 01:52 AM, Gabriele Bulfon wrote:
> Sorry for reiterating, but my main question was:
> 
> why does node 1 remove its own IP if I shut down node 2 abruptly?
> I understand that it does not take the node 2 IP (because the
> ssh-fencing has no clue about what happened on the 2nd node), but I
> wouldn't expect it to shut down its own IP... this would kill any service
> on both nodes... what am I doing wrong?

Assuming you're using corosync 2, be sure you have "two_node: 1" in
corosync.conf. That will tell corosync to pretend there is always
quorum, so pacemaker doesn't need any special quorum settings. See the
votequorum(5) man page for details. Of course, you need fencing in this
setup, to handle when communication between the nodes is broken but both
are still up.
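
For reference, the quorum section of corosync.conf for a two-node corosync 2
cluster would then look like this:

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }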

> 
> *From:* Gabriele Bulfon
> *To:* kwenn...@redhat.com Cluster Labs - All topics related to
> open-source clustering welcomed
> *Date:* 29 August 2016 17.37.36 CEST
> *Subject:* Re: [ClusterLabs] ip clustering strange behaviour
> 
> 
> Ok, got it, I hadn't gracefully shut pacemaker on node2.
> Now I restarted, everything was up, stopped pacemaker service on
> host2 and I got host1 with both IPs configured. ;)
> 
> But, though I understand that if I halt host2 without a graceful shutdown
> of pacemaker, it will not move the IP2 to Host1, I don't expect host1
> to lose its own IP! Why?
> 
> Gabriele
> 
> 
> 
> --
> 
> From: Klaus Wenninger
> To: users@clusterlabs.org
> Date: 29 August 2016 17.26.49 CEST
> Subject: Re: [ClusterLabs] ip clustering strange behaviour
> 
> On 08/29/2016 05:18 PM, Gabriele Bulfon wrote:
> > Hi,
> >
> > now that I have IPaddr work, I have a strange behaviour on my test
> > setup of 2 nodes, here is my configuration:
> >
> > ===STONITH/FENCING===
> >
> > primitive xstorage1-stonith stonith:external/ssh-sonicle op
> monitor
> > interval="25" timeout="25" start-delay="25" params
> hostlist="xstorage1"
> >
> > primitive xstorage2-stonith stonith:external/ssh-sonicle op
> monitor
> > interval="25" timeout="25" start-delay="25" params
> hostlist="xstorage2"
> >
> > location xstorage1-stonith-pref xstorage1-stonith -inf: xstorage1
> > location xstorage2-stonith-pref xstorage2-stonith -inf: xstorage2
> >
> > property stonith-action=poweroff
> >
> >
> >
> > ===IP RESOURCES===
> >
> >
> > primitive xstorage1_wan1_IP ocf:heartbeat:IPaddr params
> ip="1.2.3.4"
> > cidr_netmask="255.255.255.0" nic="e1000g1"
> > primitive xstorage2_wan2_IP ocf:heartbeat:IPaddr params
> ip="1.2.3.5"
> > cidr_netmask="255.255.255.0" nic="e1000g1"
> >
> > location xstorage1_wan1_IP_pref xstorage1_wan1_IP 100: xstorage1
> > location xstorage2_wan2_IP_pref xstorage2_wan2_IP 100: xstorage2
> >
> > ===
> >
> > So I plumbed e1000g1 with unconfigured IP on both machines and
> started
> > corosync/pacemaker, and after some time I got all nodes online and
> > started, with IP configured as virtual interfaces (e1000g1:1 and
> > e1000g1:2) one in host1 and one in host2.
> >
> > Then I halted host2, and I expected to have host1 started with
> both
> > IPs configured on host1.
> > Instead, I got host1 started with the IP stopped and removed (only
> > e1000g1 unconfigured), host2 stopped saying IP started (!?).
> > Not exactly what I expected...
> > What's wrong?
> 
> How did you stop host2? Graceful shutdown of pacemaker? If not ...
> Anyway, ssh-fencing only works if the machine is still
> running ...
> So it will stay unclean and thus pacemaker is thinking that
> the IP might still be running on it. So this is actually the
> expected
> behavior.
> You might add a watchdog via sbd if you 

Re: [ClusterLabs] data loss of network would cause Pacemaker exit abnormally

2016-08-31 Thread Ken Gaillot
On 08/30/2016 01:58 PM, chenhj wrote:
> Hi,
> 
> This is a continuation of the email below(I did not subscrib this maillist)
> 
> http://clusterlabs.org/pipermail/users/2016-August/003838.html
> 
>>From the above, I suspect that the node with the network loss was the
>>DC, and from its point of view, it was the other node that went away.
> 
> Yes. the node with the network loss was DC(node2)
> 
> Could someone explain what the following messages mean, and why the
> pacemakerd process exits instead of rejoining the CPG group?
> 
>> Aug 27 12:33:59 [46849] node3 pacemakerd:error: pcmk_cpg_membership:
>>We're not part of CPG group 'pacemakerd' anymore!

This means the node was kicked out of the membership. I don't remember
exactly what that implies; I'm guessing the node exits because the cluster
will most likely fence it after kicking it out.

> 
>>> [root at node3 ~]# rpm -q corosync
>>> corosync-1.4.1-7.el6.x86_64
>>That is quite old ...
>>> [root at node3 ~]# cat /etc/redhat-release 
>>> CentOS release 6.3 (Final)
>>> [root at node3 ~]# pacemakerd -F
>> Pacemaker 1.1.14-1.el6 (Build: 70404b0)
>>and I doubt that many people have tested Pacemaker 1.1.14 against
>>corosync 1.4.1 ... quite far away from
>>each other release-wise ...
> 
> pacemaker 1.1.14 + corosync-1.4.7 can also reproduce this problem, but
> seemingly with lower probability.

The corosync 2 series is a major improvement, but some config changes
are necessary.
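
As a rough sketch (node names and addresses are placeholders), a corosync 2
configuration typically boils down to a totem section with version 2, a
nodelist, and the votequorum provider, along these lines:

    totem {
        version: 2
        cluster_name: mycluster
        transport: udpu
    }
    nodelist {
        node {
            ring0_addr: node2
            nodeid: 2
        }
        node {
            ring0_addr: node3
            nodeid: 3
        }
    }
    quorum {
        provider: corosync_votequorum
    }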


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] systemd RA start/stop delays

2016-08-31 Thread Ken Gaillot
On 08/30/2016 05:18 AM, Dejan Muhamedagic wrote:
> Hi,
> 
> On Thu, Aug 18, 2016 at 09:00:24AM -0500, Ken Gaillot wrote:
>> On 08/17/2016 08:17 PM, TEG AMJG wrote:
>>> Hi
>>>
>>> I am having a problem with a simple Active/Passive cluster which
>>> consists in the next configuration
>>>
>>> Cluster Name: kamcluster
>>> Corosync Nodes:
>>>  kam1vs3 kam2vs3
>>> Pacemaker Nodes:
>>>  kam1vs3 kam2vs3
>>>
>>> Resources:
>>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>   Attributes: ip=10.0.1.206 cidr_netmask=32
>>>   Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>>>   stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>>>   monitor interval=10s (ClusterIP-monitor-interval-10s)
>>>  Resource: ClusterIP2 (class=ocf provider=heartbeat type=IPaddr2)
>>>   Attributes: ip=10.0.1.207 cidr_netmask=32
>>>   Operations: start interval=0s timeout=20s (ClusterIP2-start-interval-0s)
>>>   stop interval=0s timeout=20s (ClusterIP2-stop-interval-0s)
>>>   monitor interval=10s (ClusterIP2-monitor-interval-10s)
>>>  Resource: rtpproxycluster (class=systemd type=rtpproxy)
>>>   Operations: monitor interval=10s (rtpproxycluster-monitor-interval-10s)
>>>   stop interval=0s on-fail=block
>>> (rtpproxycluster-stop-interval-0s)
>>>  Resource: kamailioetcfs (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd1 directory=/etc/kamailio fstype=ext4
>>>   Operations: start interval=0s timeout=60 (kamailioetcfs-start-interval-0s)
>>>   monitor interval=10s on-fail=fence
>>> (kamailioetcfs-monitor-interval-10s)
>>>   stop interval=0s on-fail=fence
>>> (kamailioetcfs-stop-interval-0s)
>>>  Clone: fence_kam2_xvm-clone
>>>   Meta Attrs: interleave=true clone-max=2 clone-node-max=1
>>>   Resource: fence_kam2_xvm (class=stonith type=fence_xvm)
>>>Attributes: port=tegamjg_kam2 pcmk_host_list=kam2vs3
>>>Operations: monitor interval=60s (fence_kam2_xvm-monitor-interval-60s)
>>>  Master: kamailioetcclone
>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
>>> clone-node-max=1 notify=true on-fail=fence
>>>   Resource: kamailioetc (class=ocf provider=linbit type=drbd)
>>>Attributes: drbd_resource=kamailioetc
>>>Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
>>>promote interval=0s on-fail=fence
>>> (kamailioetc-promote-interval-0s)
>>>demote interval=0s on-fail=fence
>>> (kamailioetc-demote-interval-0s)
>>>stop interval=0s on-fail=fence (kamailioetc-stop-interval-0s)
>>>monitor interval=10s (kamailioetc-monitor-interval-10s)
>>>  Clone: fence_kam1_xvm-clone
>>>   Meta Attrs: interleave=true clone-max=2 clone-node-max=1
>>>   Resource: fence_kam1_xvm (class=stonith type=fence_xvm)
>>>Attributes: port=tegamjg_kam1 pcmk_host_list=kam1vs3
>>>Operations: monitor interval=60s (fence_kam1_xvm-monitor-interval-60s)
>>>  Resource: kamailiocluster (class=ocf provider=heartbeat type=kamailio)
>>>   Attributes: listen_address=10.0.1.206
>>> conffile=/etc/kamailio/kamailio.cfg pidfile=/var/run/kamailio.pid
>>> monitoring_ip=10.0.1.206 monitoring_ip2=10.0.1.207 port=5060 proto=udp
>>> kamctlrc=/etc/kamailio/kamctlrc shmem=128 pkg=8
>>>   Meta Attrs: target-role=Stopped
>>>   Operations: start interval=0s timeout=60
>>> (kamailiocluster-start-interval-0s)
>>>   stop interval=0s timeout=30 (kamailiocluster-stop-interval-0s)
>>>   monitor interval=5s (kamailiocluster-monitor-interval-5s)
>>>
>>> Stonith Devices:
>>> Fencing Levels:
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   start fence_kam1_xvm-clone then start fence_kam2_xvm-clone
>>> (kind:Mandatory) (id:  
>>>  
>>>  order-f

Re: [ClusterLabs] When does Pacemaker shoot other nodes in the head

2016-09-09 Thread Ken Gaillot
On 09/09/2016 08:52 AM, Auer, Jens wrote:
> Hi,
> 
> a client asked me to describe the conditions when Pacemaker uses STONITH
> to bring the cluster into a known state. The documentation says that
> this happens when "we cannot establish with certainty a state of some
> node or resource", but I need some more concrete explanations.
> Specifically, he is wondering what happens when
> 1. a resource, e.g. a virtual IP, fails too often
> 2. the heartbeats of one of the cluster nodes are not received anymore
> 3. combinations of these two
> 
> Is there some better definition of the conditions which trigger STONITH?

To state the obvious, just for completeness: Before stonith/fencing can
be used at all, stonith-enabled must be true, working fence devices must
be configured, and each node must be able to be targeted by at least one
fence device. If fence device failure is a concern, a fencing topology
should be used with multiple devices. The fencing setup should be
verified by testing before going into production.

Assuming that's in place, fencing can be used in the following situations:

* Most importantly, if corosync communication is broken between nodes,
fencing will be attempted. If no-quorum-policy=ignore, each partition
will attempt to fence the other (in two-node clusters, a fence delay is
commonly used on one node to avoid a death match here -- see the example
after this list); otherwise, the partition with quorum will try to fence
the partition without quorum.
This can happen due to a node or nodes crashing, being under extreme
load, losing network connectivity, etc. Options in corosync.conf can
affect how long it takes to detect an outage, etc.

* If no-quorum-policy=suicide, and one or more nodes are separated from
the rest of the cluster such that they lose quorum, they will fence
themselves.

* If startup-fencing=true (the default), and some nodes are not present
when the cluster first starts, those nodes will be fenced.

* If a resource operation has on-fail=fence, and it fails, the cluster
will fence the node that had the failure. Note that on-fail defaults to
fence for stop operations, since if we can't stop a resource, we can't
recover it elsewhere.

* If someone/something explicitly requests fencing via the stonithd API
(for example, "stonith_admin -F "), then of course the node will
be fenced. Some software, such as DRBD and DLM, can be configured to use
pacemaker's fencing, so fencing might be triggered by them under their
own conditions.

* In a multi-site cluster using booth, if a ticket constraint has
loss-policy=fence and the ticket is lost, the cluster will fence the
nodes that were running the resources associated with the ticket.

I may be forgetting some, but that's the most important.
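
As an example of the fence delay mentioned in the first point above (the
device name is hypothetical, and the delay parameter must be supported by
the fence agent in use -- check its metadata or man page):

    # actions against node1 are delayed 5 seconds, so node1 wins the race
    # if both nodes try to fence each other at the same time
    pcs stonith update fence-node1 delay=5

In other words, the node targeted by the delayed device is the one that
survives a mutual-fencing race.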

How the cluster responds to resource-level failures (as opposed to
losing an entire node) depends on the configuration, but unless you've
configured on-fail=fence, fencing won't be involved. See the
documentation for the migration-threshold and on-fail parameters.
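
A hedged illustration of those two parameters (the resource name is
hypothetical):

    # ban the resource from a node after 3 failures there, and forget
    # failures after 10 minutes
    pcs resource meta my-ip migration-threshold=3 failure-timeout=600s
    # escalate a failed monitor to fencing the node (use sparingly)
    pcs resource update my-ip op monitor interval=10s on-fail=fence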

> Best wishes,
>   jens
> 
> --
> *Jens Auer *| CGI | Software-Engineer
> CGI (Germany) GmbH & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> _jens.auer@cgi.com_ 
> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be
> found at _de.cgi.com/pflichtangaben_ .

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-09 Thread Ken Gaillot
On 09/09/2016 04:27 AM, Klaus Wenninger wrote:
> On 09/08/2016 07:31 PM, Scott Greenlese wrote:
>>
>> Hi Klaus, thanks for your prompt and thoughtful feedback...
>>
>> Please see my answers nested below (sections entitled, "Scott's
>> Reply"). Thanks!
>>
>> - Scott
>>
>>
>> Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
>> INTERNET: swgre...@us.ibm.com
>> PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
>>
>>
>>
>> From: Klaus Wenninger 
>> To: users@clusterlabs.org
>> Date: 09/08/2016 10:59 AM
>> Subject: Re: [ClusterLabs] Pacemaker quorum behavior
>>
>> 
>>
>>
>>
>> On 09/08/2016 03:55 PM, Scott Greenlese wrote:
>> >
>> > Hi all...
>> >
>> > I have a few very basic questions for the group.
>> >
>> > I have a 5 node (Linux on Z LPARs) pacemaker cluster with 100
>> > VirtualDomain pacemaker-remote nodes
>> > plus 100 "opaque" VirtualDomain resources. The cluster is configured
>> > to be 'symmetric' and I have no
>> > location constraints on the 200 VirtualDomain resources (other than to
>> > prevent the opaque guests
>> > from running on the pacemaker remote node resources). My quorum is set
>> > as:
>> >
>> > quorum {
>> > provider: corosync_votequorum
>> > }
>> >
>> > As an experiment, I powered down one LPAR in the cluster, leaving 4
>> > powered up with the pcsd service up on the 4 survivors
>> > but corosync/pacemaker down (pcs cluster stop --all) on the 4
>> > survivors. I then started pacemaker/corosync on a single cluster
>> >
>>
>> "pcs cluster stop" shuts down pacemaker & corosync on my test-cluster but
>> did you check the status of the individual services?
>>
>> Scott's reply:
>>
>> No, I only assumed that pacemaker was down because I got this back on
>> my pcs status
>> command from each cluster node:
>>
>> [root@zs95kj VD]# date;for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1
>> zs93kjpcs1 ; do ssh $host pcs status; done
>> Wed Sep 7 15:49:27 EDT 2016
>> Error: cluster is not currently running on this node
>> Error: cluster is not currently running on this node
>> Error: cluster is not currently running on this node
>> Error: cluster is not currently running on this node

In my experience, this is sufficient to say that pacemaker and corosync
aren't running.

>>
>> What else should I check?  The pcsd.service service was still up,
>> since I didn't not stop that
>> anywhere. Should I have done,  ps -ef |grep -e pacemaker -e corosync
>>  to check the state before
>> assuming it was really down?
>>
>>
> Guess the answer from Poki should guide you well here ...
>>
>>
>> > node (pcs cluster start), and this resulted in the 200 VirtualDomain
>> > resources activating on the single node.
>> > This was not what I was expecting. I assumed that no resources would
>> > activate / start on any cluster nodes
>> > until 3 out of the 5 total cluster nodes had pacemaker/corosync running.

Your expectation is correct; I'm not sure what happened in this case.
There are some obscure corosync options (e.g. last_man_standing,
allow_downscale) that could theoretically lead to this, but I don't get
the impression you're using anything unusual.

>> > After starting pacemaker/corosync on the single host (zs95kjpcs1),
>> > this is what I see :
>> >
>> > [root@zs95kj VD]# date;pcs status |less
>> > Wed Sep 7 15:51:17 EDT 2016
>> > Cluster name: test_cluster_2
>> > Last updated: Wed Sep 7 15:51:18 2016 Last change: Wed Sep 7 15:30:12
>> > 2016 by hacluster via crmd on zs93kjpcs1
>> > Stack: corosync
>> > Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
>> > partition with quorum
>> > 106 nodes and 304 resources configured
>> >
>> > Node zs93KLpcs1: pending
>> > Node zs93kjpcs1: pending
>> > Node zs95KLpcs1: pending
>> > Online: [ zs95kjpcs1 ]
>> > OFFLINE: [ zs90kppcs1 ]
>> >
>> > .
>> > .
>> > .
>> > PCSD Status:
>> > zs93kjpcs1: Online
>> > zs95kjpcs1: Online
>> > zs95KLpcs1: Online
>> > zs90kppcs1: Offline
>> > zs93KLpcs1: Online

FYI the Online/Offline above refers only to pcsd, which doesn't have any
effect on the cluster itself -- just the ability to run pcs commands.

>> > So, what exactly constitutes an "Online" vs. "Offline" cluster node
>> > w.r.t. quorum calculation? Seems like in my case, it's "pending" on 3
>> > nodes,
>> > so where does that fall? And why "pending"? What does that mean?

"pending" means that the node has joined the corosync cluster (which
allows it to contribute to quorum), but it has not yet completed the
pacemaker join process (basically a handshake with the DC).

I think the corosync and pacemaker detail logs would be essential to
figuring out what's going on. Check the logs on the "pending" nodes to
see whether corosync somehow started up by this 

Re: [ClusterLabs] "VirtualDomain is active on 2 nodes" due to transient network failure

2016-09-09 Thread Ken Gaillot
On 09/09/2016 02:47 PM, Scott Greenlese wrote:
> Hi Ken ,
> 
> Below where you commented,
> 
> "It's considered good practice to stop
> pacemaker+corosync before rebooting a node intentionally (for even more
> safety, you can put the node into standby first)."
> 
> .. is this something that we document anywhere?

Not in any official documentation that I'm aware of; it's more a general
custom than a strong recommendation.

> Our 'reboot' action performs a halt (deactivate lpar) and then activate.
> Do I run the risk
> of guest instances running on multiple hosts in my case? I'm performing
> various recovery
> scenarios and want to avoid this procedure (reboot without first
> stopping cluster), if it's not supported.

By "intentionally" I mean via normal system administration, not fencing.
When fencing, it's always acceptable (and desirable) to do an immediate
cutoff, without any graceful stopping of anything.

When doing a graceful reboot/shutdown, the OS typically asks all running
processes to terminate, then waits a while for them to do so. There's
nothing really wrong with pacemaker being running at that point -- as
long as everything goes well.

If the OS gets impatient and terminates pacemaker before it finishes
stopping, the rest of the cluster will want to fence the node. Also, if
something goes wrong when resources are stopping, it might be harder to
troubleshoot, if the whole system is shutting down at the same time. So,
stopping pacemaker first makes sure that all the resources stop cleanly,
and that the cluster will ignore the node.

Putting the node in standby is not as important; I would say the main benefit
is that the node comes back up in standby when it rejoins, so you have more
control over when resources start being placed back on it. You can bring
up the node and start pacemaker, and make sure everything is good before
allowing resources back on it (especially helpful if you just upgraded
pacemaker or any of its dependencies, changed the host's network
configuration, etc.).
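
In practice, a planned reboot then looks something like this (the node name
is hypothetical; newer pcs versions spell the first command "pcs node
standby"):

    pcs cluster standby node1    # move resources off the node and keep them off
    pcs cluster stop node1       # stop pacemaker and corosync cleanly
    reboot
    # after the node is back up and you have checked it over:
    pcs cluster start node1
    pcs cluster unstandby node1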

There shouldn't be any chance of multiple-active instances if fencing is
configured. Pacemaker shouldn't recover the resource elsewhere until it
confirms that either the resource stopped successfully on the node, or
the node was fenced.


> 
> By the way, I always put the node in cluster standby before an
> intentional reboot.
> 
> Thanks!
> 
> Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
> PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
> 
> 
> 
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Date: 09/02/2016 10:01 AM
> Subject: Re: [ClusterLabs] "VirtualDomain is active on 2 nodes" due to
> transient network failure
> 
> 
> 
> 
> 
> On 09/01/2016 09:39 AM, Scott Greenlese wrote:
>> Andreas,
>>
>> You wrote:
>>
>> /"Would be good to see your full cluster configuration (corosync.conf
>> and cib) - but first guess is: no fencing at all  and what is your
>> "no-quorum-policy" in Pacemaker?/
>>
>> /Regards,/
>> /Andreas"/
>>
>> Thanks for your interest. I actually do have a stonith device configured
>> which maps all 5 cluster nodes in the cluster:
>>
>> [root@zs95kj ~]# date;pcs stonith show fence_S90HMC1
>> Thu Sep 1 10:11:25 EDT 2016
>> Resource: fence_S90HMC1 (class=stonith type=fence_ibmz)
>> Attributes: ipaddr=9.12.35.134 login=stonith passwd=lnx4ltic
>>
> pcmk_host_map=zs95KLpcs1:S95/KVL;zs93KLpcs1:S93/KVL;zs93kjpcs1:S93/KVJ;zs95kjpcs1:S95/KVJ;zs90kppcs1:S90/PACEMAKER
>> pcmk_host_list="zs95KLpcs1 zs93KLpcs1 zs93kjpcs1 zs95kjpcs1 zs90kppcs1"
>> pcmk_list_timeout=300 pcmk_off_timeout=600 pcmk_reboot_action=off
>> pcmk_reboot_timeout=600
>> Operations: monitor interval=60s (fence_S90HMC1-monitor-interval-60s)
>>
>> This fencing device works, too well actually. It seems extremely
>> sensitive to node "failures", and I'm not sure how to tune that. Stonith
>> reboot actoin is 'off', and the general stonith action (cluster config)
>> is also 'off'. In fact, often if I reboot a cluster node (i.e. reboot
>> command) that is an active member in the cluster... stonith will power
>> off that node while it's on its wait back up. (perhaps requires a
>> separate issue thread on this forum?).
> 
> That depends on what a reboot does in your OS ... if i

Re: [ClusterLabs] Cold start of one node only

2016-09-13 Thread Ken Gaillot
On 09/13/2016 03:27 PM, Gienek Nowacki wrote:
> Hi,
> 
> I'm still testing (before running in production) the solution with
> pacemaker+corosync+drbd+dlm+gfs2 on CentOS 7 with a dual-primary config.
> 
> I have two nodes: wirt1v and wirt2v - each node contains an LVM partition
> with DRBD (/dev/drbd2) and a filesystem mounted as /virtfs2. The /virtfs2
> filesystems contain the images of the virtual machines.
> 
> My problem is this: I can't start the cluster and the resources on one
> node only (cold start) when the second node is completely powered off.

"two_node: 1" implies "wait_for_all: 1" in corosync.conf; see the
votequorum(5) man page for details.

This is a safeguard against the situation where the other node is up,
but not reachable from the newly starting node.

You can get around this by setting "wait_for_all: 0", and rely on
pacemaker's fencing to resolve that situation. But if so, be careful
about starting pacemaker when the nodes can't see each other, because
each will try to fence the other.

Example: wirt1v's main LAN network port gets fried in an electrical
surge, but its iDRAC network port is still operational. wirt2v may
successfully fence wirt1v and take over all resources, but if wirt1v is
rebooted and starts pacemaker, without wait_for_all it will fence wirt2v.
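
If you do decide to rely on fencing alone, the quorum section would look
like this (two_node normally implies wait_for_all, so it has to be disabled
explicitly):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 0
    }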

> Is it at all possible in such a configuration - is it possible to start
> one node only?
> 
> Could you help me, please?
> 
> The  configs and log (during cold start)  are attached.
> 
> Thanks in advance,
> Gienek Nowacki
> 
> ==
> 
> #-
> ### result:  cat /etc/redhat-release  ###
> 
> CentOS Linux release 7.2.1511 (Core)
> 
> #-
> ### result:  uname -a  ###
> 
> Linux wirt1v.example.com 
> 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64
> x86_64 x86_64 GNU/Linux
> 
> #-
> ### result:  cat /etc/hosts  ###
> 
> 127.0.0.1   localhost localhost.localdomain localhost4
> localhost4.localdomain4
> 172.31.0.23 wirt1.example.com  wirt1
> 172.31.0.24 wirt2.example.com  wirt2
> 1.1.1.1 wirt1v.example.com  wirt1v
> 1.1.1.2 wirt2v.example.com  wirt2v
> 
> #-
> ### result:  cat /etc/drbd.conf  ###
> 
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
> 
> #-
> ### result:  cat /etc/drbd.d/global_common.conf  ###
> 
> common {
> protocol C;
> syncer {
> verify-alg sha1;
> }
> startup {
> become-primary-on both;
> wfc-timeout 30;
> outdated-wfc-timeout 20;
> degr-wfc-timeout 30;
> }
> disk {
> fencing resource-and-stonith;
> }
> handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> split-brain
> "/usr/lib/drbd/notify-split-brain.sh linuxad...@example.com
> ";
> pri-lost-after-sb  
> "/usr/lib/drbd/notify-split-brain.sh linuxad...@example.com
> ";
> out-of-sync
> "/usr/lib/drbd/notify-out-of-sync.sh linuxad...@example.com
> ";
> local-io-error 
> "/usr/lib/drbd/notify-io-error.shlinuxad...@example.com
> ";
> }
> net {
> allow-two-primaries;
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
> }
> }
> 
> #-
> ### result:  cat /etc/drbd.d/drbd2.res  ###
> 
> resource drbd2 {
> meta-disk internal;
> device /dev/drbd2;
> on wirt1v.example.com  {
> disk /dev/vg02/drbd2;
> address 1.1.1.1:7782 ;
> }
> on wirt2v.example.com  {
> disk /dev/vg02/drbd2;
> address 1.1.1.2:7782 ;
> }
> }
> 
> #-
> ### result:  cat /etc/corosync/corosync.conf  ###
> 
> totem {
> version: 2
> secauth: off
> cluster_name: klasterek
> transport: udpu
> }
> nodelist {
> node {
> ring0_addr: wirt1v
> nodeid: 1
> }
> node {
> ring0_addr: wirt2v
> nodeid: 2
> }
> }
> quorum {
> provider: corosync_votequorum
> two_node: 1
> }
> logging {
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> to_syslog: yes
> }
> 
> 
