Re: [ClusterLabs] Access denied when using Floating IP

2017-01-06 Thread Ken Gaillot
On 12/26/2016 12:03 AM, Kaushal Shriyan wrote:
> Hi,
> 
> I have set up Highly Available HAProxy Servers with Keepalived and
> Floating IP.  I have the below details 
> 
> *Master Node keepalived.conf*
> 
> global_defs {
> # Keepalived process identifier
> #lvs_id haproxy_DH
> }
> # Script used to check if HAProxy is running
> vrrp_script check_haproxy {
> script "/usr/bin/killall -0 haproxy"
> interval 2
> weight 2
> }
> # Virtual interface
> # The priority specifies the order in which the assigned interface
> # takes over in a failover
> vrrp_instance VI_01 {
> state MASTER
> interface eth0
> virtual_router_id 51
> priority 200
> # The virtual ip address shared between the two loadbalancers
> virtual_ipaddress {
> 172.16.0.75/32
> }
> track_script {
> check_haproxy
> }
> }
> 
> *Slave Node keepalived.conf*
> 
> global_defs {
> # Keepalived process identifier
> #lvs_id haproxy_DH_passive
> }
> # Script used to check if HAProxy is running
> vrrp_script check_haproxy {
> script "/usr/bin/killall -0 haproxy"
> interval 2
> weight 2
> }
> # Virtual interface
> # The priority specifies the order in which the assigned interface
> # takes over in a failover
> vrrp_instance VI_01 {
> state BACKUP
> interface eth0
> virtual_router_id 51
> priority 100
> # The virtual ip address shared between the two loadbalancers
> virtual_ipaddress {
> 172.16.0.75/32 
> }
> track_script {
> check_haproxy
> }
> }
> 
> HAProxy Node 1 has two IP Addresses
> 
> eth0 :- 172.16.0.20 LAN IP of the box Master Node
> eth0 :- 172.16.0.75 Virtual IP
> 
> eth0 :- 172.16.0.21 LAN IP of the box Slave Node
> 
> In the MySQL server, I have given access for the Floating IP :- 172.16.0.75
> 
> GRANT USAGE ON *.* TO 'haproxy_check'@'172.16.0.75';
> 
> GRANT ALL PRIVILEGES ON *.* TO 'haproxy_root'@'172.16.0.75' IDENTIFIED
> BY PASSWORD '*7A3F28E9F3E3AEFDFF87BCFE119DCF830101DD71' WITH GRANT OPTION;
> 
> When I try to connect to the MySQL server using the floating IP :- 172.16.0.75,
> I get access denied in spite of giving grant access as per the above
> mentioned command. When I try to use the static IP to connect to the
> MySQL server using LAN IP :- 172.16.0.20, it works as expected. Is it
> because eth0 has two IPs :- 172.16.0.20 and 172.16.0.75?
> 
> Please do let me know if you need any additional information.
> 
> Regards,
> 
> Kaushal

People on this list tend to be more familiar with pacemaker clusters
than keepalived, but my guess is that mysql's privileges apply to the IP
address that the user is connecting *from*. Try giving the same
privileges to the user at all other local IPs (or @'%' if you don't mind
allowing connections from anywhere, and use a firewall to block unwanted
connections instead).
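
For example, something like this (a sketch only -- reuse whatever
privileges and password hash you actually granted for the floating IP):

  GRANT ALL PRIVILEGES ON *.* TO 'haproxy_root'@'172.16.0.20'
      IDENTIFIED BY PASSWORD '*7A3F28E9F3E3AEFDFF87BCFE119DCF830101DD71' WITH GRANT OPTION;
  GRANT ALL PRIVILEGES ON *.* TO 'haproxy_root'@'172.16.0.21'
      IDENTIFIED BY PASSWORD '*7A3F28E9F3E3AEFDFF87BCFE119DCF830101DD71' WITH GRANT OPTION;
  -- or, if you rely on a firewall for access control instead:
  GRANT ALL PRIVILEGES ON *.* TO 'haproxy_root'@'%'
      IDENTIFIED BY PASSWORD '*7A3F28E9F3E3AEFDFF87BCFE119DCF830101DD71' WITH GRANT OPTION;
  -- (and similarly for the 'haproxy_check' USAGE grant)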

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] centos 7 drbd fubar

2017-01-06 Thread Ken Gaillot
On 12/27/2016 03:08 PM, Dimitri Maziuk wrote:
> I ran centos 7.3.1611 update over the holidays and my drbd + nfs + imap
> active-passive pair locked up again. This has now been consistent for at
> least 3 kernel updates. This time I had enough consoles open to run
> fuser & lsof though.
> 
> The procedure:
> 
> 1. pcs cluster standby 
> 2. yum up && reboot 
> 3. pcs cluster unstandby 
> 
> Fine so far.
> 
> 4. pcs cluster standby 
> results in
> 
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Running 
>> stop for /dev/drbd0 on /raid
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Trying to 
>> unmount /raid
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 ERROR: Couldn't 
>> unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: No 
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 ERROR: Couldn't 
>> unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 INFO: No 
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 ERROR: Couldn't 
>> unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 INFO: No 
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 ERROR: Couldn't 
>> unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 INFO: No 
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:45 ERROR: Couldn't 
>> unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:46 INFO: No 
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 ERROR: Couldn't 
>> unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 INFO: No 
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:48 ERROR: Couldn't 
>> unmount /raid, giving up!
>> Dec 23 17:36:48 [1138] zebrafish.bmrb.wisc.edu   lrmd:   notice: 
>> operation_finished:drbd_filesystem_stop_0:18277:stderr [ umount: 
>> /raid: target is busy. ]
> 
> ... until the system's powered down. Before power down I ran lsof, it
> hung, and fuser:
> 
>> # fuser -vum /raid
>>  USERPID ACCESS COMMAND
>> /raid:   root kernel mount (root)/raid
> 
> After running yum up on the primary and rebooting it again,
> 
> 5. pcs cluster unstandby 
> causes the same fail to unmount loop on the secondary, that has to be
> powered down until the primary recovers.
> 
> Hopefully I'm doing something wrong, please someone tell me what it is.
> Anyone? Bueller?

That is disconcerting. Since no one here seems to know, have you tried
asking on the drbd list? It sounds like an issue with the drbd kernel
module.

http://lists.linbit.com/listinfo/drbd-user




Re: [ClusterLabs] Status and help with pgsql RA

2017-01-06 Thread Ken Gaillot
On 12/28/2016 02:24 PM, Nils Carlson wrote:
> Hi,
> 
> I am looking to set up postgresql in high-availability and have been
> comparing the guide at
> http://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster with the
> contents of the pgsql resource agent on github. It seems that there have
> been substantial improvements in the resource agent regarding the use of
> replication slots.
> 
> Could anybody look at updating the guide, or just sending it out in an
> e-mail to help spread knowledge?
> 
> The replication slots with pacemaker look really cool; if I've
> understood things right there should be no need for manual work after
> node recovery with the replication slots (though there is a risk of a
> full disk)?
> 
> All help, tips and guidance much appreciated.
> 
> Cheers,
> Nils

Hmm, that wiki page could definitely use updating. I'm personally not
familiar with pgsql, so hopefully someone else can chime in.

Another user on this list has made an alternative resource agent that
you might want to check out:

http://lists.clusterlabs.org/pipermail/users/2016-December/004740.html



Re: [ClusterLabs] Can packmaker launch haproxy from new network namespace automatically?

2016-12-21 Thread Ken Gaillot
On 12/17/2016 07:26 PM, Hao QingFeng wrote:
> Hi Folks,
> 
> I am installing pacemaker to manage the cluster of haproxy within
> openstack on ubuntu 16.04.
> 
> I met the problem that haproxy can't start listening for some services
> on the VIP because the related ports
> 
> were occupied by those native services which listened on 0.0.0.0.
> 
> I opened a bug to openstack team and a buddy told me that I should use
> pacemaker to run haproxy in
> 
> a separate network namespace.  I attached his description here(also in bug):
> 
> <<<
> 
> Fuel runs haproxy via pacemaker (not vis systemd/upstart) and pacemaker
> runs haproxy in a separate network namespace.
> 
> So haproxy does not cause any problems by listening on 0.0.0.0 since
> it's listening in a separate network namespace.
> 
> You can see it via "ip netns ls" command and then "ip netns exec haproxy
> ip a".
> 
> Did you try to restart haproxy via systemd/upstart? If so then you could
> face this problem. You should use pacemaker to control haproxy service.
> 

> 
> Here is the bug link:
> 
> https://bugs.launchpad.net/openstack-manuals/+bug/1649902
> 
> Actually I did start haproxy with pacemaker but "ip netns ls" shows
> nothing and haproxy can't bind some ports like 9292 on the VIP.
> 
> I checked and found that openstack starts including this function from
> fuel 5.0(released in May, 2014).
> 
> But after I downloaded pacemaker's code and did a rough check, I couldn't
> find any related functions (keywords: ip netns, clone, CLONE_NEW...)
> 
> except in the test cases for neutron and ovs etc. (if my understanding is
> correct).
> 
> I didn't see any related configuration item in "crm configure show" either.
> 
> 
> So I would just like to confirm whether pacemaker has such a function to
> create a new network namespace
> 
> for haproxy (or other managed service) automatically, to avoid such socket
> binding conflicts?
> 
> If yes, how to configure it? If no such function, do you have any advice
> on how to solve the problem?

No, pacemaker has no way to do that itself, but maybe you could run
haproxy inside a container, and manage the container as a pacemaker
resource.
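
For example, with the ocf:heartbeat:docker agent from the
resource-agents package (a rough sketch only -- the image name, config
path, and published port are placeholders for your environment):

  crm configure primitive haproxy-ct ocf:heartbeat:docker \
      params image="haproxy" name="haproxy-ct" \
          run_opts="-p 9292:9292 -v /etc/haproxy:/usr/local/etc/haproxy:ro" \
      op monitor interval=30s timeout=60s

Since the container gets its own network namespace by default, haproxy
listening on 0.0.0.0 inside it no longer conflicts with services bound
on the host.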

> 
> BTW, you can see the detailed configuration information in the bug link,
> if you need more, please let me know.
> 
> Thanks a lot!
> 
> Regards!
> 
> -- 
> 
> QingFeng Hao(Robin)



[ClusterLabs] New ClusterLabs logo unveiled :-)

2016-12-22 Thread Ken Gaillot
Hi all,

ClusterLabs is happy to unveil its new logo! Many thanks to the
designer, Kristoffer Grönlund <kgronl...@suse.com>, who graciously
donated the clever approach.

You can see it on our GitHub page:

  https://github.com/ClusterLabs

It is also now the site icon for www.clusterlabs.org and
wiki.clusterlabs.org. Your browser might have cached the old version,
so you might not see the new one immediately, but you can see it by
going straight to the links and reloading:

  http://clusterlabs.org/favicon.ico
  http://clusterlabs.org/apple-touch-icon.png

It is also on the wiki banner, though the banner will need some tweaking
to make the best use of it. You might not see it there immediately due
to browser caching and DNS resolver caching (the wiki IP changed
recently as part of an OS upgrade), but it's there. :-)

Wishing everyone a happy holiday season,
-- 
Ken Gaillot <kgail...@redhat.com>



Re: [ClusterLabs] Cluster failure

2016-12-20 Thread Ken Gaillot
On 12/20/2016 12:21 AM, Rodrick Brown wrote:
> I'm fairly new to Pacemaker and have a few questions about
> the following log event and why resources were removed from my cluster.
> Right before the resources were killed with SIGTERM, I noticed the following
> message.
> message. 
> Dec 18 19:18:18 clusternode38.mf stonith-ng[10739]:   notice: On loss of
> CCM Quorum: Ignore

The "notice:" part is the severity of the message -- "error:" or
"warning:" is bad, but anything else is purely informational.

In this case, this message indicates that you have
no-quorum-policy=ignore configured. It will be printed every time one of
the pacemaker daemons or tools reads your configuration, and has nothing
to do with your resource problem.

> What exactly does this mean? My resources recovered after a few minutes
> and did not fail over. Any idea what's going on here?

You'd need to look at a much larger chunk of the logs, typically on more
than one node. The cluster will elect one node to be the "Designated
Controller" (DC), and that node's logs will have essential information
about the decision-making process.
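
For example (log locations vary by distribution; these are just the
usual suspects):

  crm_mon -1 | grep "Current DC"       # which node is the DC right now
  grep -e pengine -e crmd /var/log/cluster/corosync.log
  # or /var/log/pacemaker.log, depending on how logging is configured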

> Or is there documentation I can read that explains what exactly happened?
> 
> -- 
> 
> 
> Rodrick Brown / Site Reliability Engineer
> (917) 445 - 6839 / rbr...@marketfactory.com
> 
> 425 Broadway #3, New York, NY 10013



Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-22 Thread Ken Gaillot
On 03/22/2017 05:23 AM, Nikhil Utane wrote:
> Hi Ulrich,
> 
> It's not an option unfortunately.
> Our product runs on a specialized hardware and provides both the
> services (A & B) that I am referring to. Hence I cannot have service A
> running on some nodes as cluster A and service B running on other nodes
> as cluster B.
> The two services HAVE to run on same node. The catch being service A and
> service B have to be independent of each other.
> 
> Hence looking at Container option since we are using that for some other
> product (but not for Pacemaker/Corosync).
> 
> -Regards
> Nikhil

Instead of containerizing pacemaker, why don't you containerize or
virtualize the services, and have pacemaker manage the containers/VMs?

Coincidentally, I am about to announce enhanced container support in
pacemaker. I should have a post with more details later today or tomorrow.

> 
> On Wed, Mar 22, 2017 at 12:41 PM, Ulrich Windl
> wrote:
> 
> >>> Nikhil Utane wrote on 22.03.2017 at 07:48 in
> message:
> > Hi All,
> >
> > First of all, let me thank everyone here for providing excellent support
> > from the time I started evaluating this tool about a year ago. It has
> > helped me to make a timely and good quality release of our Redundancy
> > solution using Pacemaker & Corosync. (Three cheers :))
> >
> > Now for our next release we have a slightly different ask.
> > We want to provide Redundancy to two different types of services (we can
> > call them Service A and Service B) such that all cluster communication for
> > Service A happens on one network/interface (say VLAN A) and for service B
> > happens on a different network/interface (say VLAN B). Moreover we do not
> > want the details of Service A (resource attributes etc) to be seen by
> > Service B and vice-versa.
> >
> > So essentially we want to be able to run two independent clusters. From
> > what I gathered, we cannot run multiple instances of Pacemaker and Corosync
> > on same node. I was thinking if we can use Containers and run two isolated
> 
> You conclude from two services that should not see each other that
> you need two instances of pacemaker on one node. Why?
> If you want true separation, drop the VLANs, make real networks and
> two independent clusters.
> Even if two pacemakers on one node would work, you have the problem
> of fencing, where at least one pacemaker instance will always be
> surprised badly if fencing takes place. I cannot imagine you want that!
> 
> > instances of Pacemaker + Corosync on same node.
> > As per https://github.com/davidvossel/pacemaker_docker it looks do-able.
> > I wanted to get an opinion on this forum before I can commit that it can be
> > done.
> 
> Why are you designing it more complicated than necessary?
> 
> >
> > Please share your views if you have already done this and if there are any
> > known challenges that I should be familiar with.
> >
> > -Thanks
> > Nikhil



Re: [ClusterLabs] error: The cib process (17858) exited: Key has expired (127)

2017-03-24 Thread Ken Gaillot
On 03/24/2017 11:06 AM, Rens Houben wrote:
> I activated debug=cib, and retried.
> 
> New log file up at
> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker_2.log.txt ;
> unfortunately, while that *is* more information I'm not seeing anything
> that looks like it could be the cause, although it shouldn't be reading
> any config files yet because there shouldn't be any *to* read...

If there's no config file, pacemaker will create an empty one and use
that, so it still goes through the mechanics of validating it and
writing it out.

Debug doesn't give us much -- just one additional message before it dies:

Mar 24 16:59:27 [20266] castorcib:debug: activateCibXml:
Triggering CIB write for start op

You might want to look at the system log around that time to see if
something else is going wrong. If you have SELinux enabled, check the
audit log for denials.
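
For example, if auditd is running, something like:

  ausearch -m avc -ts recent | grep -i cib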

> As to the misleading error message, it gets weirder: I grabbed a copy of
> the source code via apt-get source, and the phrase 'key has expired'
> does not occur anywhere in any file according to find ./ -type f -exec
> grep -il 'key has expired' {} \; so I have absolutely NO idea where it's
> coming from...

Right, it's not part of pacemaker, it's just the standard system error
message for errno 127. But the exit status isn't an errno, so that's not
the right interpretation. I can't find any code path in the cib that
would return 127, so I don't know what the right interpretation would be.

> 
> --
> Rens Houben
> Systemec Internet Services
> 
> SYSTEMEC BV
> 
> Marinus Dammeweg 25, 5928 PW Venlo
> Postbus 3290, 5902 RG Venlo
> Industrienummer: 6817
> Nederland
> 
> T: 077-3967572 (Support)
> K.V.K. nummer: 12027782 (Venlo)
> 
> 
> 
> 
> From: Ken Gaillot <kgail...@redhat.com>
> Sent: Friday, 24 March 2017 16:49
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] error: The cib process (17858) exited: Key
> has expired (127)
> 
> On 03/24/2017 08:06 AM, Rens Houben wrote:
>> I recently upgraded a two-node cluster (named 'castor' and 'pollux'
>> because I should not be allowed to think up computer names before I've
>> had my morning caffeine) from Debian wheezy to Jessie after the
>> backports for corosync and pacemaker finally made it in. However, one of
>> the two servers failed to start correctly for no really obvious reason.
>>
>> Given as how it'd been years since I last set them up  and had forgotten
>> pretty much everything about it in the interim I decided to purge
>> corosync and pacemaker on both systems and run with clean installs instead.
>>
>> This worked on pollux, but not on castor. Even after going back,
>> re-purging, removing everything legacy in /var/lib/heartbeat and
>> emptying both directories, castor still refuses to bring up pacemaker.
>>
>>
>> I put the full log of a start attempt up at
>> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker.log.txt, but
>> this is the excerpt that I /think/ is causing the failure:
>>
>> Mar 24 13:59:05 [25495] castor pacemakerd:error: pcmk_child_exit:The
>> cib process (25502) exited: Key has expired (127)
>> Mar 24 13:59:05 [25495] castor pacemakerd:   notice:
>> pcmk_process_exit:Respawning failed child process: cib
>>
>> I don't see any entries from cib in the log that suggest anything's
>> going wrong, though, and I'm running out of ideas on where to look next.
> 
> The "Key has expired" message is misleading. (Pacemaker really needs an
> overhaul of the exit codes it can return, so these messages can be
> reliable, but there are always more important things to take care of ...)
> 
> Pacemaker is getting 127 as the exit status of cib, and interpreting
> that as a standard system error number, but it probably isn't one. I
> don't actually see any way that

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Ken Gaillot
On 03/28/2017 08:20 AM, Alexander Markov wrote:
> Hello, Dejan,
> 
>> Why? I don't have a test system right now, but for instance this
>> should work:
>>
>> $ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
>> $ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}
> 
> Ah, I see. Everything (including stonith methods, fencing and failover)
> works just fine under normal circumstances. Sorry if I wasn't clear
> about that. The problem occurs only when I have one datacenter (i.e. one
> IBM machine and one HMC) lost due to power outage.

If the datacenters are completely separate, you might want to take a
look at booth. With booth, you set up a separate cluster at each
datacenter, and booth coordinates which one can host resources. Each
datacenter must have its own self-sufficient cluster with its own
fencing, but one site does not need to be able to fence the other.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683855002656

> 
> For example:
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.8 -lS | wc -l
> info: ibmhmc device OK.
> 39
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.9 -lS | wc -l
> info: ibmhmc device OK.
> 39
> 
> As I had said stonith device can see and manage all the cluster nodes.
> 
>> If so, then your configuration does not appear to be correct. If
>> both are capable of managing all nodes then you should tell
>> pacemaker about it.
> 
> Thanks for the hint. But if the stonith device returns a node list, isn't it
> obvious to the cluster that it can manage those nodes? Could you please be
> more precise about what you refer to? I currently changed configuration
> to two fencing levels (one per HMC) but still don't think I get an idea
> here.

I believe Dejan is referring to fencing topology (levels). That would be
preferable to booth if the datacenters are physically close, and even if
one fence device fails, the other can still function.

In this case you'd probably want level 1 = the main fence device, and
level 2 = the fence device to use if the main device fails.

A common implementation (which Digimer uses to great effect) is to use
IPMI as level 1 and an intelligent power switch as level 2. If your
second device can function regardless of what hosts are up or down, you
can do something similar.
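
For example, with crmsh and the two HMC devices you already have
defined (a sketch only -- put each node's "primary" HMC first):

  crm configure fencing_topology \
      crmapp01: st_hq_hmc st_ch_hmc \
      crmapp02: st_ch_hmc st_hq_hmc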

> 
>> Survived node, running stonith resource for dead node tries to
>> contact ipmi device (which is also dead). How does cluster understand
>> that
>> lost node is really dead and it's not just a network issue?
>>
>> It cannot.

And it will be unable to recover resources that were running on the
questionable partition.

> 
> How do people then actually solve the problem of a two-node metro cluster?
> I mean, I know one option: stonith-enabled=false, but it doesn't seem
> right for me.
> 
> Thank you.
> 
> Regards,
> Alexander Markov



Re: [ClusterLabs] pending actions

2017-03-24 Thread Ken Gaillot
On 03/07/2017 04:13 PM, Jehan-Guillaume de Rorthais wrote:
> Hi,
> 
> Occasionally, I find my cluster with one pending action not being executed for
> some minutes (I guess until the "PEngine Recheck Timer" elapse).
> 
> Running "crm_simulate -SL" shows the pending actions.
> 
> I'm still confused about how it can happen, why it happens and how to avoid
> this.

It's most likely a bug in the crmd, which schedules PE runs.

> Earlier today, I started my test cluster with 3 nodes and a master/slave
> resource[1], all with positive master score (1001, 1000 and 990), and the
> cluster kept the promote action as a pending action for 15 minutes. 
> 
> You will find in attachment the first 3 pengine inputs executed after the
> cluster startup.
> 
> What are the consequences if I set cluster-recheck-interval to 30s, for
> instance?

The cluster would consume more CPU and I/O continually recalculating the
cluster state.

It would be nice to have some guidelines for cluster-recheck-interval
based on real-world usage, but it's just going by gut feeling at this
point. The cluster automatically recalculates when something
"interesting" happens -- a node comes or goes, a monitor fails, a node
attribute changes, etc. The cluster-recheck-interval is (1) a failsafe
for buggy situations like this, and (2) the maximum granularity of many
time-based checks such as rules. I would personally use at least 5
minutes, though less is probably reasonable, especially with simple
configurations (number of nodes/resources/constraints).
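
For example, to use 5 minutes (pcs shown; crmsh has an equivalent
"property" command):

  pcs property set cluster-recheck-interval=5min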

> Thanks in advance for your lights :)
> 
> Regards,
> 
> [1] here is the setup:
> http://dalibo.github.io/PAF/Quick_Start-CentOS-7.html#cluster-resource-creation-and-management

Feel free to open a bug report and include some logs around the time of
the incident (most importantly from the DC).



Re: [ClusterLabs] Create ressource to monitor each IPSEC VPN

2017-03-24 Thread Ken Gaillot
On 03/09/2017 01:44 AM, Damien Bras wrote:
> Hi,
> 
>  
> 
> We have a 2-node cluster with ipsec (libreswan).
> 
> Actually we have a resource to monitor the service ipsec (via system).
> 
>  
> 
> But now I would like to monitor each VPN. Is there a way to do that ?
> Which agent could I use for that ?
> 
>  
> 
> Thanks in advance for your help.
> 
> Damien

I'm not aware of any existing OCF agent for libreswan. You can always
manage any service via its OS launcher (systemd or lsb). If the OS's
status check isn't sufficient, you could additionally use
ocf:pacemaker:ping to monitor an IP address only available across the
VPN, to set a node attribute that you could maybe use somehow.
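
A rough sketch of the ping approach (the address is a placeholder for
something reachable only through that tunnel):

  pcs resource create vpn1-ping ocf:pacemaker:ping \
      host_list=192.0.2.10 multiplier=1000 dampen=5s \
      op monitor interval=15s clone

By default this maintains a "pingd" node attribute that you could then
reference in rules or constraints, or simply alert on.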



Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-30 Thread Ken Gaillot
On 03/30/2017 01:17 AM, Nikhil Utane wrote:
> "/Coincidentally, I am about to announce enhanced container support in/
> /pacemaker. I should have a post with more details later today or
> tomorrow./"
> 
> Ken: Were you able to get to it?
> 
> -Thanks
> Nikhil

Not yet, we've been tweaking the syntax a bit, so I wanted to have
something more final first. But it's very close.

> 
> On Thu, Mar 23, 2017 at 7:35 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> On 03/22/2017 11:08 PM, Nikhil Utane wrote:
> > I simplified when I called it as a service. Essentially it is a complete
> > system.
> > It is an LTE eNB solution. It provides LTE service (service A) and now
> > we need to provide redundancy for another different but related service
> > (service B). The catch being, the LTE redundancy solution will be tied
> > to one operator whereas the other service can span across multiple
> > operators. Therefore ideally we want two completely independent clusters
> > since different set of nodes will form the two clusters.
> > Now what I am thinking is, to run additional instance of Pacemaker +
> > Corosync in a container which can then notify the service B on host
> > machine to start or stop it's service. That way my CIB file will be
> > independent and I can run corosync on different interfaces.
> >
> > Workable right?
> >
> > -Regards
> > Nikhil
> 
> It's not well-tested, but in theory it should work, as long as the
> container is privileged.
> 
> I still think virtualizing the services would be more resilient. It
> makes sense to have a single determination of quorum and fencing for the
> same real hosts. I'd think of it like a cloud provider -- the cloud
> instances are segregated by customer, but the underlying hosts are
> the same.
> 
> You could configure your cluster as asymmetric, and enable each VM only
> on the nodes it's allowed on, so you get the two separate "clusters"
> that way. You could set up the VMs as guest nodes if you want to monitor
> and manage multiple services within them. If your services require
> hardware access that's not easily passed to a VM, containerizing the
> services might be a better option.
> 
> > On Wed, Mar 22, 2017 at 8:06 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> >
> > On 03/22/2017 05:23 AM, Nikhil Utane wrote:
> > > Hi Ulrich,
> > >
> > > It's not an option unfortunately.
> > > Our product runs on a specialized hardware and provides both the
> > > services (A & B) that I am referring to. Hence I cannot have 
> service A
> > > running on some nodes as cluster A and service B running on other 
> nodes
> > > as cluster B.
> > > The two services HAVE to run on same node. The catch being 
> service A and
> > > service B have to be independent of each other.
> > >
> > > Hence looking at Container option since we are using that for 
> some other
> > > product (but not for Pacemaker/Corosync).
> > >
> > > -Regards
> > > Nikhil
> >
> > Instead of containerizing pacemaker, why don't you containerize or
> > virtualize the services, and have pacemaker manage the 
> containers/VMs?
> >
> > Coincidentally, I am about to announce enhanced container support in
> > pacemaker. I should have a post with more details later today or
> > tomorrow.
> >
> > >
> > > On Wed, Mar 22, 2017 at 12:41 PM, Ulrich Windl
> > > <ulrich.wi...@rz.uni-regensburg.de> wrote:
> > >
> > > >>> Nikhil Utane <nikhil.subscri...@gmail.com>

Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-23 Thread Ken Gaillot
On 03/22/2017 11:08 PM, Nikhil Utane wrote:
> I simplified when I called it as a service. Essentially it is a complete
> system.
> It is an LTE eNB solution. It provides LTE service (service A) and now
> we need to provide redundancy for another different but related service
> (service B). The catch being, the LTE redundancy solution will be tied
> to one operator whereas the other service can span across multiple
> operators. Therefore ideally we want two completely independent clusters
> since different set of nodes will form the two clusters.
> Now what I am thinking is, to run additional instance of Pacemaker +
> Corosync in a container which can then notify the service B on host
> machine to start or stop it's service. That way my CIB file will be
> independent and I can run corosync on different interfaces.
> 
> Workable right?
> 
> -Regards
> Nikhil

It's not well-tested, but in theory it should work, as long as the
container is privileged.

I still think virtualizing the services would be more resilient. It
makes sense to have a single determination of quorum and fencing for the
same real hosts. I'd think of it like a cloud provider -- the cloud
instances are segregated by customer, but the underlying hosts are the same.

You could configure your cluster as asymmetric, and enable each VM only
on the nodes it's allowed on, so you get the two separate "clusters"
that way. You could set up the VMs as guest nodes if you want to monitor
and manage multiple services within them. If your services require
hardware access that's not easily passed to a VM, containerizing the
services might be a better option.
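
A rough sketch of that approach (resource, node, and file names are
made up):

  pcs property set symmetric-cluster=false
  pcs resource create vm-serviceA ocf:heartbeat:VirtualDomain \
      config=/etc/libvirt/qemu/serviceA.xml op monitor interval=30s
  pcs constraint location vm-serviceA prefers node1 node2

With an asymmetric cluster, each VM runs only on the nodes you
explicitly allow it on, giving you the two logical "clusters" on the
same physical hosts.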

> On Wed, Mar 22, 2017 at 8:06 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> On 03/22/2017 05:23 AM, Nikhil Utane wrote:
> > Hi Ulrich,
> >
> > It's not an option unfortunately.
> > Our product runs on a specialized hardware and provides both the
> > services (A & B) that I am referring to. Hence I cannot have service A
> > running on some nodes as cluster A and service B running on other nodes
> > as cluster B.
> > The two services HAVE to run on same node. The catch being service A and
> > service B have to be independent of each other.
> >
> > Hence looking at Container option since we are using that for some other
> > product (but not for Pacemaker/Corosync).
> >
> > -Regards
> > Nikhil
> 
> Instead of containerizing pacemaker, why don't you containerize or
> virtualize the services, and have pacemaker manage the containers/VMs?
> 
> Coincidentally, I am about to announce enhanced container support in
> pacemaker. I should have a post with more details later today or
> tomorrow.
> 
> >
> > On Wed, Mar 22, 2017 at 12:41 PM, Ulrich Windl
> > <ulrich.wi...@rz.uni-regensburg.de> wrote:
> >
> > >>> Nikhil Utane <nikhil.subscri...@gmail.com> wrote on 22.03.2017 at 07:48 in
> > message
> >  <CAGNWmJV05-YG+f9VNG0Deu-2xo7Lp+kRQPOn9sWYy7Jz=0g...@mail.gmail.com>:
> > > Hi All,
> > >
> > > First of all, let me thank everyone here for providing
> excellent support
> > > from the time I started evaluating this tool about a year
> ago. It has
> > > helped me to make a timely and good quality release of our
> Redundancy
> > > solution using Pacemaker & Corosync. (Three cheers :))
> > >
> > > Now for our next release we have a slightly different ask.
> > > We want to provide Redundancy to two different types of
> services (we can
> > > call them Service A and Service B) such that all cluster
> communication for
> > > Service A happens on one network/interface (say VLAN A) and
> for service B
> > > happens on a different network/interface (say VLAN B).
> Moreover we do not
> > > want the details of Service A (resource attributes etc) to
> be seen by
> > > Service B and vice-versa.
> > >
> &g

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-27 Thread Ken Gaillot
On 03/27/2017 03:54 PM, Seth Reid wrote:
> 
> 
> 
> On Fri, Mar 24, 2017 at 2:10 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> On 03/24/2017 03:52 PM, Digimer wrote:
> > On 24/03/17 04:44 PM, Seth Reid wrote:
> >> I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. Its not in
> >> production yet because I'm having a problem during fencing. When I
> >> disable the network interface of any one machine, the disabled machines
> >> is properly fenced leaving me, briefly, with a two node cluster. A
> >> second node is then fenced off immediately, and the remaining node
> >> appears to try to fence itself off. This leave two nodes with
> >> corosync/pacemaker stopped, and the remaining machine still in the
> >> cluster but showing an offline node and an UNCLEAN node. What can be
> >> causing this behavior?
> >
> > It looks like the fence attempt failed, leaving the cluster hung. When
> > you say all nodes were fenced, did all nodes actually reboot? Or did the
> >> two surviving nodes just lock up? If the latter, then that is the proper
> > response to a failed fence (DLM stays blocked).
> 
> See comments inline ...
> 
> >
> >> Each machine has a dedicated network interface for the cluster, and
> >> there is a vlan on the switch devoted to just these interfaces.
> >> In the following, I disabled the interface on node id 2 (b014).
> Node 1
> >> (b013) is fenced as well. Node 2 (b015) is still up.
> >>
> >> Logs from b013:
> >> Mar 24 16:35:01 b013 CRON[19133]: (root) CMD (command -v debian-sa1 >
> >> /dev/null && debian-sa1 1 1)
> >> Mar 24 16:35:13 b013 corosync[2134]: notice  [TOTEM ] A processor
> >> failed, forming new configuration.
> >> Mar 24 16:35:13 b013 corosync[2134]:  [TOTEM ] A processor failed,
> >> forming new configuration.
> >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] A new
> membership
> >> (192.168.100.13:576) was formed. Members left: 2
> >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] Failed to
> receive
> >> the leave message. failed: 2
> >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] A new membership
> >> (192.168.100.13:576) was formed. Members left: 2
> >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] Failed to receive the
> >> leave message. failed: 2
> >> Mar 24 16:35:17 b013 attrd[2223]:   notice: crm_update_peer_proc:
> Node
> >> b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 cib[2220]:   notice: crm_update_peer_proc: Node
> >> b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 cib[2220]:   notice: Removing b014-cl/2 from the
> >> membership list
> >> Mar 24 16:35:17 b013 cib[2220]:   notice: Purged 1 peers with id=2
> >> and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 pacemakerd[2187]:   notice:
> crm_reap_unseen_nodes:
> >> Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 attrd[2223]:   notice: Removing b014-cl/2
> from the
> >> membership list
> >> Mar 24 16:35:17 b013 attrd[2223]:   notice: Purged 1 peers with id=2
> >> and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice:
> crm_update_peer_proc:
> >> Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Removing
> b014-cl/2 from
> >> the membership list
> >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Purged 1 peers with
> >> id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 dlm_controld[2727]: 3091 fence request 2 pid
> 19223
> >> nodedown time 1490387717 fence_all dlm_stonith
> >> Mar 24 16:35:17 b013 kernel: [ 3091.800118] dlm: closing
> connection to
> >> node 2
> >> Mar 24 16:35:17 b013 crmd[2227]:   notice: crm_reap_unseen_nodes:
> Node
> >> b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 dlm_stonith: stonith_api_time: Found 0
> entri

Re: [ClusterLabs] Syncing data and reducing CPU utilization of cib process

2017-03-31 Thread Ken Gaillot
On 03/31/2017 06:44 AM, Nikhil Utane wrote:
> We are seeing this log in pacemaker.log continuously.
> 
> Mar 31 17:13:01 [6372] 0005B932ED72cib: info:
> crm_compress_string:  Compressed 436756 bytes into 14635 (ratio 29:1) in
> 284ms
> 
> This looks to be the reason for high CPU. What does this log indicate?

If a cluster message is larger than 128KB, pacemaker will compress it
(using BZ2) before transmitting it across the network to the other
nodes. This can hit the CPU significantly. Having a large resource
definition makes such messages more common.

There are many ways to sync a configuration file between nodes. If the
configuration rarely changes, a simple rsync cron could do it.
Specialized tools like lsyncd are more responsive while still having a
minimal footprint. DRBD or shared storage would be more powerful and
real-time. If it's a custom app, you could even modify it to use
something like etcd or a NoSQL db.
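
As a minimal example of the rsync/cron idea (paths, node name, and
schedule are placeholders, and it assumes passwordless SSH from the
active node):

  # /etc/cron.d/sync-app-config
  * * * * *  root  rsync -a /etc/myapp/ standby-node:/etc/myapp/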

> 
> -Regards
> Nikhil
> 
> 
> On Fri, Mar 31, 2017 at 12:08 PM, Nikhil Utane
> wrote:
> 
> Hi,
> 
> In our current design (which we plan to improve upon) we are using
> the CIB file to synchronize information across active and standby nodes.
> Basically we want the standby node to take the configuration that
> was used by the active node so we are adding those as resource
> attributes. This ensures that when the standby node takes over, it
> can read all the configuration which will be passed to it as
> environment variables.
> Initially we thought the list of configuration parameters would be
> small, and we did some prototyping and saw that there wasn't much of
> an issue. But now the list has grown; it has become close to 300
> attributes. (I know this is like abusing the feature and we are
> looking towards doing it the right way).
> 
> So I have two questions:
> 1) What is the best way to synchronize such kind of information
> across nodes in the cluster? DRBD? Anything else that is simpler?
> For example, instead of syncing 300 attributes I could just sync up the
> path to a file.
> 
> 2) In the current design, is there anything that I can do to reduce
> the CPU utilization of cib process? Currently it regularly takes
> 30-50% of the CPU.
> Any quick fix that I can do which will bring it down? For e.g.
> configure how often it synchronizes etc?
> 
> -Thanks
> Nikhil



Re: [ClusterLabs] error: The cib process (17858) exited: Key has expired (127)

2017-03-24 Thread Ken Gaillot
On 03/24/2017 08:06 AM, Rens Houben wrote:
> I recently upgraded a two-node cluster (named 'castor' and 'pollux'
> because I should not be allowed to think up computer names before I've
> had my morning caffeine) from Debian wheezy to Jessie after the
> backports for corosync and pacemaker finally made it in. However, one of
> the two servers failed to start correctly for no really obvious reason.
> 
> Given as how it'd been years since I last set them up  and had forgotten
> pretty much everything about it in the interim I decided to purge
> corosync and pacemaker on both systems and run with clean installs instead.
> 
> This worked on pollux, but not on castor. Even after going back,
> re-purging, removing everything legacy in /var/lib/heartbeat and
> emptying both directories, castor still refuses to bring up pacemaker.
> 
> 
> I put the full log of a start attempt up at
> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker.log.txt, but
> this is the excerpt that I /think/ is causing the failure:
> 
> Mar 24 13:59:05 [25495] castor pacemakerd:error: pcmk_child_exit:The
> cib process (25502) exited: Key has expired (127)
> Mar 24 13:59:05 [25495] castor pacemakerd:   notice:
> pcmk_process_exit:Respawning failed child process: cib
> 
> I don't see any entries from cib in the log that suggest anything's
> going wrong, though, and I'm running out of ideas on where to look next.

The "Key has expired" message is misleading. (Pacemaker really needs an
overhaul of the exit codes it can return, so these messages can be
reliable, but there are always more important things to take care of ...)

Pacemaker is getting 127 as the exit status of cib, and interpreting
that as a standard system error number, but it probably isn't one. I
don't actually see any way that the cib can return 127, so I'm not sure
what that might indicate.

In any case, the cib is mysteriously dying whenever it tries to start,
apparently without logging why or dumping core. (Do you have cores
disabled at the OS level?)

> Does anyone have any suggestions as to how to coax more information out
> of the processes and into the log files so I'll have a clue to work with?

Try it again with PCMK_debug=cib in /etc/default/pacemaker. That should
give more log messages.
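
That is, something like this, then restart pacemaker and retry:

  # /etc/default/pacemaker
  PCMK_debug=cib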

> 
> Regards,
> 
> --
> Rens Houben
> Systemec Internet Services
> 
> SYSTEMEC BV
> 
> Marinus Dammeweg 25, 5928 PW Venlo
> Postbus 3290, 5902 RG Venlo
> Industrienummer: 6817
> Nederland
> 
> T: 077-3967572 (Support)
> K.V.K. nummer: 12027782 (Venlo)
> 
> 



Re: [ClusterLabs] stonith in dual HMC environment

2017-03-24 Thread Ken Gaillot
On 03/22/2017 09:42 AM, Alexander Markov wrote:
> 
>> Please share your config along with the logs from the nodes that were
>> effected.
> 
> I'm starting to think it's not about how to define stonith resources. If
> the whole box is down with all the logical partitions defined, then HMC
> cannot define if LPAR (partition) is really dead or just inaccessible.
> This leads to UNCLEAN OFFLINE node status and pacemaker refusal to do
> anything until it's resolved. Am I right? Anyway, the simplest pacemaker
> config from my partitions is below.

Yes, it looks like you are correct. The fence agent is returning an
error when pacemaker tries to use it to reboot crmapp02. From the stderr
in the logs, the message is "ssh: connect to host 10.1.2.9 port 22: No
route to host".

The first thing I'd try is making sure you can fence each node from the
command line by manually running the fence agent. I'm not sure how to do
that for the "stonith:" type agents.

Once that's working, make sure the cluster can do the same, by manually
running "stonith_admin -B $NODE" for each $NODE.

> 
> primitive sap_ASCS SAPInstance \
> params InstanceName=CAP_ASCS01_crmapp \
> op monitor timeout=60 interval=120 depth=0
> primitive sap_D00 SAPInstance \
> params InstanceName=CAP_D00_crmapp \
> op monitor timeout=60 interval=120 depth=0
> primitive sap_ip IPaddr2 \
> params ip=10.1.12.2 nic=eth0 cidr_netmask=24

> primitive st_ch_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.9 \
> op start interval=0 timeout=300
> primitive st_hq_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.8 \
> op start interval=0 timeout=300

I see you have two stonith devices defined, but they don't specify which
nodes they can fence -- pacemaker will assume that either device can be
used to fence either node.

> group g_sap sap_ip sap_ASCS sap_D00 \
> meta target-role=Started

> location l_ch_hq_hmc st_ch_hmc -inf: crmapp01
> location l_st_hq_hmc st_hq_hmc -inf: crmapp02

These constraints restrict which node monitors which device, not which
node the device can fence.

Assuming st_ch_hmc is intended to fence crmapp01, this will make sure
that crmapp02 monitors that device -- but you also want something like
pcmk_host_list=crmapp01 in the device configuration.
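
For example (crm shell syntax matching your config; adjust if each
device is actually meant to fence the other node):

  primitive st_ch_hmc stonith:ibmhmc \
      params ipaddr=10.1.2.9 pcmk_host_list="crmapp01" \
      op start interval=0 timeout=300
  primitive st_hq_hmc stonith:ibmhmc \
      params ipaddr=10.1.2.8 pcmk_host_list="crmapp02" \
      op start interval=0 timeout=300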

> location prefer_node_1 g_sap 100: crmapp01
> property cib-bootstrap-options: \
> stonith-enabled=true \
> no-quorum-policy=ignore \
> placement-strategy=balanced \
> expected-quorum-votes=2 \
> dc-version=1.1.12-f47ea56 \
> cluster-infrastructure="classic openais (with plugin)" \
> last-lrm-refresh=1490009096 \
> maintenance-mode=false
> rsc_defaults rsc-options: \
> resource-stickiness=200 \
> migration-threshold=3
> op_defaults op-options: \
> timeout=600 \
> record-pending=true
> 
> Logs are pretty much going in circles: stonith cannot reset logical
> partition via HMC, node stays unclean offline, resources are shown to
> stay on node that is down.
> 
> 
> stonith-ng:error: log_operation:Operation 'reboot' [6942] (call
> 6 from crmd.4568) for host 'crmapp02' with device 'st_ch_hmc:0'
> Trying: st_ch_hmc:0
> stonith-ng:  warning: log_operation:st_ch_hmc:0:6942 [ Performing:
> stonith -t ibmhmc -T reset crmapp02 ]
> stonith-ng:  warning: log_operation:st_ch_hmc:0:6942 [ failed:
> crmapp02 3 ]
> stonith-ng: info: internal_stonith_action_execute:  Attempt 2 to
> execute fence_legacy (reboot). remaining timeout is 59
> stonith-ng: info: update_remaining_timeout: Attempted to
> execute agent fence_legacy (reboot) the maximum number of times (2)
> 
> stonith-ng:error: log_operation:Operation 'reboot' [6955] (call
> 6 from crmd.4568) for host 'crmapp02' with device 'st_hq_hmc' re
> Trying: st_hq_hmc
> stonith-ng:  warning: log_operation:st_hq_hmc:6955 [ Performing:
> stonith -t ibmhmc -T reset crmapp02 ]
> stonith-ng:  warning: log_operation:st_hq_hmc:6955 [ failed:
> crmapp02 8 ]
> stonith-ng: info: internal_stonith_action_execute:  Attempt 2 to
> execute fence_legacy (reboot). remaining timeout is 60
> stonith-ng: info: update_remaining_timeout: Attempted to
> execute agent fence_legacy (reboot) the maximum number of times (2)
> 
> stonith-ng:error: log_operation:Operation 'reboot' [6976] (call
> 6 from crmd.4568) for host 'crmapp02' with device 'st_hq_hmc:0'
> 
> stonith-ng:  warning: log_operation:st_hq_hmc:0:6976 [ Performing:
> stonith -t ibmhmc -T reset crmapp02 ]
> stonith-ng:  warning: log_operation:st_hq_hmc:0:6976 [ failed:
> crmapp02 8 ]
> stonith-ng:   notice: stonith_choose_peer:  Couldn't find anyone to
> fence crmapp02 with 
> stonith-ng: info: call_remote_stonith:  None of the 1 peers are
> capable of terminating crmapp02 for crmd.4568 (1)
> stonith-ng:error: remote_op_done:   Operation reboot of crmapp02 by
>  for crmd.4568@crmapp01.6bf66b9c: No route to host
> crmd:   notice: tengine_stonith_callback: Stonith 

Re: [ClusterLabs] Failover question

2017-03-16 Thread Ken Gaillot
On 03/16/2017 04:01 AM, Frank Fiene wrote:
> OK, but with the parameter INFINITY?

Correct, that makes it mandatory, so virtual_ip can only run where there
is a running instance of proxy.

> I am not sure that this prevents Apache from running on the host without the
> virtual IP.

You configured apache as a clone, so it will run on all nodes,
regardless of where the IP is -- but the IP would only be placed where
apache is successfully running.

>> Am 15.03.2017 um 15:15 schrieb Ken Gaillot <kgail...@redhat.com>:
>>
>> Sure, just add a colocation constraint for virtual_ip with proxy.
>>
>> On 03/15/2017 05:06 AM, Frank Fiene wrote:
>>> Hi,
>>>
>>> Another beginner question:
>>>
>>> I have configured a virtual IP resource on two hosts and an apache resource 
>>> cloned on both machines like this
>>>
>>> pcs resource create virtual_ip ocf:heartbeat:IPaddr2 params ip= 
>>> op monitor interval=10s
>>> pcs resource create proxy lsb:apache2 
>>> statusurl="http://127.0.0.1/server-status; op monitor interval=15s clone
>>>
>>>
>>> Will the IP failover if the Apache server on the Master has a problem?
>>> The Apache is just acting as a proxy, so I thought it would be faster to 
>>> have it already running on both machines.
>>>
>>>
>>> Kind Regards! Frank
>>> — 
>>> Frank Fiene
>>> IT-Security Manager VEKA Group
>>>
>>> Fon: +49 2526 29-6200
>>> Fax: +49 2526 29-16-6200
>>> mailto: ffi...@veka.com
>>> http://www.veka.com
>>>
>>> PGP-ID: 62112A51
>>> PGP-Fingerprint: 7E12 D61B 40F0 212D 5A55 765D 2A3B B29B 6211 2A51
>>> Threema: VZK5NDWW
>>>
>>> VEKA AG
>>> Dieselstr. 8
>>> 48324 Sendenhorst
>>> Deutschland/Germany
>>>
>>> Vorstand/Executive Board: Andreas Hartleif (Vorsitzender/CEO),
>>> Dr. Andreas W. Hillebrand, Bonifatius Eichwald, Elke Hartleif, Dr. Werner 
>>> Schuler,
>>> Vorsitzender des Aufsichtsrates/Chairman of Supervisory Board: Ulrich Weimer
>>> HRB 8282 AG MĂĽnster/District Court of MĂĽnster



Re: [ClusterLabs] Failover question

2017-03-15 Thread Ken Gaillot
Sure, just add a colocation constraint for virtual_ip with proxy.
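
For example (assuming pcs named the clone "proxy-clone"):

  pcs constraint colocation add virtual_ip with proxy-clone INFINITY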

On 03/15/2017 05:06 AM, Frank Fiene wrote:
> Hi,
> 
> Another beginner question:
> 
> I have configured a virtual IP resource on two hosts and an apache resource 
> cloned on both machines like this
> 
> pcs resource create virtual_ip ocf:heartbeat:IPaddr2 params ip= 
> op monitor interval=10s
> pcs resource create proxy lsb:apache2 
> statusurl="http://127.0.0.1/server-status; op monitor interval=15s clone
> 
> 
> Will the IP failover if the Apache server on the Master has a problem?
> The Apache is just acting as a proxy, so I thought it would be faster to have 
> it already running on both machines.
> 
> 
> Kind Regards! Frank
> — 
> Frank Fiene
> IT-Security Manager VEKA Group
> 
> Fon: +49 2526 29-6200
> Fax: +49 2526 29-16-6200
> mailto: ffi...@veka.com
> http://www.veka.com
> 
> PGP-ID: 62112A51
> PGP-Fingerprint: 7E12 D61B 40F0 212D 5A55 765D 2A3B B29B 6211 2A51
> Threema: VZK5NDWW
> 
> VEKA AG
> Dieselstr. 8
> 48324 Sendenhorst
> Deutschland/Germany
> 
> Vorstand/Executive Board: Andreas Hartleif (Vorsitzender/CEO),
> Dr. Andreas W. Hillebrand, Bonifatius Eichwald, Elke Hartleif, Dr. Werner 
> Schuler,
> Vorsitzender des Aufsichtsrates/Chairman of Supervisory Board: Ulrich Weimer
> HRB 8282 AG MĂĽnster/District Court of MĂĽnster



Re: [ClusterLabs] CIB configuration: role with many expressions - error 203

2017-03-21 Thread Ken Gaillot
On 03/21/2017 11:20 AM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a problem when creating rules with many expressions:
> 
>  
> <rule ... boolean-op="and">
>   <expression ... id="on_nodes_dbx_first_head-expr" value="Active"/>
>   <expression ... id="on_nodes_dbx_first_head-expr" value="AH"/>
> </rule>
> 
> Result:
> Call cib_replace failed (-203): Update does not conform to the
> configured schema
> 
> Everything works when I remove the "boolean-op" attribute and leave only one
> expression.
> What do I do wrong when creating rules?

boolean_op

Underbar not dash :-)

> 
> 
> Pacemaker 1.1.16-1.el6
> Written by Andrew Beekhof
> 
> 
> Thank in advance for any help,
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated



Re: [ClusterLabs] Pacemaker for Embedded Systems

2017-04-11 Thread Ken Gaillot
On 04/10/2017 03:58 PM, Chad Cravens wrote:
> Hello all:
> 
> we have implemented large cluster solutions for complex server
> environments that had databases, application servers, apache web servers
> and implemented fencing with the IPMI fencing agent.
> 
> However, we are considering if pacemaker would be a good solution for
> high availability for an embedded control system that integrates with
> CAN for vehicles? We will also have Ethernet for cluster communication
> between the hardware units.
> 
> My main questions are:
> 1) Is it common use case to use pacemaker to implement high availability
> for embedded control systems?

I know it has been done. I'd love to hear about some specific examples,
but I don't know of any public ones.

> 2) What, if any, special considerations should be taken when it comes to
> fencing in this type of environment?

From pacemaker's point of view, it's not a special environment ...
communication between nodes and some way to request fencing are all
that's needed.

Of course, the physical environment poses many more challenges in this
case, not to mention the safety and regulatory requirements if the
system is in any way important to the operation of the vehicle.

I don't have any experience in the area, but just as a thought
experiment, I'd think the main question would be: what happens in a
split-brain situation? Fencing is important to the same degree as the
consequences of that. If the worst that happens is the music player
skips tracks, it might be acceptable to disable fencing; if the vehicle
could brake inappropriately, then the needs are much larger.

> Thank you for any guidance!
> 
> -- 
> Kindest Regards,
> Chad Cravens
> (800) 214-9146 x700
> 
> http://www.ossys.com 
> http://www.linkedin.com/company/open-source-systems-llc
>   
> https://www.facebook.com/OpenSrcSys
>    https://twitter.com/OpenSrcSys
>   http://www.youtube.com/OpenSrcSys
>    http://www.ossys.com/feed
>    cont...@ossys.com 
> Chad Cravens
> (800) 214-9146 x700
> chad.crav...@ossys.com 
> http://www.ossys.com


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 1.1.17: Per-operation fail counts

2017-04-03 Thread Ken Gaillot
Hi all,

Pacemaker 1.1.17 will have a significant change in how it tracks
resource failures, though the change will be mostly invisible to users.

Previously, Pacemaker tracked a single count of failures per resource --
for example, start failures and monitor failures for a given resource
were added together.

In a thread on this list last year[1], we discussed adding some new
failure handling options that would require tracking failures for each
operation type.

Pacemaker 1.1.17 will include this tracking, in preparation for adding
the new options in a future release.

Whereas previously, failure counts were stored in node attributes like
"fail-count-myrsc", they will now be stored in multiple node attributes
like "fail-count-myrsc#start_0" and "fail-count-myrsc#monitor_1"
(the number distinguishes monitors with different intervals).

Actual cluster behavior will be unchanged in this release (and
backward-compatible); the cluster will sum the per-operation fail counts
when checking against options such as migration-threshold.

The part that will be visible to the user in this release is that the
crm_failcount and crm_resource --cleanup tools will now be able to
handle individual per-operation fail counts if desired, though by
default they will still affect the total fail count for the resource.

As an example, if "myrsc" has one start failure and one monitor failure,
"crm_failcount -r myrsc --query" will still show 2, but now you can also
say "crm_failcount -r myrsc --query --operation start" which will show 1.

Additionally, crm_failcount --delete previously only reset the
resource's fail count, but it now behaves identically to crm_resource
--cleanup (resetting the fail count and clearing the failure history).
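
To illustrate with commands (the resource name is just an example, and the
output formatting may vary):

  crm_failcount -r myrsc --query                     # total fail count (2 in the example above)
  crm_failcount -r myrsc --query --operation start   # start failures only (1)
  crm_failcount -r myrsc --delete                    # clear fail counts and failure history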

Special note for pgsql users: Older versions of common pgsql resource
agents relied on a behavior of crm_failcount that is now rejected. While
the impact is limited, users are recommended to make sure they have the
latest version of their pgsql resource agent before upgrading to
pacemaker 1.1.17.

[1] http://lists.clusterlabs.org/pipermail/users/2016-September/004096.html
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Coming in Pacemaker 1.1.17: Per-operation fail counts

2017-04-04 Thread Ken Gaillot
On 04/04/2017 01:18 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> schrieb am 03.04.2017 um 17:00 in 
>>>> Nachricht
> <ae3a7cf4-2ef7-4c4f-ae3f-39f473ed6...@redhat.com>:
>> Hi all,
>>
>> Pacemaker 1.1.17 will have a significant change in how it tracks
>> resource failures, though the change will be mostly invisible to users.
>>
>> Previously, Pacemaker tracked a single count of failures per resource --
>> for example, start failures and monitor failures for a given resource
>> were added together.
> 
> That is "per resource operation", not "per resource" ;-)

I mean that there was only a single number to count failures for a given
resource; before this change, failures were not remembered separately by
operation.

>> In a thread on this list last year[1], we discussed adding some new
>> failure handling options that would require tracking failures for each
>> operation type.
> 
> So the existing set of operations failures was restricted to 
> start/stop/monitor? How about master/slave featuring two monitor operations?

No, both previously and with the new changes, all operation failures are
counted (well, except metadata!). The only change is whether they are
remembered per resource or per operation.

>> Pacemaker 1.1.17 will include this tracking, in preparation for adding
>> the new options in a future release.
>>
>> Whereas previously, failure counts were stored in node attributes like
>> "fail-count-myrsc", they will now be stored in multiple node attributes
>> like "fail-count-myrsc#start_0" and "fail-count-myrsc#monitor_1"
>> (the number distinguishes monitors with different intervals).
> 
> Wouldn't it be thinkable to store it as a (transient) resource attribute, 
> either local to a node (LRM) or including the node attribute (CRM)?

Failures are specific to the node the failure occurred on, so it makes
sense to store them as transient node attributes.

So, to be more precise, we previously recorded failures per
node+resource combination, and now we record them per
node+resource+operation+interval combination.

>> Actual cluster behavior will be unchanged in this release (and
>> backward-compatible); the cluster will sum the per-operation fail counts
>> when checking against options such as migration-threshold.
>>
>> The part that will be visible to the user in this release is that the
>> crm_failcount and crm_resource --cleanup tools will now be able to
>> handle individual per-operation fail counts if desired, though by
>> default they will still affect the total fail count for the resource.
> 
> Another thing to think about would be "fail count" vs. "fail rate": Currently 
> there is a fail count, and some reset interval, which allows to build some 
> failure rate from it. Maybe many users just have the requirement that some 
> resource shouldn't fail again and again, but with long uptimes (and then the 
> operatior forgets to reset fail counters), occasional failures (like once in 
> two weeks) shouldn't prevent a resource from running.

Yes, we discussed that a bit in the earlier thread. It would be too much
of an incompatible change and add considerable complexity to start
tracking the failure rate.

Failure clearing hasn't changed -- failures can only be cleared by
manual commands, the failure-timeout option, or a restart of cluster
services on a node.

For the example you mentioned, a high failure-timeout is the best answer
we have. You could set a failure-timeout of 24 hours, and if the
resource went 24 hours without any failures, any older failures would be
forgotten.
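
For example, something along these lines (the resource name is an example,
and any tool that sets resource meta-attributes works equally well):

  pcs resource meta myrsc failure-timeout=24h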

>> As an example, if "myrsc" has one start failure and one monitor failure,
>> "crm_failcount -r myrsc --query" will still show 2, but now you can also
>> say "crm_failcount -r myrsc --query --operation start" which will show 1.
> 
> Would accumulated monitor failures ever prevent a resource from starting, or 
> will it force a stop of the resource?

As of this release, failure recovery behavior has not changed. All
operation failures are added together to produce a single fail count per
resource, as was recorded before. The only thing that changed is how
they're recorded.

Failure recovery is controlled by the resource's migration-threshold and
the operation's on-fail. By default, on-fail=restart and
migration-threshold=INFINITY, so a monitor failure would result in
1,000,000 restarts before being banned from the failing node.

> Regards,
> Ulrich
> 
>>
>> Additionally, crm_failcount --delete previously only reset the
>> resource's fail count, but it now behaves identically to crm_resource
>> -

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-04 Thread Ken Gaillot
On 03/13/2017 10:43 PM, Chris Walker wrote:
> Thanks for your reply Digimer.
> 
> On Mon, Mar 13, 2017 at 1:35 PM, Digimer  > wrote:
> 
> On 13/03/17 12:07 PM, Chris Walker wrote:
> > Hello,
> >
> > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync:
> > 2.4.0-4; libqb: 1.0-1),
> > it looks like successful STONITH operations are not communicated from
> > stonith-ng back to theinitiator (in this case, crmd) until the STONITHed
> > node is removed from the cluster when
> > Corosync notices that it's gone (i.e., after the token timeout).
> 
> Others might have more useful info, but my understanding of a lost node
> sequence is this;
> 
> 1. Node stops responding, corosync declares it lost after token timeout
> 2. Corosync reforms the cluster with remaining node(s), checks if it is
> quorate (always true in 2-node)
> 3. Corosync informs Pacemaker of the membership change.
> 4. Pacemaker invokes stonith, waits for the fence agent to return
> "success" (exit code of the agent as per the FenceAgentAPI
> [https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md]
> ).
> If
> the method fails, it moves on to the next method. If all methods fail,
> it goes back to the first method and tries again, looping indefinitely.
> 
> 
> That's roughly my understanding as well for the case when a node
> suddenly leaves the cluster (e.g., poweroff), and this case is working
> as expected for me.  I'm seeing delays when a node is marked for STONITH
> while it's still up (e.g., after a stop operation fails).  In this case,
> what I expect to see is something like:
> 1.  crmd requests that stonith-ng fence the node
> 2.  stonith-ng (might be a different stonith-ng) fences the node and
> sends a message that it has succeeded
> 3.  stonith-ng (the original from step 1) receives this message and
> communicates back to crmd that the node has been fenced
> 
> but what I'm seeing is
> 1.  crmd requests that stonith-ng fence the node
> 2.  stonith-ng fences the node and sends a message saying that it has
> succeeded
> 3.  nobody hears this message
> 4.  Corosync eventually realizes that the fenced node is no longer part
> of the config and broadcasts a config change
> 5.  stonith-ng finishes the STONITH operation that was started earlier
> and communicates back to crmd that the node has been STONITHed

In your attached log, bug1 was DC at the time of the fencing, and bug0
takes over DC after the fencing. This is what I expect is happening
(logs from bug1 would help confirm):

1. crmd on the DC (bug1) runs pengine which sees the stop failure and
schedules fencing (of bug1)

2. stonithd on bug1 sends a query to all nodes asking who can fence bug1

3. Each node replies, and stonithd on bug1 chooses bug0 to execute the
fencing

4. stonithd on bug0 fences bug1. At this point, it would normally report
the result to the DC ... but that happens to be bug1.

5. Once crmd on bug0 takes over DC, it can decide that the fencing
succeeded, but it can't take over DC until it sees that the old DC is
gone, which takes a while because of your long token timeout. So, this
is where the delay is coming in.

I'll have to think about whether we can improve this, but I don't think
it would be easy. There are complications if for example a fencing
topology is used, such that the result being reported in step 4 might
not be the entire result.

> I'm less convinced that the sending of the STONITH notify in step 2 is
> at fault; it also seems possible that a callback is not being run when
> it should be.
> 
> 
> The Pacemaker configuration is not important (I've seen this happen on
> our production clusters and on a small sandbox), but the config is:
> 
> primitive bug0-stonith stonith:fence_ipmilan \
> params pcmk_host_list=bug0 ipaddr=bug0-ipmi action=off
> login=admin passwd=admin \
> meta target-role=Started
> primitive bug1-stonith stonith:fence_ipmilan \
> params pcmk_host_list=bug1 ipaddr=bug1-ipmi action=off
> login=admin passwd=admin \
> meta target-role=Started
> primitive prm-snmp-heartbeat snmptrap_heartbeat \
> params snmphost=bug0 community=public \
> op monitor interval=10 timeout=300 \
> op start timeout=300 interval=0 \
> op stop timeout=300 interval=0
> clone cln-snmp-heartbeat prm-snmp-heartbeat \
> meta interleave=true globally-unique=false ordered=false
> notify=false
> location bug0-stonith-loc bug0-stonith -inf: bug0
> location bug1-stonith-loc bug1-stonith -inf: bug1
> 
> The corosync config might be more interesting:
> 
> totem {
> version: 2
> crypto_cipher: none
> crypto_hash: none
> secauth: off
> rrp_mode: passive
> transport: udpu
> token: 24
> consensus: 1000
> 
> interface {
> ringnumber 0
>  

[ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-03-31 Thread Ken Gaillot
Hi all,

The release process for Pacemaker 1.1.17 will start soon! The most
significant new feature is container bundles, developed by Andrew Beekhof.

Pacemaker's container story has previously been muddled.

For the simplest case, the ocf:heartbeat:docker agent allows you to
launch a docker instance. This works great, but limited in what it can do.

It is possible to run Pacemaker Remote inside a container and use it as
a guest node, but that does not model containers well: a container is
not a generic platform for any cluster resource, but typically provides
a single service.

"Isolated resources" were added in Pacemaker 1.1.13 to better represent
containers as a single service, but that feature was never documented or
widely used, and it does not model some common container scenarios. It
should now be considered deprecated.

Pacemaker 1.1.17 introduces a new type of resource: the "bundle". A
bundle is a single resource specifying the Docker settings, networking
requirements, and storage requirements for any number of containers
generated from the same Docker image.

A preliminary implementation of the feature is now available in the
master branch, for anyone who wants to experiment. The documentation
source in the master branch has been updated, though the online
documentation on clusterlabs.org has not been regenerated yet.

Here's an example of the CIB XML syntax (higher-level tools will likely
provide a more convenient interface):

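(The listing below is a reconstructed sketch; the bundle id, image name, IP
range, port and directory paths are placeholders rather than values from a
real deployment.)

<bundle id="httpd-bundle">
  <docker image="pcmk:httpd" replicas="3"/>
  <network ip-range-start="192.168.122.131" host-netmask="24">
    <port-mapping id="httpd-port" port="80"/>
  </network>
  <storage>
    <storage-mapping id="httpd-root" source-dir="/srv/html"
                     target-dir="/var/www/html" options="rw"/>
    <storage-mapping id="httpd-logs" source-dir-root="/var/log/pacemaker/bundles"
                     target-dir="/etc/httpd/logs" options="rw"/>
  </storage>
  <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>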

With that, Pacemaker would launch 3 instances of the container image,
assign an IP address to each where it could be reached on port 80 from
the host's network, map host directories into the container, and use
Pacemaker Remote to manage the apache resource inside the container.

The feature is currently experimental and will likely get significant
bugfixes throughout the coming release cycle, but the syntax is stable
and likely what will be released.

I intend to add a more detailed walk-through example to the ClusterLabs
wiki.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how start resources on the last running node

2017-04-05 Thread Ken Gaillot
On 04/04/2017 10:01 AM, Ján Poctavek wrote:
> Hi,
> 
> I came here to ask for some inspiration about my cluster setup.
> 
> I have 3-node pcs+corosync+pacemaker cluster. When majority of nodes
> exist in the cluster, everything is working fine. But what recovery
> options do I have when I lose 2 of 3 nodes? If I know for sure that the
> missing nodes are turned off, is there any command to force start of the
> resources? The idea is to make the resources available (by
> administrative command) even without majority of nodes and when the
> other nodes become reachable again, they will normally join to the
> cluster without any manual intervention.
> 
> All nodes are set with wait_for_all, stonith-enabled=false and
> no-quorum-policy=stop.
> 
> Thank you.
> 
> Jan

In general, no. The cluster must have quorum to serve resources.

However, corosync is versatile in how it can define quorum. See the
votequorum(5) man page regarding last_man_standing, auto_tie_breaker,
and allow_downscale. Also, the newest version of corosync supports
qdevice, which is a special quorum arbitrator.
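
As a sketch only (see votequorum(5) for the exact semantics and caveats of
each option), a corosync.conf quorum section using those features might look
like:

  quorum {
      provider: corosync_votequorum
      last_man_standing: 1
      last_man_standing_window: 10000
      auto_tie_breaker: 1
  }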

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Rename option group resource id with pcs

2017-04-11 Thread Ken Gaillot
On 04/11/2017 05:48 AM, Ulrich Windl wrote:
 Dejan Muhamedagic  schrieb am 11.04.2017 um 11:43 in
> Nachricht <20170411094352.GD8414@tuttle.homenet>:
>> Hi,
>>
>> On Tue, Apr 11, 2017 at 10:50:56AM +0200, Tomas Jelinek wrote:
>>> Dne 11.4.2017 v 08:53 SAYED, MAJID ALI SYED AMJAD ALI napsal(a):
 Hello,

 Is there any option in pcs to rename group resource id?

>>>
>>> Hi,
>>>
>>> No, there is not.
>>>
>>> Pacemaker doesn't really cover the concept of renaming a resource.
>>
>> Perhaps you can check how crmsh does resource rename. It's not
>> impossible, but can be rather involved if there are other objects
>> (e.g. constraints) referencing the resource. Also, crmsh will
>> refuse to rename the resource if it's running.
> 
> The real problem in pacemaker (as resources are created now) is that the 
> "IDs" have too much semantic, i.e. most are derived from the resource name 
> (while lacking a name attribute or element), and some required elements are 
> accessed by ID, and not by name.
> 
> Examples:
> 
>   <nvpair id="..." name="..." value="1.1.12-f47ea56"/>
> 
> <nvpair>s and <op>s have no name, but only an ID (it seems).
> 
>   <op id="..." .../>
> 
> This is redundant: as the <op> is part of a resource (by XML structure), it's 
> unnecessary to put the name of the resource into the ID of the operation.
> 
> It all looks like a kind of abuse of XML IMHO. I think the next CIB format 
> should be able to handle IDs that are free of semantics other than to denote 
> (relatively unique) identity. That is: It should be OK to assign IDs like 
> "i1", "i2", "i3", ... and besides from an IDREF the elements should be 
> accessed by structure and/or name.
> 
> (If the ID should be the primary identification feature, flatten all 
> structure and drop all (redundant) names.)
> 
> Regards,
> Ulrich

That's how it's always been :-)

Pacemaker doesn't care what IDs are, only that they are unique (though
of course they must meet the XML requirements for an ID type as far as
allowed characters). The various tools (CLI, crm shell, pcs)
auto-generate IDs so the user doesn't have to care about them, and they
create IDs like the ones you mention above, because they're easy to
generate.

>>
>> Thanks,
>>
>> Dejan
>>
>>> From
>>> pacemaker's point of view one resource gets removed and another one gets
>>> created.
>>>
>>> This has been discussed recently:
>>> http://lists.clusterlabs.org/pipermail/users/2017-April/005387.html 
>>>
>>> Regards,
>>> Tomas
>>>






 */MAJID SAYED/*

 /HPC System Administrator./

 /King Abdullah International Medical Research Centre/

 /Phone:+9661801(Ext:40631)/

 /Email:sayed...@ngha.med.sa/

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-11 Thread Ken Gaillot
On 04/11/2017 08:30 AM, Kristoffer Grönlund wrote:
> Hi all,
> 
> I discovered today that a location constraint with score=INFINITY
> doesn't actually restrict resources to running only on particular
> nodes. From what I can tell, the constraint assigns the score to that
> node, but doesn't change scores assigned to other nodes. So if the node
> in question happens to be offline, the resource will be started on any
> other node.
> 
> Example:
> 
>   <rsc_location id="..." rsc="dummy" node="node2" score="INFINITY"/>
> 
> If node2 is offline, I see the following:
> 
>  dummy(ocf::heartbeat:Dummy): Started node1
> native_color: dummy allocation score on node1: 1
> native_color: dummy allocation score on node2: -INFINITY
> native_color: dummy allocation score on webui: 0
> 
> It makes some kind of sense, but seems surprising - and the
> documentation is a bit unclear on the topic. In particular, the
> statement that a score = INFINITY means "must" is clearly not correct in
> this case. Maybe the documentation should be clarified for location
> constraints?

Yes, that behavior is intended. I'll make a note to clarify in the
documentation.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] nodes ID assignment issue

2017-04-17 Thread Ken Gaillot
On 04/13/2017 10:40 AM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a question regarding building CIB nodes scope and specifically
> assignment to node IDs.
> It seems like the preexisting scope is not honored and nodes can get
> replaced based on check-in order.
> 
> I pre-create the nodes scope because it is faster, then setting
> parameters for all the nodes later (when the number of nodes is large).
> 
> From the listings below, one can see that node with ID=1 was replaced
> with another node (uname), however not the options. This situation
> causes problems when resource assignment is based on rules involving
> node options.
> 
> Is there a way to prevent this rearrangement of 'uname', if not whether
> there is a way to make the options follow 'uname', or maybe the problem
> is somewhere else - corosync configuration perhaps?
> Is the corosync 'nodeid' enforced to be also CIB node 'id'?

Hi,

Yes, for cluster nodes, pacemaker gets the node id from the messaging
layer (corosync). For remote nodes, id and uname are always identical.
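
If the node IDs are currently auto-assigned, one way to keep them stable is
to pin them in the corosync.conf nodelist (a sketch; the names and addresses
below are placeholders):

  nodelist {
      node {
          ring0_addr: node1.example.com
          nodeid: 1
      }
      node {
          ring0_addr: node2.example.com
          nodeid: 2
      }
  }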

> 
> 
> Thanks in advance,
> 
> 
> Below is CIB committed before nodes check-in:
> 
>   <nodes>
>     <node id="1" uname="...">
>       <instance_attributes id="...">
>         <nvpair id="..." name="STATE" value="..."/>
>         <nvpair id="..." name="Primary" value="..."/>
>       </instance_attributes>
>     </node>
>     ... (two more <node> entries with the same "STATE" and "Primary" attributes)
>   </nodes>
> 
> 
> And automatic changes after nodes check-in:
> 
>   <nodes>
>     ... (same three <node> entries as above, except that the uname of the
>          node with id="1" has been replaced while its attributes have not)
>   </nodes>
> 
> 
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] KVM virtualdomain - stopped

2017-04-17 Thread Ken Gaillot
On 04/13/2017 03:01 AM, Jaco van Niekerk wrote:
> 
> Hi
> 
> I am having endless problems with ocf::heartbeat:VirtualDomain when
> failing over to second node. The virtualdomain goes into a stopped state
> 
> virtdom_compact (ocf::heartbeat:VirtualDomain): Stopped
> 
> * virtdom_compact_start_0 on node2.kvm.bitco.co.za 'unknown error' (1):
> call=93, status=complete, exitreason='Failed to start virtual domain
> compact.',
> last-rc-change='Thu Apr 13 09:11:16 2017', queued=0ms, exec=369ms
> 
> I then aren't able to get it started without deleting the resource and
> adding it again:
> 
> pcs resource create virtdom_compact ocf:heartbeat:VirtualDomain
> config=/etc/libvirt/qemu/compact.xml meta allow-migrate=true op monitor
> interval="30"
> 
> Looking at virsh list --all
> virsh list --all
> Id Name State
> 
> 
> It doesn't seam like ocf:heartbeat:VirtualDomain is able to define the
> domain and thus the command can't start the domain:
> 
> virsh start compact
> error: failed to get domain 'compact'
> error: Domain not found: no domain with matching name 'compact'
> 
> Am I missing something in my configuration:
> pcs resource create my_FS ocf:heartbeat:Filesystem params
> device=/dev/sdc1 directory=/images fstype=xfs
> pcs resource create my_VIP ocf:heartbeat:IPaddr2 ip=192.168.99.10
> cidr_netmask=22 op monitor interval=10s
> pcs resource create virtdom_compact ocf:heartbeat:VirtualDomain
> config=/etc/libvirt/qemu/compact.xml meta allow-migrate=true op monitor

^^^ Make sure that config file is available on all nodes. It's a good
idea to try starting the VM outside the cluster (e.g. with virsh) on
each node, before putting it under cluster control.
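
Something along these lines on each node, before creating the resource (the
path and domain name are taken from your commands; virsh create starts a
transient domain from the XML, which is roughly what the agent does):

  ls -l /etc/libvirt/qemu/compact.xml
  virsh create /etc/libvirt/qemu/compact.xml
  virsh destroy compact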

> interval="30"
> 
> Regards
> 
> 
> * Jaco van Niekerk*
> 
> * Solutions Architect*
> 
>   
>
> 
>  *T:* 087 135  | Ext: 2102
> 
>
> 
>  *E:* j...@bitco.co.za 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How to force remove a cluster node?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 01:11 PM, Scott Greenlese wrote:
> Hi,
> 
> I need to remove some nodes from my existing pacemaker cluster which are
> currently unbootable / unreachable.
> 
> Referenced
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
> 
> *4.4.4. Removing Cluster Nodes*
> The following command shuts down the specified node and removes it from
> the cluster configuration file, corosync.conf, on all of the other nodes
> in the cluster. For information on removing all information about the
> cluster from the cluster nodes entirely, thereby destroying the cluster
> permanently, refer to _Section 4.6, “Removing the Cluster
> Configuration”_
> .
> 
> pcs cluster node remove /node/
> 
> I ran the command with the cluster active on 3 of the 5 available
> cluster nodes (with quorum). The command fails with:
> 
> [root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
> Thu Apr 13 13:40:59 EDT 2017
> *Error: pcsd is not running on zs93kjpcs1*
> 
> 
> The node was not removed:
> 
> [root@zs90KP VD]# pcs status |less
> Cluster name: test_cluster_2
> Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
> 2017 by root via cibadmin on zs93KLpcs1
> Stack: corosync
> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 45 nodes and 180 resources configured
> 
> Node zs95KLpcs1: UNCLEAN (offline)
> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
> *OFFLINE: [ zs93kjpcs1 ]*
> 
> 
> Is there a way to force remove a node that's no longer bootable? If not,
> what's the procedure for removing a rogue cluster node?
> 
> Thank you...
> 
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

Yes, the pcs command is just a convenient shorthand for a series of
commands. You want to ensure pacemaker and corosync are stopped on the
node to be removed (in the general case, obviously already done in this
case), remove the node from corosync.conf and restart corosync on all
other nodes, then run "crm_node -R <nodename>" on any one active node.
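
Roughly (the node name is from your output; corosync.conf is assumed to be in
the default location):

  # on every remaining node: delete the zs93kjpcs1 entry from the nodelist
  # in /etc/corosync/corosync.conf, then restart corosync, e.g.
  systemctl restart corosync

  # then, on any one active node:
  crm_node -R zs93kjpcs1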


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
> Hi,
> 
> I encountered several (old) statements on various forums along the lines
> of: "the CIB is not a transactional database and shouldn't be used as
> one" or "resource parameters should only uniquely identify a resource,
> not configure it" and "the CIB was not designed to be a configuration
> database but people still use it that way".  Sorry if I misquote these,
> I go by my memories now, I failed to dig up the links by a quick try.
> 
> Well, I've been feeling guilty in the above offenses for years, but it
> worked out pretty well that way which helped to suppress these warnings
> in the back of my head.  Still, I'm curious: what's the reason for these
> warnings, what are the dangers of "abusing" the CIB this way?
> /var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
> configured.  Old Pacemaker versions required tuning PCMK_ipc_buffer to
> handle this, but even the default is big enough nowadays (128 kB after
> compression, I guess).
> 
> Am I walking on thin ice?  What should I look out for?

That's a good question. Certainly, there is some configuration
information in most resource definitions, so it's more a matter of degree.

The main concerns I can think of are:

1. Size: Increasing the CIB size increases the I/O, CPU and networking
overhead of the cluster (and if it crosses the compression threshold,
significantly). It also marginally increases the time it takes the
policy engine to calculate a new state, which slows recovery.

2. Consistency: Clusters can become partitioned. If changes are made on
one or more partitions during the separation, the changes won't be
reflected on all nodes until the partition heals, at which time the
cluster will reconcile them, potentially losing one side's changes. I
suppose this isn't qualitatively different from using a separate
configuration file, but those tend to be more static, and failure to
modify all copies would be more obvious when doing them individually
rather than issuing a single cluster command.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-21 Thread Ken Gaillot
On 04/21/2017 04:38 AM, Lentes, Bernd wrote:
> 
> 
> - On Apr 21, 2017, at 1:24 AM, Ken Gaillot kgail...@redhat.com wrote:
> 
>> On 04/20/2017 02:53 PM, Lentes, Bernd wrote:
> 
>>
>> target-role=Stopped prevents a resource from being started.
>>
>> In a group, each member of the group depends on the previously listed
>> members, same as if ordering and colocation constraints had been created
>> between each pair. So, starting a resource in the "middle" of a group
>> will also start everything before it.
> 
> What is the other way round ? Starting the first of the group ? Will the 
> subsequent follow ?

Groups are generally intended to start and stop as a whole, so I would
expect starting any member explicitly to lead the cluster to want to
start the entire group, but I could be wrong, because only prior members
are required to be started first.

>> Everything in the group inherits this target-role=Stopped. However,
>> prim_vnc_ip_mausdb has its own target-role=Started, which overrides that.
>>
>> I'm not sure what target-role was on each resource at each step in your
>> tests, but the behavior should match that.
>>
> 
> I have to admit that i'm struggling with the meaning of "target-role".
> What does it really mean ? The current status of the resource ? The status of 
> the resource the cluster should try
> to achieve ? Both ? Nothing of this ? Could you clarify that to me ?

"try to achieve"

The cluster doesn't have any direct concept of intentionally starting or
stopping a resource, only of a desired cluster state, and it figures out
the actions needed to get there. The higher-level tools provide the
start/stop concept by setting target-role.

It's the same in practice, but the cluster "thinks" in terms of being
told what the desired end result is, not what specific actions to perform.
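
For example, the usual "stop" and "start" commands are just shorthand for
editing that meta-attribute (the resource name is an example; crm shell has
equivalent commands):

  pcs resource disable my_resource   # sets target-role=Stopped
  pcs resource enable my_resource    # sets target-role=Started or clears it, depending on version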

> 
> Thanks.
> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-21 Thread Ken Gaillot
On 04/21/2017 07:52 AM, Lentes, Bernd wrote:
> 
> 
> - On Apr 21, 2017, at 11:38 AM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
> 
>> - On Apr 21, 2017, at 1:24 AM, Ken Gaillot kgail...@redhat.com wrote:
>>
>>> On 04/20/2017 02:53 PM, Lentes, Bernd wrote:
>>
>>>
>>> target-role=Stopped prevents a resource from being started.
>>>
>>> In a group, each member of the group depends on the previously listed
>>> members, same as if ordering and colocation constraints had been created
>>> between each pair. So, starting a resource in the "middle" of a group
>>> will also start everything before it.
> 
> Not in each case. 
> 
> 
> I tested a bit:
> target-role of the group: stopped. (This is inherited by the primitives of 
> the group if not declared otherwise.
> If declared for the primitive otherwise this supersedes the target-role of 
> the group.)
> 
> Starting first primitive of the group. Second primitive does not start 
> because target-role is stopped (inherited by the group).
> 
> 
> Next test:
> 
> target-role of the group still "stopped". target-roles of the primitives not 
> decleared otherwise.
> Starting second primitive. First primitive does not start because target-role 
> is stopped, inherited by the group.
> Second primitive does not start because first primitive does not start, 
> although target-role for the second primitive is started.
> Because second primitive needs first one.
> 
> Is my understanding correct ?

Yes

> 
> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-21 Thread Ken Gaillot
On 04/21/2017 07:14 AM, Vladislav Bogdanov wrote:
> 20.04.2017 23:16, Jan Wrona wrote:
>> On 20.4.2017 19:33, Ken Gaillot wrote:
>>> On 04/20/2017 10:52 AM, Jan Wrona wrote:
>>>> Hello,
>>>>
>>>> my problem is closely related to the thread [1], but I didn't find a
>>>> solution there. I have a resource that is set up as a clone C
>>>> restricted
>>>> to two copies (using the clone-max=2 meta attribute), because the
>>>> resource takes long time to get ready (it starts immediately though),
>>> A resource agent must not return from "start" until a "monitor"
>>> operation would return success.
>>>
>>> Beyond that, the cluster doesn't care what "ready" means, so it's OK if
>>> it's not fully operational by some measure. However, that raises the
>>> question of what you're accomplishing with your monitor.
>> I know all that and my RA respects that. I didn't want to go into
>> details about the service I'm running, but maybe it will help you
>> understand. Its a data collector which receives and processes data from
>> a UDP stream. To understand these data, it needs templates which
>> periodically occur in the stream (every five minutes or so). After
>> "start" the service is up and running, "monitor" operations are
>> successful, but until the templates arrive the service is not "ready". I
>> basically need to somehow simulate this "ready" state.
> 
> If you are able to detect that your application is ready (it already
> received its templates) in your RA's monitor, you may want to use
> transient node attributes to indicate that to the cluster. And tie your
> vip with such an attribute (with location constraint with rules).

That would be a good approach.

I'd combine it with stickiness so the application doesn't immediately
move when a "not ready" node becomes "ready".

I'd also keep the colocation constraint with the application. That helps
if a "ready" node crashes, because nothing is going to change the
attribute in that case, until the application is started there again.
The colocation constraint guarantees that the attribute is current.
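
A sketch of how that could be wired up (the attribute name is made up, and
the pcs rule syntax is just one way to express it):

  # in the RA's monitor action, once the templates have been received:
  attrd_updater -n collector-ready -U 1
  # (and cleared again in the stop action with: attrd_updater -n collector-ready -D)

  # keep the IP off nodes that are not (yet) ready:
  pcs constraint location ip rule score=-INFINITY not_defined collector-ready or collector-ready ne 1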

> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_determine_resource_location.html#_location_rules_based_on_other_node_properties
> 
> 
> Look at pacemaker/ping RA for attr management example.
> 
> [...]

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Wtrlt: Antw: Re: Antw: Re: how important would you consider to have two independent fencing device for each node ?

2017-04-20 Thread Ken Gaillot
On 04/20/2017 01:43 AM, Ulrich Windl wrote:
> Should have gone to the list...
> 
> Digimer  schrieb am 19.04.2017 um 17:20 in Nachricht
>> <600637f1-fef8-0a3d-821c-7aecfa398...@alteeve.ca>:
>>> On 19/04/17 02:38 AM, Ulrich Windl wrote:
>>> Digimer  schrieb am 18.04.2017 um 19:08 in
> Nachricht
 <26e49390-b384-b46e-4965-eba5bfe59...@alteeve.ca>:
> On 18/04/17 11:07 AM, Lentes, Bernd wrote:
>> Hi,
>>
>> i'm currently establishing a two node cluster. Each node is a HP
> server
 with 
> an ILO card.
>> I can fence both of them, it's working fine.
>> But what is if the ILO does not work correctly ? Then fencing is not 
> possible.
>
> Correct. If you only have iLO fencing, then the cluster would hang
> (failed fencing is *not* an indication of node death).
>
>> I also have a switched PDU from APC. Each server has two power
> supplies. 
> Currently one is connected to the normal power equipment, the other to
> the 
> UPS.
>> As a sort of redundancy, if the UPS does not work properly.
>
> That's a fine setup.
>
>> When i'd like to use the switched PDU as a fencing device i will loose
> the

> redundancy of two independent power sources, because then i have to
> connect

> both power supplies together to the UPS.
>> I wouldn't like to do that.
>
> Not if you have two switched PDUs. This is what we do in our Anvil!
> systems... One PDU feeds the first PSU in each node and the second PDU
> feeds the second PSUs. Ideally both PDUs are fed by UPSes, but that's
> not as important. One PDU on a UPS and one PDU directly from mains will
> work.
>
>> How important would you consider to have two independent fencing device
> for

> each node ? I'd can't by another PDU, currently we are very poor.
>
> Depends entirely on your tolerance for interruption. *I* answer that
> with "extremely important". However, most clusters out there have only
> IPMI-based fencing, so they would obviously say "not so important".
>
>> Is there another way to create a second fencing device, independent
> from
 the 
> ILO card ?
>>
>> Thanks.
>
> Sure, SBD would work. I've never seen IPMI not have a watchdog timer
> (and iLO is IPMI++), as one example. It's slow, and needs shared
> storage, but a small box somewhere running a small tgtd or iscsid
> should
> do the trick (note that I have never used SBD myself...).

 Slow is relative: If it takes 3 seconds from issuing the reset command
> until
 the node is dead, it's fast enough for most cases. Even a switched PDU
> has 
>>> some
 delays: The command has to be processed, the relay may "stick" a short 
>>> moment,
 the power supply's capacitors have to discharge (if you have two power 
>>> supplys,
 both need to)...  And iLOs don't really like to be powered off.

 Ulrich
>>>
>>> The way I understand SBD, and correct me if I am wrong, recovery won't
>>> begin until sometime after the watchdog timer kicks. If the watchdog
>>> timer is 60 seconds, then your cluster will hang for >60 seconds (plus
>>> fence delays, etc).
>>
>> I think it works differently: One task periodically reads its mailbox slot 
>> for commands, and once a command was read, it's executed immediately. Only
> if 
>> the read task does hang for a long time, the watchdog itself triggers a
> reset 
>> (as SBD seems dead). So the delay is actually made from the sum of "write 
>> delay", "read delay", "command excution".

I think you're right when sbd uses shared-storage, but there is a
watchdog-only configuration that I believe digimer was referring to.

With watchdog-only, the cluster will wait for the value of the
stonith-watchdog-timeout property before considering the fencing successful.
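
For example (the value is only illustrative and must be larger than the
watchdog timeout sbd itself is configured with):

  pcs property set stonith-watchdog-timeout=10s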

>> The manual page (SLES 11 SP4) states: "Set watchdog timeout to N seconds. 
>> This depends mostly on your storage latency; the majority of devices must be
> 
>> successfully read within this time, or else the node will self-fence." and 
>> "If a watchdog is used together with the "sbd" as is strongly recommended, 
>> the watchdog is activated at initial start of the sbd daemon. The watchdog
> is 
>> refreshed every time the majority of SBD devices has been successfully read.
> 
>> Using a watchdog provides additional protection against "sbd" crashing."
>>
>> Final remark: I think the developers of sbd were under drugs (or never saw a
> 
>> UNIX program before) when designing the options. For example: "-W  Enable or
> 
>> disable use of the system watchdog to protect against the sbd processes 
>> failing and the node being left in an undefined state. Specify this once to
> 
>> enable, twice to disable." (MHO)
>>
>> Regards,
>> Ulrich
>>
>>>
>>> IPMI and PDUs can confirm fence the peer if ~5 seconds (plus fence
> delays).
>>>
>>> -- 
>>> Digimer
>>> Papers and 

Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-20 Thread Ken Gaillot
On 04/20/2017 10:52 AM, Jan Wrona wrote:
> Hello,
> 
> my problem is closely related to the thread [1], but I didn't find a
> solution there. I have a resource that is set up as a clone C restricted
> to two copies (using the clone-max=2 meta attribute||), because the
> resource takes long time to get ready (it starts immediately though),

A resource agent must not return from "start" until a "monitor"
operation would return success.

Beyond that, the cluster doesn't care what "ready" means, so it's OK if
it's not fully operational by some measure. However, that raises the
question of what you're accomplishing with your monitor.

> and by having it ready as a clone, I can failover in the time it takes
> to move an IP resource. I have a colocation constraint "resource IP with
> clone C", which will make sure IP runs with a working instance of C:
> 
> Configuration:
>  Clone: dummy-clone
>   Meta Attrs: clone-max=2 interleave=true
>   Resource: dummy (class=ocf provider=heartbeat type=Dummy)
>Operations: start interval=0s timeout=20 (dummy-start-interval-0s)
>stop interval=0s timeout=20 (dummy-stop-interval-0s)
>monitor interval=10 timeout=20 (dummy-monitor-interval-10)
>  Resource: ip (class=ocf provider=heartbeat type=Dummy)
>   Operations: start interval=0s timeout=20 (ip-start-interval-0s)
>   stop interval=0s timeout=20 (ip-stop-interval-0s)
>   monitor interval=10 timeout=20 (ip-monitor-interval-10)
> 
> Colocation Constraints:
>   ip with dummy-clone (score:INFINITY)
> 
> State:
>  Clone Set: dummy-clone [dummy]
>  Started: [ sub1.example.org sub3.example.org ]
>  ip (ocf::heartbeat:Dummy): Started sub1.example.org
> 
> 
> This is fine until the the active node (sub1.example.org) fails. Instead
> of moving the IP to the passive node (sub3.example.org) with ready clone
> instance, Pacemaker will move it to the node where it just started a
> fresh instance of the clone (sub2.example.org in my case):
> 
> New state:
>  Clone Set: dummy-clone [dummy]
>  Started: [ sub2.example.org sub3.example.org ]
>  ip (ocf::heartbeat:Dummy): Started sub2.example.org
> 
> 
> Documentation states that the cluster will choose a copy based on where
> the clone is running and the resource's own location preferences, so I
> don't understand why this is happening. Is there a way to tell Pacemaker
> to move the IP to the node where the resource is already running?
> 
> Thanks!
> Jan Wrona
> 
> [1] http://lists.clusterlabs.org/pipermail/users/2016-November/004540.html

The cluster places ip based on where the clone will be running at that
point in the recovery, rather than where it was running before recovery.

Unfortunately I can't think of a way to do exactly what you want,
hopefully someone else has an idea.

One possibility would be to use on-fail=standby on the clone monitor.
That way, instead of recovering the clone when it fails, all resources
on the node would move elsewhere. You'd then have to manually take the
node out of standby for it to be usable again.
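
With pcs that would be roughly (reusing the monitor settings from your example):

  pcs resource update dummy op monitor interval=10 timeout=20 on-fail=standby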

It might be possible to do something more if you convert the clone to a
master/slave resource, and colocate ip with the master role. For
example, you could set the master score based on how long the service
has been running, so the longest-running instance is always master.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Ken Gaillot
On 04/18/2017 02:47 AM, Ulrich Windl wrote:
 Digimer  schrieb am 16.04.2017 um 20:17 in Nachricht
> <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>:
>> On 16/04/17 01:53 PM, Eric Robinson wrote:
>>> I was reading in "Clusters from Scratch" where Beekhof states, "Some would
> 
>> argue that two-node clusters are always pointless, but that is an argument 
>> for another time." Is there a page or thread where this argument has been 
>> fleshed out? Most of my dozen clusters are 2 nodes. I hate to think they're
> 
>> pointless.  
>>>
>>> --
>>> Eric Robinson
>>
>> There is a belief that you can't build a reliable cluster without
>> quorum. I am of the mind that you *can* build a very reliable 2-node
>> cluster. In fact, every cluster our company has deployed, going back
>> over five years, has been 2-node and has had exceptional uptimes.
>>
>> The confusion comes from the belief that quorum is required and stonith
>> is option. The reality is the opposite. I'll come back to this in a minute.
>>
>> In a two-node cluster, you have two concerns;
>>
>> 1. If communication between the nodes fail, but both nodes are alive,
>> how do you avoid a split brain?
> 
> By killing one of the two parties.
> 
>>
>> 2. If you have a two node cluster and enable cluster startup on boot,
>> how do you avoid a fence loop?
> 
> I think the problem in the question is using "you" instead of "it" ;-)
> Pacemaker assumes all problems that cause STONITH will be solved by STONITH.
> That's not always true (e.g. configuration errors). Maybe a node's failcount
> should not be reset if the node was fenced.
> So you'll avoid a fencing loop, but might end in a state where no resources
> are running. IMHO I'd prefer that over a fencing loop.
> 
>>
>> Many answer #1 by saying "you need a quorum node to break the tie". In
>> some cases, this works, but only when all nodes are behaving in a
>> predictable manner.
> 
> All software relies on the fact that it behaves in a predictable manner, BTW.
> The problem is not "the predictable manner for all nodes", but the predictable
> manner for the cluster.
> 
>>
>> Many answer #2 by saying "well, with three nodes, if a node boots and
>> can't talk to either other node, it is inquorate and won't do anything".
> 
> "won't do anything" is also wrong: it must go offline without killing others,
> preferably.
> 
>> This is a valid mechanism, but it is not the only one.
>>
>> So let me answer these from a 2-node perspective;
>>
>> 1. You use stonith and the faster node lives, the slower node dies. From
> 
> Isn't there a possibility that both nodes shoot each other? Is there a
> guarantee that there will always be one faster node?
> 
>> the moment of comms failure, the cluster blocks (needed with quorum,
>> too) and doesn't restore operation until the (slower) peer is in a known
>> state; Off. You can bias this by setting a fence delay against your
>> preferred node. So say node 1 is the node that normally hosts your
>> services, then you add 'delay="15"' to node 1's fence method. This tells
>> node 2 to wait 15 seconds before fencing node 1. If both nodes are
>> alive, node 2 will be fenced before the timer expires.
> 
> Can only the DC issue fencing?

No, any cluster node can initiate fencing. Fencing can also be requested
from a remote node (e.g. via stonith_admin), but the remote node will
ask a cluster node to initiate the fencing.

Also, fence device resources do not need to be "running" in order to be
used. If they are intentionally disabled (target-role=Stopped), they
will not be used, but if they are simply not running, the cluster will
still use the device when needed. "Running" is used solely to determine
whether recurring monitor actions are done.

This design ensures that fencing requires a bare minimum to be
functional (stonith daemon running, and fence devices configured), so it
can be used even at startup before resources are running, and even if
the DC is the node that needs to be fenced or a DC has not yet been elected.
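
For instance, a fence request can be injected manually from any node, along
the lines of (the node name is an example):

  stonith_admin --fence node2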

>> 2. In Corosync v2+, there is a 'wait_for_all' option that tells a node
>> to not do anything until it is able to talk to the peer node. So in the
>> case of a fence after a comms break, the node that reboots will come up,
>> fail to reach the survivor node and do nothing more. Perfect.
> 
> Does "do nothing more" mean continuously polling for other nodes?
> 
>>
>> Now let me come back to quorum vs. stonith;
>>
>> Said simply; Quorum is a tool for when everything is working. Fencing is
>> a tool for when things go wrong.
> 
> I'd say: Quorum is the tool to decide who'll be alive and who's going to die,
> and STONITH is the tool to make nodes die. If everything is working you need
> neither quorum nor STONITH.
> 
>>
>> Let's assume that your cluster is working fine, then for whatever reason,
>> node 1 hangs hard. At the time of the freeze, it was hosting a virtual
>> IP and an NFS service. Node 2 declares node 1 lost after a period of
>> time and 

Re: [ClusterLabs] Antw: Re: Never join a list without a problem...

2017-03-08 Thread Ken Gaillot
Sent: Friday, March 03, 2017 4:51 AM
> To: users@clusterlabs.org
> Subject: Users Digest, Vol 26, Issue 9
> 
> Send Users mailing list submissions to
> users@clusterlabs.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@clusterlabs.org
> 
> You can reach the person managing the list at
> users-ow...@clusterlabs.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
> 
> 
> Today's Topics:
> 
>1. Re: Never join a list without a problem... (Jeffrey Westgate)
>2. Re: PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to
>   PCMK_OCF_UNKNOWN_ERROR (Ken Gaillot)
>3. Re: Cannot clone clvmd resource (Eric Ren)
>4. Re: Cannot clone clvmd resource (Eric Ren)
>5. Antw: Re:  Never join a list without a problem... (Ulrich Windl)
>6. Antw: Re:  Cannot clone clvmd resource (Ulrich Windl)
>7. Re: Insert delay between the statup of VirtualDomain
>   (Dejan Muhamedagic)
> 
> 
> --
> 
> Message: 1
> Date: Thu, 2 Mar 2017 16:32:02 +
> From: Jeffrey Westgate <jeffrey.westg...@arkansas.gov>
> To: Adam Spiers <aspi...@suse.com>, "Cluster Labs - All topics related
> to  open-source clustering welcomed" <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Never join a list without a problem...
> Message-ID:
> 
> <a36b14fa9aa67f4e836c0ee59dea89c4015b212...@cm-sas-mbx-07.sas.arkgov.net>
> 
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Since we have both pieces of the load-balanced cluster doing the same thing - 
> for still-as-yet unidentified reasons - we've put atop on one and sysdig on 
> the other.  Running atop at 10 second slices, hoping it will catch something. 
>  While configuring it yesterday, that server went into its 'episode', but 
> there was nothing in the atop log to show anything.  Nothing else changed 
> except the cpu load average.  No increase in any other parameter.
> 
> frustrating.
> 
> 
> 
> From: Adam Spiers [aspi...@suse.com]
> Sent: Wednesday, March 01, 2017 5:33 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Cc: Jeffrey Westgate
> Subject: Re: [ClusterLabs] Never join a list without a problem...
> 
> Ferenc Wágner <wf...@niif.hu> wrote:
>> Jeffrey Westgate <jeffrey.westg...@arkansas.gov> writes:
>>
>>> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
>>> longer, and we cannot set a clock by it - while the machine is 95%
>>> idle (or more according to 'top'), the host load shoots up to 50 or
>>> 60%.  It takes about 20 minutes to peak, and another 30 to 45 minutes
>>> to come back down to baseline, which is mostly 0.00.  (attached
>>> hostload.pdf) This happens to both machines, randomly, and is
>>> concerning, as we'd like to find what's causing it and resolve it.
>>
>> Try running atop (http://www.atoptool.nl/).  It collects and logs
>> process accounting info, allowing you to step back in time and check
>> resource usage in the past.
> 
> Nice, I didn't know atop could also log the collected data for future
> analysis.
> 
> If you want to capture even more detail, sysdig is superb:
> 
> http://www.sysdig.org/
> 
> 
> 
> --
> 
> Message: 2
> Date: Thu, 2 Mar 2017 17:31:33 -0600
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are
> mapped to PCMK_OCF_UNKNOWN_ERROR
> Message-ID: <8b8dd955-8e35-6824-a80c-2556d833f...@redhat.com>
> Content-Type: text/plain; charset=windows-1252
> 
> On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>> <lars.ellenb...@linbit.com> wrote:
>>> When I recently tried to make use of the DEGRADED monitoring results,
>>> I found out that it does still not work.
>>>
>> Because LRMD chooses to filter them in ocf2uniform_rc(),
>>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>>
>>> See patch suggestion below.
>>>
>>> It also filters away the other "special" rc values.
>>> Do we really not want to see them in crmd/pengine?
>>
>> I would think we do.
>>
>>> Why does LRMD think it needs to

Re: [ClusterLabs] resource was disabled automatically

2017-03-07 Thread Ken Gaillot
On 03/06/2017 08:29 PM, cys wrote:
> At 2017-03-07 05:47:19, "Ken Gaillot" <kgail...@redhat.com> wrote:
>> To figure out why a resource was stopped, you want to check the logs on
>> the DC (which will be the node with the most "pengine:" messages around
>> that time). When the PE decides a resource needs to be stopped, you'll
>> see a message like
>>
>   notice: LogActions:  Stop <resource> (<node>)
>>
>> Often, by looking at the messages before that, you can see what led it
>> to decide that. Shortly after that, you'll see something like
>>
> 
> Thanks Ken. It's really helpful.
> Finally I found the debug log of pengine(in a separate file). It has this 
> message:
> "All nodes for resource p_vs-scheduler are unavailable, unclean or shutting 
> down..."
> So it seems this caused vs-scheduler disabled.
> 
> If all nodes come back to a good state, will pengine start the resource 
> automatically?
> I did it manually yesterday.

Yes, whenever a node changes state (such as becoming available), the
pengine will recheck what can be done.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 10:55 AM, Lars Ellenberg wrote:
> On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote:
>> On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
>>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>>> <lars.ellenb...@linbit.com> wrote:
>>>> When I recently tried to make use of the DEGRADED monitoring results,
>>>> I found out that it does still not work.
>>>>
>>>> Because LRMD choses to filter them in ocf2uniform_rc(),
>>>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>>>
>>>> See patch suggestion below.
>>>>
>>>> It also filters away the other "special" rc values.
>>>> Do we really not want to see them in crmd/pengine?
>>>
>>> I would think we do.
> 
>>>> Note: I did build it, but did not use this yet,
>>>> so I have no idea if the rest of the implementation of the DEGRADED
>>>> stuff works as intended or if there are other things missing as well.
>>>
>>> failcount might be the other place that needs some massaging.
>>> specifically, not incrementing it when a degraded rc comes through
>>
>> I think that's already taken care of.
>>
>>>> Thoughts?
>>>
>>> looks good to me
>>>
>>>>
>>>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>>>> index 724edb7..39a7dd1 100644
>>>> --- a/lrmd/lrmd.c
>>>> +++ b/lrmd/lrmd.c
>>>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
>>>> *stdout_data)
>>>>  static int
>>>>  ocf2uniform_rc(int rc)
>>>>  {
>>>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>>>> -return PCMK_OCF_UNKNOWN_ERROR;
>>
>> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
>> be the high end.
>>
>> Lars, do you want to test that?
> 
> Why would we want to filter at all, then?
> 
> I get it that we may want to map non-ocf agent exit codes
> into the "ocf" range,
> but why mask exit codes from "ocf" agents at all (in lrmd)?
> 
> Lars

It's probably unnecessarily paranoid, but I guess the idea is to check
that the agent at least returns something in the expected range for OCF
(perhaps it's not complying with the spec, or complying with a newer
version of the spec than we can handle).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource was disabled automatically

2017-03-06 Thread Ken Gaillot
On 03/06/2017 03:49 AM, cys wrote:
> Hi,
> 
> Today I found one resource was disabled. I checked that nobody did it.
> The logs showed crmd (or pengine?) stopped it. I don't know why.
> So I want to know: will Pacemaker disable a resource automatically?
> If so, when and why?
> 
> Thanks.


Pacemaker will never set the target-role automatically, so if you mean
that something set target-role=Stopped, that happened outside the cluster.

If you just mean stopping, the cluster can stop a resource in response
to the configuration or conditions.

The pengine decides what needs to be done, the crmd coordinates it, and
the lrmd does it (for actions on resources, anyway). So all are involved
to some extent.

To figure out why a resource was stopped, you want to check the logs on
the DC (which will be the node with the most "pengine:" messages around
that time). When the PE decides a resource needs to be stopped, you'll
see a message like

   notice: LogActions:  Stop()

Often, by looking at the messages before that, you can see what led it
to decide that. Shortly after that, you'll see something like

   Calculated transition , saving inputs in 

That file will contain the state of the cluster at that moment. So you
can grab that for some deep diving. One of the things you can do with
that file is run crm_simulate on it, to get detailed info about why each
action was taken. "crm_simulate -Ssx " will show a somewhat
painful description of everything the cluster would do and the scores
that fed into the decision.
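
For example (the file name here is only a placeholder; use the file named in
the "saving inputs in" message from your own logs):

   # crm_simulate -Ssx /var/lib/pacemaker/pengine/pe-input-123.bz2

-S simulates the transition, -s shows the allocation scores, and -x points
at the saved input file.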

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources

2017-03-01 Thread Ken Gaillot
On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote on 26.02.2017 at 20:04 in message
> <dbf562ff-a830-fc3c-84dc-487b892fc...@redhat.com>:
>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>> Hi all,
>>> i have configured a two node cluster on redhat 7.
>>>
>>> Because I need to manage resources stopping and starting singularly when
>>> they are running I have configured cluster using order set constraints.
>>>
>>> Here the example
>>>
>>> Ordering Constraints:
>>>   Resource Sets:
>>> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
>>> require-all=true setoptions symmetrical=false
>>> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>> sequential=true require-all=true setoptions symmetrical=false kind=Mandatory
>>>
>>> The constraint works as expected on start, but when stopping, the resources
>>> don't respect the order.
>>> Any help is appreciated
>>>
>>> Thank and regards
>>> Ivan
>>
>> symmetrical=false means the order only applies for starting
> 
> From the name (symmetrical) alone it could also mean that it only applies for 
> stopping ;-)
> (Another example where better names would be nice)

Well, more specifically, it only applies to the action specified in the
constraint. I hadn't noticed before that the second constraint here has
action=stop, so yes, that one would only apply for stopping.

In the above example, the two constraints are identical to a single
constraint with symmetrical=true, since the second constraint is just
the reverse of the first.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-02 Thread Ken Gaillot
On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>  wrote:
>> When I recently tried to make use of the DEGRADED monitoring results,
>> I found out that it does still not work.
>>
>> Because LRMD choses to filter them in ocf2uniform_rc(),
>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>
>> See patch suggestion below.
>>
>> It also filters away the other "special" rc values.
>> Do we really not want to see them in crmd/pengine?
> 
> I would think we do.
> 
>> Why does LRMD think it needs to outsmart the pengine?
> 
> Because the person that implemented the feature incorrectly assumed
> the rc would be passed back unmolested.
> 
>>
>> Note: I did build it, but did not use this yet,
>> so I have no idea if the rest of the implementation of the DEGRADED
>> stuff works as intended or if there are other things missing as well.
> 
> failcount might be the other place that needs some massaging.
> specifically, not incrementing it when a degraded rc comes through

I think that's already taken care of.

>> Thoughts?
> 
> looks good to me
> 
>>
>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>> index 724edb7..39a7dd1 100644
>> --- a/lrmd/lrmd.c
>> +++ b/lrmd/lrmd.c
>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
>> *stdout_data)
>>  static int
>>  ocf2uniform_rc(int rc)
>>  {
>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>> -return PCMK_OCF_UNKNOWN_ERROR;

Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
be the high end.

Lars, do you want to test that?

>> +switch (rc) {
>> +default:
>> +   return PCMK_OCF_UNKNOWN_ERROR;
>> +
>> +case PCMK_OCF_OK:
>> +case PCMK_OCF_UNKNOWN_ERROR:
>> +case PCMK_OCF_INVALID_PARAM:
>> +case PCMK_OCF_UNIMPLEMENT_FEATURE:
>> +case PCMK_OCF_INSUFFICIENT_PRIV:
>> +case PCMK_OCF_NOT_INSTALLED:
>> +case PCMK_OCF_NOT_CONFIGURED:
>> +case PCMK_OCF_NOT_RUNNING:
>> +case PCMK_OCF_RUNNING_MASTER:
>> +case PCMK_OCF_FAILED_MASTER:
>> +
>> +case PCMK_OCF_DEGRADED:
>> +case PCMK_OCF_DEGRADED_MASTER:
>> +   return rc;
>> +
>> +#if 0
>> +   /* What about these?? */
> 
> yes, these should get passed back as-is too
> 
>> +/* 150-199 reserved for application use */
>> +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by 
>> disconnection of the LRM API to a local or remote node */
>> +
>> +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */
>> +PCMK_OCF_UNKNOWN   = 193, /* State of the service is unknown - used 
>> for recording in-flight operations */
>> +PCMK_OCF_SIGNAL= 194,
>> +PCMK_OCF_NOT_SUPPORTED = 195,
>> +PCMK_OCF_PENDING   = 196,
>> +PCMK_OCF_CANCELLED = 197,
>> +PCMK_OCF_TIMEOUT   = 198,
>> +PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
>> +#endif
>>  }
>> -
>> -return rc;
>>  }
>>
>>  static int

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cannot clone clvmd resource

2017-03-01 Thread Ken Gaillot
On 03/01/2017 03:49 PM, Anne Nicolas wrote:
> Hi there
> 
> 
> I'm testing quite an easy configuration to work on clvm. I'm just
> getting crazy as it seems clvmd cannot be cloned on other nodes.
> 
> clvmd start well on node1 but fails on both node2 and node3.

Your config looks fine, so I'm going to guess there's some local
difference on the nodes.

> In pacemaker journalctl I get the following message
> Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd:
> No such file or directory
> Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat
> /cmirrord: No such file or directory

I have no idea where the above is coming from. pidofproc is an LSB
function, but (given journalctl) I'm assuming you're using systemd. I
don't think anything in pacemaker or resource-agents uses pidofproc (at
least not currently, not sure about the older version you're using).
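
If it helps to track down the caller, something along these lines (the paths
are only guesses) should show which script on that node references pidofproc:

   # grep -rl pidofproc /etc/init.d /usr/lib/ocf/resource.d 2>/dev/null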

> Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd
> action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms queue-time:0ms
> Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok
> (node=node3, call=233, rc=0, cib-update=541, confirmed=true)
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop
> p-dlm_stop_0 on node3 (local)
> Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm
> action:stop call_id:235
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop
> p-dlm_stop_0 on node2
> 
> Here is my configuration
> 
> node 739312139: node1
> node 739312140: node2
> node 739312141: node3
> primitive admin_addr IPaddr2 \
> params ip=172.17.2.10 \
> op monitor interval=10 timeout=20 \
> meta target-role=Started
> primitive p-clvmd ocf:lvm2:clvmd \
> op start timeout=90 interval=0 \
> op stop timeout=100 interval=0 \
> op monitor interval=30 timeout=90
> primitive p-dlm ocf:pacemaker:controld \
> op start timeout=90 interval=0 \
> op stop timeout=100 interval=0 \
> op monitor interval=60 timeout=90
> primitive stonith-sbd stonith:external/sbd
> group g-clvm p-dlm p-clvmd
> clone c-clvm g-clvm meta interleave=true
> property cib-bootstrap-options: \
> have-watchdog=true \
> dc-version=1.1.13-14.7-6f22ad7 \
> cluster-infrastructure=corosync \
> cluster-name=hacluster \
> stonith-enabled=true \
> placement-strategy=balanced \
> no-quorum-policy=freeze \
> last-lrm-refresh=1488404073
> rsc_defaults rsc-options: \
> resource-stickiness=1 \
> migration-threshold=10
> op_defaults op-options: \
> timeout=600 \
> record-pending=true
> 
> Thanks in advance for your input
> 
> Cheers
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread Ken Gaillot
On 04/07/2017 05:20 PM, neeraj ch wrote:
> I am running it on centos 6.6. I am killing the "pacemakerd" process
> using kill -9.

pacemakerd is a supervisor process that watches the other processes, and
respawns them if they die. It is not really responsible for anything in
the cluster directly. So, killing it does not disrupt the cluster in any
way, it just prevents automatic recovery if one of the other daemons dies.

When systemd is in use, systemd will restart pacemakerd if it dies, but
CentOS 6 does not have systemd (CentOS 7 does).

> hmm, stonith is used for detection as well? I thought it was used to
> disable malfunctioning nodes. 

If you kill pacemakerd, that doesn't cause any harm to the cluster, so
that would not involve stonith.

If you kill crmd or corosync instead, that would cause the node to leave
the cluster -- it would be considered a malfunctioning node. The rest of
the cluster would then use stonith to disable that node, so it could
safely recover its services elsewhere.
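
A rough way to see the difference on a test node (process names as in the
1.1.x series):

   # pkill -9 pacemakerd   # supervisor only; cluster membership is unaffected
   # pkill -9 crmd         # node leaves the cluster and, with stonith
                           # enabled, gets fenced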

> On Fri, Apr 7, 2017 at 7:58 AM, Ken Gaillot <kgail...@redhat.com
> <mailto:kgail...@redhat.com>> wrote:
> 
> On 04/05/2017 05:16 PM, neeraj ch wrote:
> > Hello All,
> >
> > I noticed something on our pacemaker test cluster. The cluster is
> > configured to manage an underlying database using master slave
> primitive.
> >
> > I ran a kill on the pacemaker process, all the other nodes kept
> showing
> > the node online. I went on to kill the underlying database on the same
> > node which would have been detected had the pacemaker on the node been
> > online. The cluster did not detect that the database on the node has
> > failed, the failover never occurred.
> >
> > I went on to kill corosync on the same node and the cluster now marked
> > the node as stopped and proceeded to elect a new master.
> >
> >
> > In a separate test. I killed the pacemaker process on the cluster DC,
> > the cluster showed no change. I went on to change CIB on a different
> > node. The CIB modify command timed out. Once that occurred, the node
> > didn't failover even when I turned off corosync on cluster DC. The
> > cluster didn't recover after this mishap.
> >
> > Is this expected behavior? Is there a solution for when OOM decides to
> > kill the pacemaker process?
> >
> > I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
> > quorum enabled.
> >
> > Thank you,
> >
> > nwarriorch
> 
> What exactly are you doing to kill pacemaker? There are multiple
> pacemaker processes, and they have different recovery methods.
> 
> Also, what OS/version are you running? If it has systemd, that can play
> a role in recovery as well.
> 
> Having stonith disabled is a big part of what you're seeing. When a node
> fails, stonith is the only way the rest of the cluster can be sure the
> node is unable to cause trouble, so it can recover services elsewhere.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] cloned resources ordering and remote nodes problem

2017-04-06 Thread Ken Gaillot
On 04/06/2017 09:32 AM, Radoslaw Garbacz wrote:
> Hi,
> 
> 
> I have a question regarding resources order settings.
> 
> Having cloned resources: "res_1-clone", "res_2-clone",
>  and defined order:  first "res_1-clone" then "res_2-clone"
> 
> When I have a monitoring failure on a remote node with "res_1" (an
> instance of "res_1-clone") which causes all dependent resources to be
> restarted, only instances on this remote node are being restarted, not
> the ones on other nodes.
> 
> Is it an intentional behavior and if so, is there a way to make all
> instances of the cloned resource to be restarted in such a case?

That is controlled by a clone's "interleave" meta-attribute. The default
(false) actually gives your desired behavior. I'm guessing you have
interleave=true configured.
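
If so, something like this (pcs syntax, using the clone name from your
description) would put it back to the default:

   # pcs resource meta res_1-clone interleave=false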

> I can provide more details regarding the CIB configuration when needed.
> 
> Pacemaker 1.1.16-1.el6
> OS: Linux CentOS 6
> 
> 
> Thanks in advance,
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Ken Gaillot
On 04/07/2017 12:58 PM, Eric Robinson wrote:
> Somebody want to look at this log and tell me why the cluster failed over? 
> All we did was add a new resource. We've done it many times before without 
> any problems.
> 
> --
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Forwarding cib_apply_diff operation for section 'all' to master 
> (origin=local/cibadmin/2)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.605.2 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.0 65654c97e62cd549f22f777a5290fe3a
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @epoch=607, @num_updates=0
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/resources:   type="mysql_745"/>
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/resources:   type="mysql_746"/>
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Completed cib_apply_diff operation for section 'all': OK (rc=0, 
> origin=ha14a/cibadmin/2, version=0.607.0)
> Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents:   
>   Archived previous version as /var/lib/pacemaker/cib/cib-36.raw
> Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents:   
>   Wrote version 0.607.0 of the CIB to disk (digest: 
> 1afdb9e480f870a095aa9e39719d29c4)
> Apr 03 08:50:30 [22762] ha14acib: info: retrieveCib:
> Reading cluster configuration from: /var/lib/pacemaker/cib/cib.DkIgSs 
> (digest: /var/lib/pacemaker/cib/cib.hPwa66)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_get_rsc_info:  Resource 'p_mysql_745' not found (17 active 
> resources)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_rsc_register:  Added 'p_mysql_745' to the rsc list (18 active 
> resources)
> Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
> Performing key=10:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
> op=p_mysql_745_monitor_0
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_get_rsc_info:  Resource 'p_mysql_746' not found (18 active 
> resources)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_rsc_register:  Added 'p_mysql_746' to the rsc list (19 active 
> resources)
> Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
> Performing key=11:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
> op=p_mysql_746_monitor_0
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.607.0 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.1 (null)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @num_updates=1
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/status/node_state[@id='ha14b']/lrm[@id='ha14b']/lrm_resources:  
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++
> 
>  operation="monitor" crm-debug-origin="do_update_resource" 
> crm_feature_set="3.0.9" 
> transition-key="13:7484:7:91ef4b03-8769-47a1-a364-060569c46e52" 
> transition-magic="0:7;13:7484:7:91ef4b03-8769-47a1-a364-060569c46e52" 
> call-id="142" rc-code="7" op-status="0" interval="0" last-run="1491234630" las
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++
>   
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=ha14b/crmd/7665, version=0.607.1)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.607.1 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.2 (null)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @num_updates=2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> 

Re: [ClusterLabs] [Problem] The crmd causes an error of xml.

2017-04-07 Thread Ken Gaillot
On 04/06/2017 08:49 AM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> I confirmed a development edition of Pacemaker.
>  - 
> https://github.com/ClusterLabs/pacemaker/tree/71dbd128c7b0a923c472c8e564d33a0ba1816cb5
> 
> 
> property no-quorum-policy="ignore" \
> stonith-enabled="true" \
> startup-fencing="false"
> 
> rsc_defaults resource-stickiness="INFINITY" \
> migration-threshold="INFINITY"
> 
> fencing_topology \
> rh73-01-snmp: prmStonith1-1 \
> rh73-02-snmp: prmStonith2-1
> 
> primitive prmDummy ocf:pacemaker:Dummy \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op monitor interval="10s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="fence"
> 
> primitive prmStonith1-1 stonith:external/ssh \
> params \
> pcmk_reboot_retries="1" \
> pcmk_reboot_timeout="40s" \
> hostlist="rh73-01-snmp" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> 
> primitive prmStonith2-1 stonith:external/ssh \
> params \
> pcmk_reboot_retries="1" \
> pcmk_reboot_timeout="40s" \
> hostlist="rh73-02-snmp" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> 
> ### Resource Location ###
> location rsc_location-1 prmDummy \
> rule  300: #uname eq rh73-01-snmp \
> rule  200: #uname eq rh73-02-snmp
> 
> 
> 
> I pour the following brief crm files.
> I produce the trouble of the resource in a cluster.
> Then crmd causes an error.
> 
> 
> (snip)
> Apr  6 18:04:22 rh73-01-snmp pengine[5214]: warning: Calculated transition 4 
> (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: 
> parser error : Specification mandate value for attribute 
> CRM_meta_fail_count_prmDummy
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" 
> on_node_uuid="3232238265"> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:  
>   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: 
> parser error : attributes construct error
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" 
> on_node_uuid="3232238265"> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:  
>   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: 
> parser error : Couldn't find end of Start Tag attributes line 1
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" 
> on_node_uuid="3232238265"> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:  
>   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]: warning: Parsing failed (domain=1, 
> level=3, code=73): Couldn't find end of Start Tag attributes line 1
> (snip)
> 
> 
> The XML related to the new-style fail-count attribute somehow seems to 
> have a problem.
> 
> I attach pe-warn-0.bz2.
> 
> Best Regards,
> Hideo Yamauchi.

Hi Hideo,

Thanks for the report!

This appears to be a PE bug when fencing is needed due to stop failure.
It wasn't caught in regression testing because the PE will continue to
use the old-style fail-count attribute if the DC does not support the
new style, and existing tests obviously have older DCs. I definitely
need to add some new tests.

I'm not sure why fail-count and last-failure are being added as
meta-attributes in this case, or why incorrect XML syntax is being
generated, but I'll investigate.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?

2017-04-24 Thread Ken Gaillot
On 04/24/2017 10:32 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 24 Apr 2017 17:08:15 +0200
> Lars Ellenberg  wrote:
> 
>> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote:
>>> Hi all,
>>>
>>> In the PostgreSQL Automatic Failover (PAF) project, one of most frequent
>>> negative feedback we got is how difficult it is to experience with it
>>> because of fencing occurring way too frequently. I am currently hunting
>>> this kind of useless fencing to make life easier.
>>>
>>> It occurs to me, a frequent reason of fencing is because during the stop
>>> action, we check the status of the PostgreSQL instance using our monitor
>>> function before trying to stop the resource. If the function does not return
>>> OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error,
>>> leading to a fencing. See:
>>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>>>
>>> I am considering adding a check to define if the instance is stopped even
>>> if the monitor action returns an error. The idea would be to parse **all**
>>> the local processes looking for at least one pair of
>>> "/proc//{comm,cwd}" related to the PostgreSQL instance we want to
>>> stop. If none are found, we consider the instance is not running.
>>> Gracefully or not, we just know it is down and we can return OCF_SUCCESS.
>>>
>>> Just for completeness, the piece of code would be:
>>>
>>>my @pids;
>>>foreach my $f (glob "/proc/[0-9]*") {
>>>push @pids => basename($f)
>>>if -r $f
>>>and basename( readlink( "$f/exe" ) ) eq "postgres"
>>>and readlink( "$f/cwd" ) eq $pgdata;
>>>}
>>>
>>> It feels safe enough to me. The only risk I could think of is in a shared
>>> disk cluster with multiple nodes accessing the same data in RW (such setup
>>> can fail in so many ways :)). However, PAF is not supposed to work in such
>>> context, so I can live with this.
>>>
>>> Do you guys have some advices? Do you see some drawbacks? Hazards?  
>>
>> Isn't that the wrong place to "fix" it?
>> Why did your _monitor  return something "weird"?
> 
> Because this _monitor is the one called by the monitor action. It is able to
> define if an instance is running and if it feels good.
> 
> Take the scenario where the slave instance is crashed:
>   1/ the monitor action raise an OCF_ERR_GENERIC
>   2/ Pacemaker tries a recover of the resource (stop->start)
>   3/ the stop action fails because _monitor says the resource is crashed
>   4/ Pacemaker fence the node.
> 
>> What did it return?
> 
> Either OCF_ERR_GENERIC or OCF_FAILED_MASTER, for instance.
> 
>> Should you not fix it there?
> 
> fixing this in the monitor action? This would bloat the code of this function.
> We would have to add a special code path in there to define if it is called
> as a real monitor action or just as a status one for other actions.
> 
> But anyway, here or there, I would have to add this piece of code looking at
> each processes. According to you, is it safe enough? Do you see some hazard
> with it?
> 
>> Just thinking out loud.
> 
> Thank you, it helps :)

It feels odd that there is a situation where monitor should return an
error (instead of "not running"), but stop should return OK.

I think the question is whether the service can be considered cleanly
stopped at that point -- i.e. whether it's safe for another node to
become master, and safe to try starting the crashed service again on the
same node.

If it's cleanly stopped, the monitor should probably return "not
running". Pacemaker will already compare that result against the
expected state, and recover appropriately if needed.

The PID check assumes there can only be one instance of postgresql on
the machine. If there are instances bound to different IPs, or some user
starts a private instance, it could be inaccurate. But that would err on
the side of fencing, so it might still be useful, if you don't have a
way of more narrowly identifying the expected instance.
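
For what it's worth, a rough shell equivalent of that check, scoped to a
single data directory (the PGDATA path below is only an example):

   PGDATA=/var/lib/pgsql/data
   for pid in $(pgrep -x postgres); do
       [ "$(readlink "/proc/$pid/cwd")" = "$PGDATA" ] && echo "still running: $pid"
   done

Keying on the process's cwd like this is what keeps the check from tripping
over unrelated postgres processes on the same machine.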

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-31 Thread Ken Gaillot
On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> Hello,
> 
>  
> 
> We built a pacemaker cluster with 2 physical servers.
> 
> We configured DRBD in Master\Slave setup, a floating IP and file
> system mount in Active\Passive mode.
> 
> We configured two STONITH devices (fence_ipmilan), one for each
> server.
> 
>  
> 
> We are trying to simulate a situation when the Master server crashes
> with no power. 
> 
> We pulled both of the PSU cables and the server becomes offline
> (UNCLEAN).
> 
> The resources that the Master used to hold are now in Started (UNCLEAN)
> state.
> 
> The state is unclean since the STONITH failed (the STONITH device is
> located on the server (Intel RMM4 - IPMI) – which uses the same power
> supply). 
> 
>  
> 
> The problem is that now, the cluster does not release the resources
> that the Master holds, and the service goes down.
> 
>  
> 
> Is there any way to overcome this situation? 
> 
> We tried to add a qdevice but got the same results.
> 
>  
> 
> We are using pacemaker 1.1.15 on CentOS 7.3
> 
>  
> 
> Thanks,
> 
> Tomer.

This is a limitation of using IPMI as the only fence device, when the
IPMI shares power with the main system. The way around it is to use a
fallback fence device, for example a switched power unit or sbd
(watchdog). Pacemaker lets you specify a fencing "topology" with
multiple devices -- level 1 would be the IPMI, and level 2 would be the
fallback device.
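
As a sketch with pcs (the node and device names are placeholders for your
IPMI and fallback devices):

   # pcs stonith level add 1 node1 fence_ipmi_node1
   # pcs stonith level add 2 node1 fence_pdu_node1

Level 1 is tried first; level 2 is only used if every device in level 1 fails.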

qdevice helps with quorum, which would let one side attempt to fence the
other, but it doesn't affect whether the fencing succeeds. With a
two-node cluster, you can use qdevice to get quorum, or you can use
corosync's two_node option.
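
For the two_node route, the relevant corosync.conf fragment looks roughly
like this (merge it into your existing quorum section):

   quorum {
       provider: corosync_votequorum
       two_node: 1
   }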

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-31 Thread Ken Gaillot
Please ignore my re-reply to the original message, I'm in the middle of
a move and am getting by on little sleep at the moment :-)

On Mon, 2017-07-31 at 09:26 -0500, Ken Gaillot wrote:
> On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> > Hello,
> > 
> >  
> > 
> > We built a pacemaker cluster with 2 physical servers.
> > 
> > We configured DRBD in Master\Slave setup, a floating IP and file
> > system mount in Active\Passive mode.
> > 
> > We configured two STONITH devices (fence_ipmilan), one for each
> > server.
> > 
> >  
> > 
> > We are trying to simulate a situation when the Master server crashes
> > with no power. 
> > 
> > We pulled both of the PSU cables and the server becomes offline
> > (UNCLEAN).
> > 
> > The resources that the Master used to hold are now in Started (UNCLEAN)
> > state.
> > 
> > The state is unclean since the STONITH failed (the STONITH device is
> > located on the server (Intel RMM4 - IPMI) – which uses the same power
> > supply). 
> > 
> >  
> > 
> > The problem is that now, the cluster does not release the resources
> > that the Master holds, and the service goes down.
> > 
> >  
> > 
> > Is there any way to overcome this situation? 
> > 
> > We tried to add a qdevice but got the same results.
> > 
> >  
> > 
> > We are using pacemaker 1.1.15 on CentOS 7.3
> > 
> >  
> > 
> > Thanks,
> > 
> > Tomer.
> 
> This is a limitation of using IPMI as the only fence device, when the
> IPMI shares power with the main system. The way around it is to use a
> fallback fence device, for example a switched power unit or sbd
> (watchdog). Pacemaker lets you specify a fencing "topology" with
> multiple devices -- level 1 would be the IPMI, and level 2 would be the
> fallback device.
> 
> qdevice helps with quorum, which would let one side attempt to fence the
> other, but it doesn't affect whether the fencing succeeds. With a
> two-node cluster, you can use qdevice to get quorum, or you can use
> corosync's two_node option.
> 

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Ken Gaillot
On Wed, 2017-08-02 at 18:32 +0200, Lentes, Bernd wrote:
> 
> - On Aug 2, 2017, at 10:42 AM, Ulrich Windl 
> ulrich.wi...@rz.uni-regensburg.de wrote:
> 
> 
> > 
> > I thought the cluster does not perform actions that are not defined in the
> > configuration (e.g. "monitor"). 
> 
> I think the cluster automatically performs and configures start/stop 
> operations if not defined in the resource, 
> but a monitor op has to be configured explicitly, to my knowledge.

Correct. We've considered adding an implicit monitor if none is
explicitly specified, as well as adding implicit master and slave role
monitors if only one monitor is specified for a master/slave resource.
That might happen in a future version.
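
Until then, a monitor has to be declared explicitly, for example (pcs syntax,
with a made-up resource name):

   # pcs resource op add my_resource monitor interval=30s timeout=60s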

> >> 2. Set timeouts for any operations that have defaults in the RA
> >>meta-data?
> >> 
> >> What most people seem to expect is 2), but it sounds like what you are
> >> expecting is 1). Crmsh can't read minds, so it would have to pick one or
> >> the other.
> 
> Yes, I expected the cluster to choose the "defaults" from the meta-data of the 
> RA.

It is confusing. Pacemaker doesn't use much from the resource agent
meta-data currently. I could see an argument for using the RA defaults,
though it could still be confusing since there are multiple possible
interpretations.

The implementation would be complicated, though. Currently, only the
crmd has the meta-data information; it's not in the CIB, so the policy
engine (which sets the timeouts) doesn't have it. Also, we can schedule
probe, start, and monitor operations in a single transition, before
we've gotten the RA meta-data, so the timeouts couldn't be known when
the actions are scheduled. There are potential ways around that, but it
would be a significant project.

> >> Another thing to consider is that if RA meta-data is preferred over the
> >> global default timeout, then the global default timeout wouldn't be used
> >> at all for operations that happen to have default timeouts in the
> >> meta-data. That seems surprising as well to me.
> 
> Yes. You configure global defaults, but they are not used. Confusing.
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671






___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Updated attribute is not displayed in crm_mon

2017-08-15 Thread Ken Gaillot
On Tue, 2017-08-15 at 08:42 +0200, Jan Friesse wrote:
> Ken Gaillot napsal(a):
> > On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> >> On Wed, 2017-08-02 at 09:59 +, 井上 和徳 wrote:
> >>> Hi,
> >>>
> >>> In Pacemaker-1.1.17, the attribute updated while starting pacemaker is 
> >>> not displayed in crm_mon.
> >>> In Pacemaker-1.1.16, it is displayed and results are different.
> >>>
> >>> https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> >>> This commit is the cause, but the following result (3.) is expected 
> >>> behavior?
> >>
> >> This turned out to be an odd one. The sequence of events is:
> >>
> >> 1. When the node leaves the cluster, the DC (correctly) wipes all its
> >> transient attributes from attrd and the CIB.
> >>
> >> 2. Pacemaker is newly started on the node, and a transient attribute is
> >> set before the node joins the cluster.
> >>
> >> 3. The node joins the cluster, and its transient attributes (including
> >> the new value) are sync'ed with the rest of the cluster, in both attrd
> >> and the CIB. So far, so good.
> >>
> >> 4. Because this is the node's first join since its crmd started, its
> >> crmd wipes all of its transient attributes again. The idea is that the
> >> node may have restarted so quickly that the DC hasn't yet done it (step
> >> 1 here), so clear them now to avoid any problems with old values.
> >> However, the crmd wipes only the CIB -- not attrd (arguably a bug).
> >
> > Whoops, clarification: the node may have restarted so quickly that
> > corosync didn't notice it left, so the DC would never have gotten the
> 
> Corosync always notices when a node leaves, no matter whether the node is 
> gone for longer than the token timeout or within it.

Looking back at the original commit, it has a comment "OpenAIS has a
nasty habit of not being able to tell if a node is returning or didn't
leave in the first place", so it looks like it's only relevant on legacy
stacks.

> 
> > "peer lost" message that triggers wiping its transient attributes.
> >
> > I suspect the crmd wipes only the CIB in this case because we assumed
> > attrd would be empty at this point -- missing exactly this case where a
> > value was set between start-up and first join.
> >
> >> 5. With the older pacemaker version, both the joining node and the DC
> >> would request a full write-out of all values from attrd. Because step 4
> >> only wiped the CIB, this ends up restoring the new value. With the newer
> >> pacemaker version, this step is no longer done, so the value winds up
> >> staying in attrd but not in CIB (until the next write-out naturally
> >> occurs).
> >>
> >> I don't have a solution yet, but step 4 is clearly the problem (rather
> >> than the new code that skips step 5, which is still a good idea
> >> performance-wise). I'll keep working on it.
> >>
> >>> [test case]
> >>> 1. Start pacemaker on two nodes at the same time and update the attribute 
> >>> during startup.
> >>> In this case, the attribute is displayed in crm_mon.
> >>>
> >>> [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; 
> >>> attrd_updater -n KEY -U V-1' ; \
> >>> ssh -f node3 'systemctl start pacemaker ; 
> >>> attrd_updater -n KEY -U V-3'
> >>> [root@node1 ~]# crm_mon -QA1
> >>> Stack: corosync
> >>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with 
> >>> quorum
> >>>
> >>> 2 nodes configured
> >>> 0 resources configured
> >>>
> >>> Online: [ node1 node3 ]
> >>>
> >>> No active resources
> >>>
> >>>
> >>> Node Attributes:
> >>> * Node node1:
> >>> + KEY   : V-1
> >>> * Node node3:
> >>> + KEY   : V-3
> >>>
> >>>
> >>> 2. Restart pacemaker on node1, and update the attribute during startup.
> >>>
> >>> [root@node1 ~]# systemctl stop pacemaker
> >>> [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U 
> >>> V-10
> >>>
> >>>
> >>> 3. The attribute is registered in attrd but it is not registered in CIB,
> >>> so

Re: [ClusterLabs] Updated attribute is not displayed in crm_mon

2017-08-16 Thread Ken Gaillot
I have a fix for this issue ready. I am running some tests on it, then
will merge it in the upstream master branch, to become part of the next
release.

The fix is to clear the transient attributes from the CIB when attrd
starts, rather than when the crmd completes its first join. This
eliminates the window where attributes can be set before the CIB is
cleared.

On Tue, 2017-08-15 at 08:42 +, 井上 和徳 wrote:
> Hi Ken,
> 
> Thanks for the explanation.
> 
> As additional information, we are using a daemon (*1) that registers
> Corosync's ring status as attributes, so I want to avoid events where
> attributes are not displayed.
> 
> *1 It's ifcheckd, which is always running (it is not a resource) and registers
>    attributes when Pacemaker is running.
>( https://github.com/linux-ha-japan/pm_extras/tree/master/tools )
>Attribute example :
> 
>Node Attributes:
>* Node rhel73-1:
>+ ringnumber_0  : 192.168.101.131 is UP
>+ ringnumber_1  : 192.168.102.131 is UP
>* Node rhel73-2:
>+ ringnumber_0  : 192.168.101.132 is UP
>+ ringnumber_1  : 192.168.102.132 is UP
> 
> Regards,
> Kazunori INOUE
> 
> > -Original Message-
> > From: Ken Gaillot [mailto:kgail...@redhat.com]
> > Sent: Tuesday, August 15, 2017 2:42 AM
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > Subject: Re: [ClusterLabs] Updated attribute is not displayed in crm_mon
> > 
> > On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> > > On Wed, 2017-08-02 at 09:59 +, 井上 和徳 wrote:
> > > > Hi,
> > > >
> > > > In Pacemaker-1.1.17, the attribute updated while starting pacemaker is 
> > > > not displayed in crm_mon.
> > > > In Pacemaker-1.1.16, it is displayed and results are different.
> > > >
> > > > https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> > > > This commit is the cause, but the following result (3.) is expected 
> > > > behavior?
> > >
> > > This turned out to be an odd one. The sequence of events is:
> > >
> > > 1. When the node leaves the cluster, the DC (correctly) wipes all its
> > > transient attributes from attrd and the CIB.
> > >
> > > 2. Pacemaker is newly started on the node, and a transient attribute is
> > > set before the node joins the cluster.
> > >
> > > 3. The node joins the cluster, and its transient attributes (including
> > > the new value) are sync'ed with the rest of the cluster, in both attrd
> > > and the CIB. So far, so good.
> > >
> > > 4. Because this is the node's first join since its crmd started, its
> > > crmd wipes all of its transient attributes again. The idea is that the
> > > node may have restarted so quickly that the DC hasn't yet done it (step
> > > 1 here), so clear them now to avoid any problems with old values.
> > > However, the crmd wipes only the CIB -- not attrd (arguably a bug).
> > 
> > Whoops, clarification: the node may have restarted so quickly that
> > corosync didn't notice it left, so the DC would never have gotten the
> > "peer lost" message that triggers wiping its transient attributes.
> > 
> > I suspect the crmd wipes only the CIB in this case because we assumed
> > attrd would be empty at this point -- missing exactly this case where a
> > value was set between start-up and first join.
> > 
> > > 5. With the older pacemaker version, both the joining node and the DC
> > > would request a full write-out of all values from attrd. Because step 4
> > > only wiped the CIB, this ends up restoring the new value. With the newer
> > > pacemaker version, this step is no longer done, so the value winds up
> > > staying in attrd but not in CIB (until the next write-out naturally
> > > occurs).
> > >
> > > I don't have a solution yet, but step 4 is clearly the problem (rather
> > > than the new code that skips step 5, which is still a good idea
> > > performance-wise). I'll keep working on it.
> > >
> > > > [test case]
> > > > 1. Start pacemaker on two nodes at the same time and update the 
> > > > attribute during startup.
> > > >In this case, the attribute is displayed in crm_mon.
> > > >
> > > >[root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; 
> > > > attrd_updater -n KEY -U V-1' ; \
> > > >ssh -f node3 's

Re: [ClusterLabs] IPaddr2 RA and bonding

2017-08-10 Thread Ken Gaillot
On Thu, 2017-08-10 at 11:02 +, Tomer Azran wrote:
> That looks like exactly what I needed - it works.
> I had to change the RA since I don't want to give an interface name as a 
> parameter (it might change from server to server and I want to create a 
> cloned resource).
> I changed the RA a little bit to be able to guess the interface name based on 
> an IP address parameter.
> The new RA is published on my github repo: 
> https://github.com/tomerazran/Pacemaker-Resource-Agents/blob/master/ipspeed 

Nice! Feel free to open a PR against the ClusterLabs/pacemaker
repository with your changes. You could make it so the user has to
specify one of iface or ip, or you could have another parameter
iface_from_ip=true/false and put the IP in iface.

> Just to document the solution in case anyone else needs it, I ran the 
> following setup:
> 
>  # pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.1.3 op monitor 
> interval=30
>  # pcs resource create vip_speed ocf:heartbeat:ipspeed ip=192.168.1.3 
> name=vip_speed op monitor interval=5s --clone
>  # pcs constraint location vip rule score=-INFINITY vip_speed lt 1 or 
> not_defined vip_speed
> 
> Thank you for the support,
> Tomer.
> 
> 
> -Original Message-
> From: Vladislav Bogdanov [mailto:bub...@hoster-ok.com] 
> Sent: Monday, August 7, 2017 9:22 PM
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] IPaddr2 RA and bonding
> 
> 07.08.2017 20:39, Tomer Azran wrote:
> > I don't want to use this approach since I don't want to depend on 
> > pinging another host or a couple of hosts.
> > Is there any other solution?
> > I'm thinking of writing a simple script that will take a bond down 
> > using ifdown command when there are no slaves available and put it on 
> > /sbin/ifdown-local
> 
> For the similar purpose I wrote and use this one - 
> https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/ifspeed
> 
> It sets a node attribute on which other resources may depend via location 
> constraint  - 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch08.html#ch-rules
> 
> It is not installed by default, and that should probably be fixed.
> 
> That RA supports bonds (and bridges), and even tries to guess the actual 
> resulting bond speed based on the bond type. For a load-balancing bond like LACP 
> (mode 4) it uses a coefficient of 0.8 (iirc) to reflect the actual possible 
> load via multiple links.
> 
> >
> >
> > -Original Message-
> > From: Ken Gaillot [mailto:kgail...@redhat.com]
> > Sent: Monday, August 7, 2017 7:14 PM
> > To: Cluster Labs - All topics related to open-source clustering 
> > welcomed <users@clusterlabs.org>
> > Subject: Re: [ClusterLabs] IPaddr2 RA and bonding
> >
> > On Mon, 2017-08-07 at 10:02 +, Tomer Azran wrote:
> >> Hello All,
> >>
> >>
> >>
> >> We are using CentOS 7.3 with pacemaker in order to create a cluster.
> >>
> >> Each cluster node ha a bonding interface consists of two nics.
> >>
> >> The cluster has an IPAddr2 resource configured like that:
> >>
> >>
> >>
> >> # pcs resource show cluster_vip
> >>
> >> Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)
> >>
> >>   Attributes: ip=192.168.1.3
> >>
> >>   Operations: start interval=0s timeout=20s (cluster_vip
> >> -start-interval-0s)
> >>
> >>   stop interval=0s timeout=20s (cluster_vip
> >> -stop-interval-0s)
> >>
> >>   monitor interval=30s (cluster_vip 
> >> -monitor-interval-30s)
> >>
> >>
> >>
> >>
> >>
> >> We are running tests and want to simulate a state when the network 
> >> links are down.
> >>
> >> We are pulling both network cables from the server.
> >>
> >>
> >>
> >> The problem is that the resource is not marked as failed, and the 
> >> faulted node keep holding it and does not fail it over to the other 
> >> node.
> >>
> >> I think that the problem is within the bond interface. The bond 
> >> interface is marked as UP on the OS. It even can ping itself:
> >>
> >>
> >>
> >> # ip link show
> >>
> >> 2: eno3: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq 
> >> master bond1 state DOWN mode DEFAULT qlen 1000
> >>
> >> link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> >>
> >> 3: eno4: <NO-CARRIER,BROADCAST

Re: [ClusterLabs] notify action is not called for the docker bundle resources

2017-08-09 Thread Ken Gaillot
> redis-bundle-0 (ocf::heartbeat:redis): Master
> overcloud-controller-0
> redis-bundle-1 (ocf::heartbeat:redis): Slave
> overcloud-controller-1
> redis-bundle-2 (ocf::heartbeat:redis): Slave
> overcloud-controller-2
> 
> 
> 
> 
> contents of /var/lib/kolla/config_files/redis.json
> 
> {"config_files": [{"dest": "/etc/libqb/force-filesystem-sockets",
> "owner": "root", "perm": "0644", "source": "/dev/null"}, {"dest": "/",
> "merge": true, "optional": true, "source":
> "/var/lib/kolla/config_files/src/*", "preserve_properties": true}],
> "command": "/usr/sbin/pacemaker_remoted", "permissions": [{"owner":
> "redis:redis", "path": "/var/run/redis", "recurse": true}, {"owner":
> "redis:redis", "path": "/var/lib/redis", "recurse": true}, {"owner":
> "redis:redis", "path": "/var/log/redis", "recurse": true}]}
> 
> 
> 
> Please note the docker image for redis can be pulled as
> "docker pull tripleoupstream/centos-binary-redis"
> 
> 
> 
> 
> [2] - https://github.com/numansiddique/pcs_logs
> 
> 
> [3]
> - 
> https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf
> 
> 
> 
> 
> 
> 
> Thanks
> Numan
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Updated attribute is not displayed in crm_mon

2017-08-14 Thread Ken Gaillot
On Wed, 2017-08-02 at 09:59 +, 井上 和徳 wrote:
> Hi,
> 
> In Pacemaker-1.1.17, the attribute updated while starting pacemaker is not 
> displayed in crm_mon.
> In Pacemaker-1.1.16, it is displayed and results are different.
> 
> https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> This commit is the cause, but the following result (3.) is expected behavior?

This turned out to be an odd one. The sequence of events is:

1. When the node leaves the cluster, the DC (correctly) wipes all its
transient attributes from attrd and the CIB.

2. Pacemaker is newly started on the node, and a transient attribute is
set before the node joins the cluster.

3. The node joins the cluster, and its transient attributes (including
the new value) are sync'ed with the rest of the cluster, in both attrd
and the CIB. So far, so good.

4. Because this is the node's first join since its crmd started, its
crmd wipes all of its transient attributes again. The idea is that the
node may have restarted so quickly that the DC hasn't yet done it (step
1 here), so clear them now to avoid any problems with old values.
However, the crmd wipes only the CIB -- not attrd (arguably a bug).

5. With the older pacemaker version, both the joining node and the DC
would request a full write-out of all values from attrd. Because step 4
only wiped the CIB, this ends up restoring the new value. With the newer
pacemaker version, this step is no longer done, so the value winds up
staying in attrd but not in CIB (until the next write-out naturally
occurs).

I don't have a solution yet, but step 4 is clearly the problem (rather
than the new code that skips step 5, which is still a good idea
performance-wise). I'll keep working on it.

> [test case]
> 1. Start pacemaker on two nodes at the same time and update the attribute 
> during startup.
>In this case, the attribute is displayed in crm_mon.
> 
>[root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n 
> KEY -U V-1' ; \
>ssh -f node3 'systemctl start pacemaker ; attrd_updater -n 
> KEY -U V-3'
>[root@node1 ~]# crm_mon -QA1
>Stack: corosync
>Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> 
>2 nodes configured
>0 resources configured
> 
>Online: [ node1 node3 ]
> 
>No active resources
> 
> 
>Node Attributes:
>* Node node1:
>+ KEY   : V-1
>* Node node3:
>+ KEY   : V-3
> 
> 
> 2. Restart pacemaker on node1, and update the attribute during startup.
> 
>[root@node1 ~]# systemctl stop pacemaker
>[root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> 
> 
> 3. The attribute is registered in attrd but it is not registered in CIB,
>so the updated attribute is not displayed in crm_mon.
> 
>[root@node1 ~]# attrd_updater -Q -n KEY -A
>name="KEY" host="node3" value="V-3"
>name="KEY" host="node1" value="V-10"
> 
>[root@node1 ~]# crm_mon -QA1
>Stack: corosync
>Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> 
>2 nodes configured
>0 resources configured
> 
>Online: [ node1 node3 ]
> 
>No active resources
> 
> 
>Node Attributes:
>* Node node1:
>* Node node3:
>+ KEY   : V-3
> 
> 
> Best Regards
> 
> _______
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Updated attribute is not displayed in crm_mon

2017-08-14 Thread Ken Gaillot
On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> On Wed, 2017-08-02 at 09:59 +, 井上 和徳 wrote:
> > Hi,
> > 
> > In Pacemaker-1.1.17, the attribute updated while starting pacemaker is not 
> > displayed in crm_mon.
> > In Pacemaker-1.1.16, it is displayed and results are different.
> > 
> > https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> > This commit is the cause, but the following result (3.) is expected 
> > behavior?
> 
> This turned out to be an odd one. The sequence of events is:
> 
> 1. When the node leaves the cluster, the DC (correctly) wipes all its
> transient attributes from attrd and the CIB.
> 
> 2. Pacemaker is newly started on the node, and a transient attribute is
> set before the node joins the cluster.
> 
> 3. The node joins the cluster, and its transient attributes (including
> the new value) are sync'ed with the rest of the cluster, in both attrd
> and the CIB. So far, so good.
> 
> 4. Because this is the node's first join since its crmd started, its
> crmd wipes all of its transient attributes again. The idea is that the
> node may have restarted so quickly that the DC hasn't yet done it (step
> 1 here), so clear them now to avoid any problems with old values.
> However, the crmd wipes only the CIB -- not attrd (arguably a bug).

Whoops, clarification: the node may have restarted so quickly that
corosync didn't notice it left, so the DC would never have gotten the
"peer lost" message that triggers wiping its transient attributes.

I suspect the crmd wipes only the CIB in this case because we assumed
attrd would be empty at this point -- missing exactly this case where a
value was set between start-up and first join.

> 5. With the older pacemaker version, both the joining node and the DC
> would request a full write-out of all values from attrd. Because step 4
> only wiped the CIB, this ends up restoring the new value. With the newer
> pacemaker version, this step is no longer done, so the value winds up
> staying in attrd but not in CIB (until the next write-out naturally
> occurs).
> 
> I don't have a solution yet, but step 4 is clearly the problem (rather
> than the new code that skips step 5, which is still a good idea
> performance-wise). I'll keep working on it.
> 
> > [test case]
> > 1. Start pacemaker on two nodes at the same time and update the attribute 
> > during startup.
> >In this case, the attribute is displayed in crm_mon.
> > 
> >[root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater 
> > -n KEY -U V-1' ; \
> >ssh -f node3 'systemctl start pacemaker ; attrd_updater 
> > -n KEY -U V-3'
> >[root@node1 ~]# crm_mon -QA1
> >Stack: corosync
> >Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> > 
> >2 nodes configured
> >0 resources configured
> > 
> >Online: [ node1 node3 ]
> > 
> >No active resources
> > 
> > 
> >Node Attributes:
> >* Node node1:
> >+ KEY   : V-1
> >* Node node3:
> >+ KEY   : V-3
> > 
> > 
> > 2. Restart pacemaker on node1, and update the attribute during startup.
> > 
> >[root@node1 ~]# systemctl stop pacemaker
> >[root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> > 
> > 
> > 3. The attribute is registered in attrd but it is not registered in CIB,
> >so the updated attribute is not displayed in crm_mon.
> > 
> >[root@node1 ~]# attrd_updater -Q -n KEY -A
> >name="KEY" host="node3" value="V-3"
> >name="KEY" host="node1" value="V-10"
> > 
> >[root@node1 ~]# crm_mon -QA1
> >Stack: corosync
> >Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> > 
> >2 nodes configured
> >0 resources configured
> > 
> >Online: [ node1 node3 ]
> > 
> >    No active resources
> > 
> > 
> >Node Attributes:
> >* Node node1:
> >* Node node3:
> >+ KEY   : V-3
> > 
> > 
> > Best Regards
> > 
> > ___
> > Users mailing list: Users@clusterlabs.org
> > http://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD or SAN ?

2017-07-17 Thread Ken Gaillot
On 07/17/2017 04:51 AM, Lentes, Bernd wrote:
> Hi,
> 
> i established a two node cluster with two HP servers and SLES 11 SP4. I'd 
> like to start now with a test period. Resources are virtual machines. The 
> vm's reside on a FC SAN. The SAN has two power supplies, two storage 
> controller, two network interfaces for configuration. Each storage controller 
> has two FC connectors. On each server i have one FC controller with two 
> connectors in a multipath configuration. Each connector from the SAN 
> controller inside the server is connected to a different storage controller 
> from the SAN. But isn't a SAN, despite all that redundancy, a SPOF ?

What types of failure would be an example of a SPOF? Perhaps a single
logic board, or physical proximity so that environmental factors could
take down the whole thing.

> I'm asking myself if a DRBD configuration wouldn't be more redundant and highly 
> available. There i have two completely independent instances of the vm.

I'd agree. However, you'd have to estimate the above risks of the SAN
approach, compare any other advantages/disadvantages, and decide whether
it's worth a redesign.

> We have one web application with a database which is really crucial for us. 
> Downtime should be a maximum of one or two hours; if longer, we run into trouble.

Another thing to consider is that the risks of SAN vs DRBD are probably
much less than the risks of a single data center. If you used booth to
set up 2-3 redundant clusters, it would be easier to accept small risks
at any one site. Of course, that may not be possible in your situation,
every case is different.

> Is DRBD in conjunction with a database (MySQL or Postgres) possible ?
> 
> 
> Bernd

I've always favored native replication over disk replication for
databases. I'm not sure that's a necessity, but I would think the
biggest advantage is that with disk replication, you couldn't run the
database server all the time, you'd have to start it (and its VM in your
case) after disk failover.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antwort: Antw: Antwort: Re: reboot node / cluster standby

2017-07-11 Thread Ken Gaillot
On 07/11/2017 07:34 AM, philipp.achmuel...@arz.at wrote:
> 
> 
> Mit freundlichen Grüßen / best regards
> 
> Dipl.-Ing. (FH) Philipp Achmüller
> ARZ Allgemeines Rechenzentrum GmbH
> UNIX Systems
> 
> A-6020 Innsbruck, Tschamlerstraße 2
> Tel: +43 / (0)50 4009-1917
> philipp.achmuel...@arz.at
> http://www.arz.at
> Landes- als Handelsgericht Innsbruck, FN 38653v
> DVR: 0419427
> 
> 
> 
> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> schrieb am 06.07.2017
> 09:24:12:
> 
>> Von: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
>> An: <users@clusterlabs.org>
>> Datum: 06.07.2017 09:28
>> Betreff: [ClusterLabs] Antw:  Antwort: Re:  reboot node / cluster standby
>>
>> >>> <philipp.achmuel...@arz.at> schrieb am 03.07.2017 um 15:30 in
> Nachricht
>> <of2758213a.f6dc56ee-onc1258152.0046de1e-c1258152.004a3...@arz.at>:
>> > Ken Gaillot <kgail...@redhat.com> schrieb am 29.06.2017 21:15:59:
>> >
>> >> Von: Ken Gaillot <kgail...@redhat.com>
>> >> An: Ludovic Vaugeois-Pepin <ludovi...@gmail.com>, Cluster Labs - All
>> >> topics related to open-source clustering welcomed
>> > <users@clusterlabs.org>
>> >> Datum: 29.06.2017 21:19
>> >> Betreff: Re: [ClusterLabs] reboot node / cluster standby
>> >>
>> >> On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
>> >> > On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot <kgail...@redhat.com>
>> > wrote:
>> >> >> On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote:
>> >> >>> Hi,
>> >> >>>
>> >> >>> In order to reboot a Clusternode i would like to set the node to
>> > standby
>> >> >>> first, so a clean takeover for running resources can take in place.
>> >> >>> Is there a default way i can set in pacemaker, or do i have to
> setup
>> > my
>> >> >>> own systemd implementation?
>> >> >>>
>> >> >>> thank you!
>> >> >>> regards
>> >> >>> 
>> >> >>> env:
>> >> >>> Pacemaker 1.1.15
>> >> >>> SLES 12.2
>> >> >>
>> >> >> If a node cleanly shuts down or reboots, pacemaker will move all
>> >> >> resources off it before it exits, so that should happen as you're
>> >> >> describing, without needing an explicit standby.
>> >> >
>> >
>> > how does this work when evacuating e.g. 5 nodes out of a 10 node
> cluster
>> > at the same time?
>>
>> What is the command to do that? If doing it sequentially, I'd
>> wait until the DC returns to IDLE state before starting the next
>> command. One rule of clustering is "be patient!" ;-)
>> [...]
> 
> on other cluster software i used the standby function to free several
> nodes from resources in parallel and issued a distributed shutdown from
> my jumphost afterwards. when the resource move is being initiated during
> server shutdown i think i have to do it sequentially, or can pacemaker
> handle shutdown commands from several nodes in parallel?
> 
>>
>> Regards,
>> Ulrich

Pacemaker can shut down any number of nodes in parallel, though of course
if there is some time between each, there may be unnecessary resource
migrations as resources move to a node that is soon to be shut down
itself and so have to move again. If they are shut down within a few
seconds, Pacemaker will only have to move (or stop) resources once.
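For the parallel case, something along these lines from the jumphost should
be enough (node names are placeholders):

   # Issue the reboots close together so pacemaker sees all the shutdowns at
   # nearly the same time and only has to relocate resources once.
   for n in node1 node2 node3 node4 node5; do
       ssh "$n" reboot &
   done
   wait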

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.17 released

2017-07-06 Thread Ken Gaillot
ClusterLabs is proud to announce the latest release of the Pacemaker
cluster resource manager, version 1.1.17. The source code is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.17

The most significant enhancements in this release are:

* A new "bundle" resource type simplifies launching resources inside
Docker containers. This feature is considered experimental for this
release. It was discussed in detail previously:

  http://lists.clusterlabs.org/pipermail/users/2017-April/005380.html

A walk-through is available on the ClusterLabs wiki for anyone who wants
to experiment with the feature:

  http://wiki.clusterlabs.org/wiki/Bundle_Walk-Through

* A new environment variable PCMK_node_start_state can specify that a
node should start in standby mode. It was also discussed previously:

  http://lists.clusterlabs.org/pipermail/users/2017-April/005607.html

* The "crm_resource --cleanup" and "crm_failcount" commands can now
operate on a single operation type (previously, they could only operate
on all operations at once). This is part of an underlying switch to
tracking failure counts per operation, also discussed previously:

  http://lists.clusterlabs.org/pipermail/users/2017-April/005391.html

* Several command-line tools have new options, including "crm_resource
--validate" to run a resource agent's validate-all action,
"stonith_admin --list-targets" to list all potential targets of a fence
device, and "crm_attribute --pattern" to update or delete all node
attributes matching a regular expression

* The cluster's handling of fence failures has been improved. Among the
changes, a new "stonith-max-attempts" cluster option specifies how many
times fencing can fail for a target before the cluster will no longer
immediately re-attempt it (previously hard-coded at 10).

* The new release has scalability improvements for large clusters. Among
the changes, a new "cluster-ipc-limit" cluster option specifies how
large the IPC queue between pacemaker daemons can grow.

* Location constraints using rules may now compare a node attribute
against a resource parameter, using the new "value-source" field.
Previously, node attributes could only be compared against literal
values. This is most useful in combination with rsc-pattern to apply the
constraint to multiple resources.

As usual, to support the new features, the CRM feature set has been
incremented. This means that mixed-version clusters are supported only
during a rolling upgrade -- nodes with an older version will not be
allowed to rejoin once they shut down.

For a more detailed list of bug fixes and other changes, see the change log:

https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog

Many thanks to all contributors of source code to this release,
including Alexandra Zhuravleva, Andrew Beekhof, Aravind Kumar, Eric
Marques, Ferenc Wágner, Yan Gao, Hayley Swimelar, Hideo Yamauchi, Igor
Tsiglyar, Jan Pokorný, Jehan-Guillaume de Rorthais, Ken Gaillot, Klaus
Wenninger, Kristoffer Grönlund, Michal Koutný, Nate Clark, Patrick
Hemmer, Sergey Mishin, Vladislav Bogdanov, and Yusuke Iida. Apologies if
I have overlooked anyone.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Ken Gaillot
On 07/18/2017 09:34 AM, Lentes, Bernd wrote:
> 
> 
> - On Jul 17, 2017, at 11:51 AM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
> 
>> Hi,
>>
>> i established a two node cluster with two HP servers and SLES 11 SP4. I'd 
>> like
>> to start now with a test period. Resources are virtual machines. The vm's
>> reside on a FC SAN. The SAN has two power supplies, two storage controller, 
>> two
>> network interfaces for configuration. Each storage controller has two FC
>> connectors. On each server i have one FC controller with two connectors in a
>> multipath configuration. Each connector from the SAN controller inside the
>> server is connected to a different storage controller from the SAN. But 
>> isn't a
>> SAN, despite all that redundancy, a SPOF ?
>> I'm asking myself if a DRBD configuration wouldn't be more redundant and highly
>> available. There i have two completely independent instances of the vm.
>> We have one web application with a database which is really crucial for us.
>> Downtime should be a maximum of one or two hours; if longer, we run into trouble.
>> Is DRBD in conjunction with a database (MySQL or Postgres) possible ?
>>
>>
>> Bernd
>>
> 
> Is live migration possible with DRBD and Virtual Machines ?

Yes, definitely. In the past, I've even live-migrated a VM with 24GB RAM
over 10gigE very quickly, with LVM over DRBD underneath.

> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith disabled, but pacemaker tries to reboot

2017-07-20 Thread Ken Gaillot
On 07/20/2017 03:46 AM, Daniel.L wrote:
> Hi Pacemaker Users,
> 
> 
> We have a 2 node pacemaker cluster (v1.1.14).
> Stonith at this moment is disabled:
> 
> $ pcs property --all | grep stonith
> stonith-action: reboot
> stonith-enabled: false
> stonith-timeout: 60s
> stonith-watchdog-timeout: (null)
> 
> $ pcs property --all | grep fenc
> startup-fencing: true
> 
> 
> But when there is a network outage - it looks like pacemaker tries to
> restart the other node:
> 
> fence_pcmk[5739]: Requesting Pacemaker fence *node1* (reset)
> stonith-ng[31022]:   notice: Client stonith_admin.cman.xxx.
> wants to fence (reboot) '*node1*' with device '(any)'
> stonith-ng[31022]:   notice: Initiating remote operation reboot for
> *node1*: (0)
> stonith-ng[31022]:   notice: Couldn't find anyone to fence (reboot)
> *node1* with any device
> stonith-ng[31022]:   error: Operation reboot of *node1* by  for
> stonith_admin.cman.@xxx: No such device
> crmd[31026]:   notice: Peer *node1* was not terminated (reboot) by
>  for *node2*: No such device (ref=0) by
> client stonith_admin.cman.

stonith-enabled=false stops *Pacemaker* from requesting fencing, but it
doesn't stop external software from requesting fencing.

One hint in the logs is that the client starts with "stonith_admin"
which is the command-line tool that external apps can use to request
fencing.

Another hint is "fence_pcmk", which is not a Pacemaker fence agent, but
software that provides an interface to Pacemaker's fencing that CMAN can
understand. So, something asked CMAN to fence the node, and CMAN asked
Pacemaker to do it.

You'll have to figure out what requested it, and see whether there's a
way to disable fence requests in that app. DLM (used by clvmd and some
cluster filesystems) is a prime suspect, and I believe there's no way to
disable fencing inside it.

Of course, disabling fencing is a bad idea anyway :-)

> I've been looking into it for quite a while already, but to be honest I still
> don't understand this behavior...
> I would expect pacemaker not to try to reboot other node if stonith is
> disabled...
> Can anyone help to understand this behavior ? (and hopefully help to
> avoid those reboot attempts )
> 
> Many thanks in advance!
> 
> best regards
> daniel

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] (no subject)

2017-07-20 Thread Ken Gaillot
On 07/20/2017 12:21 AM, ArekW wrote:
> Hi, How to properly unset a value with pcs? Set to false or null gives error:
> 
> # pcs stonith update vbox-fencing verbose=false --force
> or
> # pcs stonith update vbox-fencing verbose= --force
> 
> Jul 20 07:14:11 nfsnode1 stonith-ng[11097]: warning: fence_vbox[3092]
> stderr: [ WARNING:root:Parse error: Ignoring option 'verbose' because
> it does not have value ]
> 
> To suppress the message I have to delete the resource and recreate it
> without the unwanted variable. I'm not sure if it concerns other
> variables as well or just this one.

Pacemaker and pcs can handle empty values; in this case, it's the fence
agent itself that's generating the warning. The "stderr" in the log
means that the fence agent printed this in its standard error output,
and pacemaker is just passing that along in the logs.

The message doesn't cause any problems for pacemaker, and from the
message, it looks like it doesn't cause problems for the agent either,
so it's just an annoyance.

I don't see a reason it should print an error in this case, so it's
probably worthwhile to submit a bug report against fence_vbox to
silently treat empty options as defaults.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] why resources are restarted when a node rejoins a cluster?

2017-07-25 Thread Ken Gaillot
On Mon, 2017-07-24 at 23:07 -0400, Digimer wrote:
> On 2017-07-24 11:04 PM, ztj wrote:
> > Hi all,
> > I have 2 Centos nodes with heartbeat and pacemaker-1.1.13 installed,
> > and almost everything is working fine. I have only apache configured
> > for testing; when a node goes down the failover is done correctly,
> > but there's a problem when a node fails back.
> > 
> > For example, let's say that Node1 has the lead on the apache resource,
> > then I reboot Node1, so Pacemaker detects it going down, then apache
> > is promoted to Node2 and keeps running fine there. That's
> > fine, but when Node1 recovers and joins the cluster again, apache is
> > restarted on Node2 again.
> > 
> > Does anyone know why resources are restarted when a node rejoins a
> > cluster? Thanks

That's not the default behavior, so something else is going on. Show
your configuration (with any sensitive information removed) for more
help.

> You sent this to the moderators, not the list.
> 
> Please don't use heartbeat, it is extremely deprecated. Please switch
> to corosync.

Since it's CentOS, it has to be corosync, unless heartbeat was compiled
locally.

> 
> To offer any other advice, you need to share your config and the logs
> from both nodes. Please respond to the list, not
> developers-ow...@clusterlabs.org.
> 
> digimer
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of Einstein’s 
> brain than in the near certainty that people of equal talent have lived and 
> died in cotton fields and sweatshops." - Stephen Jay Gould
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] epic fail

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 18:09 +0200, Valentin Vidic wrote:
> On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote:
> > Lsof/fuser show the PID of the process holding FS open as "kernel".
> 
> That could be the NFS server running in the kernel.

Dimitri,

Is the NFS server also managed by pacemaker? Is it ordered after DRBD?
Did pacemaker try to stop it before stopping DRBD?
-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] epic fail

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 17:13 +0200, Kristián Feldsam wrote:
> Hmm, so when you know that it happens also when putting a node in standby,
> then why do you run yum update on a live cluster? It must be clear that the
> node will be fenced.

Standby is not necessary, it's just a cautious step that allows the
admin to verify that all resources moved off correctly. The restart that
yum does should be sufficient for pacemaker to move everything.

A restart shouldn't lead to fencing in any case where something's not
going seriously wrong. I'm not familiar with the "kernel is using it"
message, I haven't run into that before.

The only case where special handling was needed before a yum update is a
node running pacemaker_remote instead of the full cluster stack, before
pacemaker 1.1.15.

> Would you post your pacemaker config? + some logs?
> 
> Best regards, Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz
> 
> www.feldhost.cz - FeldHost™ – professional hosting and server
> services at fair prices.
> 
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 958, DIČ: CZ290 60 958
> C 200350, registered with the Municipal Court in Prague
> 
> Bank: Fio banka a.s.
> Account number: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010  0024 0033 0446
> 
> > On 24 Jul 2017, at 17:04, Dimitri Maziuk 
> > wrote:
> > 
> > On 07/24/2017 09:40 AM, Jan Pokorný wrote:
> > 
> > > Would there be an interest, though?  And would that be meaningful?
> > 
> > IMO the only reason to put a node in standby is if you want to
> > reboot
> > the active node with no service interruption. For anything else,
> > including a reboot with service interruption (during maintenance
> > window), it's a no.
> > 
> > This is akin to "your mouse has moved, windows needs to be
> > restarted".
> > Except the mouse thing is a joke whereas those "standby" clowns
> > appear
> > to be serious.
> > 
> > With this particular failure, something in the Redhat patched kernel
> > (NFS?) does not release the DRBD filesystem. It happens when I put
> > the
> > node in standby as well, the only difference is not messing up the
> > RPM
> > database which isn't that hard to fix. Since I have several centos 6
> > +
> > DRBD + NFS + heartbeat R1 pairs running happily for years, I have to
> > conclude that centos 7 is simply the wrong tool for this particular
> > job.
> > 
> > -- 
> > Dimitri Maziuk
> > Programmer/sysadmin
> > BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Ken Gaillot
> 
> On 24 Jul 2017, at 17:27, Klaus Wenninger
> <kwenn...@redhat.com> wrote:
> 
> 
>  
> 
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
> 
> 
> I still don't understand why the qdevice
> concept doesn't help on this situation. Since
> the master node is down, I would expect the
> quorum to declare it as dead.
> 
> 
> Why doesn't it happens?
> 
> 
> 
> 
> That is not how quorum works. It just limits the
> decision-making to the quorate subset of the cluster.
> Still the unknown nodes are not sure to be down.
> That is why I suggested to have quorum-based
> watchdog-fencing with sbd.
> That would assure that within a certain time all nodes
> of the non-quorate part
> of the cluster are down.
> 
> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
> Maziuk" <dmitri.maz...@gmail.com> wrote:
> 
> On 2017-07-24 07:51, Tomer Azran wrote:
> > We don't have the ability to use it.
> > Is that the only solution?
>  
> No, but I'd recommend thinking about it first. Are you sure 
> you will 
> care about your cluster working when your server room is on 
> fire? 'Cause 
> unless you have halon suppression, your server room is a 
> complete 
> write-off anyway. (Think water from sprinklers hitting rich 
> chunky volts 
> in the servers.)
>  
> Dima
>  
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>  
> Project Home: http://www.clusterlabs.org
> Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>  
> Project Home: http://www.clusterlabs.org
> Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
>  
> 
> -- 
> Klaus Wenninger
>  
> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>  
> Red Hat
>  
> kwenn...@redhat.com   
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting
> started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>  
> Project Home: http://www.clusterlabs.org
> Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>   
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resources do not migrate although node is going to standby

2017-07-24 Thread Ken Gaillot
b_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 
> (prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> 
> Do i understand it correctly that the port is occupied on the node it should 
> migrate to (ha-idg-1) ?

It looks like it

> But there is no vm running and i don't have a standalone vnc server 
> configured. Why is the port occupied ?

Can't help there

> Btw: the network sockets are live migrated too during a live migration of a 
> VirtualDomain resource ?
> It should be like that.
> 
> Thanks.
> 
> 
> Bernd

My memory is hazy, but I think TCP connections are migrated as long as
the migration is under the TCP timeout. I could be mis-remembering.
-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] timeout for stop VirtualDomain running Windows 7

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 19:30 +0200, Lentes, Bernd wrote:
> Hi,
> 
> i have a VirtualDomian resource running a Windows 7 client. This is the 
> respective configuration:
> 
> primitive prim_vm_servers_alive VirtualDomain \
> params config="/var/lib/libvirt/images/xml/Server_Monitoring.xml" \
> params hypervisor="qemu:///system" \
> params migration_transport=ssh \
> params autoset_utilization_cpu=false \
> params autoset_utilization_hv_memory=false \
> op start interval=0 timeout=120 \
> op stop interval=0 timeout=130 \
> op monitor interval=30 timeout=30 \
> op migrate_from interval=0 timeout=180 \
> op migrate_to interval=0 timeout=190 \
> meta allow-migrate=true target-role=Started is-managed=true
> 
> The timeout for the stop operation is 130 seconds. But our windows 7 clients, 
> as most do, install updates from time to time .
> And then a shutdown can take 10 or 20 minutes or even longer.
> If the timeout isn't as long as the installation of the updates takes then 
> the vm is forced off. With all possible negative consequences.
> But on the other hand i don't like to set a timeout of eg. 20 minutes, which 
> may still not be enough in some circumstances, but is much too long
> if the guest doesn't install updates.
> 
> Any ideas ?
> 
> Thanks.
> 
> 
> Bernd

If you can restrict updates to a certain time window, you can set up a
rule that uses a longer timeout during that window.

If you can't restrict the time window, but you can run a script when
updates are done, you could set a node attribute at that time (and clear
it on reboot), and use a similar rule based on the attribute.
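A rough sketch of the second idea (the attribute name is made up, and the
rule on the stop timeout would then test this node attribute):

   # hook run on the cluster node hosting the VM while the guest installs updates
   crm_attribute --node "$(crm_node -n)" --name win_updates_pending --update 1
   # ...and once the guest has finished updating and is back up:
   crm_attribute --node "$(crm_node -n)" --name win_updates_pending --delete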
-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] reboot node / cluster standby

2017-06-29 Thread Ken Gaillot
On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote:
> Hi,
> 
> In order to reboot a Clusternode i would like to set the node to standby
> first, so a clean takeover for running resources can take in place.
> Is there a default way i can set in pacemaker, or do i have to setup my
> own systemd implementation?
> 
> thank you!
> regards
> 
> env:
> Pacemaker 1.1.15
> SLES 12.2

If a node cleanly shuts down or reboots, pacemaker will move all
resources off it before it exits, so that should happen as you're
describing, without needing an explicit standby.

Explicitly doing standby first would be useful mainly if you want to
manually check the results of the takeover before proceeding with the
reboot, and/or if you want the node to come back in standby mode next
time it joins.
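If you do want the explicit standby, a minimal sketch with crmsh as shipped
on SLES (node name is a placeholder):

   crm node standby node1   # move everything off node1
   crm status               # verify the resources came up where expected
   reboot
   # after the node has rejoined:
   crm node online node1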

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] reboot node / cluster standby

2017-06-29 Thread Ken Gaillot
On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
> On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot <kgail...@redhat.com> wrote:
>> On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote:
>>> Hi,
>>>
>>> In order to reboot a Clusternode i would like to set the node to standby
>>> first, so a clean takeover for running resources can take in place.
>>> Is there a default way i can set in pacemaker, or do i have to setup my
>>> own systemd implementation?
>>>
>>> thank you!
>>> regards
>>> 
>>> env:
>>> Pacemaker 1.1.15
>>> SLES 12.2
>>
>> If a node cleanly shuts down or reboots, pacemaker will move all
>> resources off it before it exits, so that should happen as you're
>> describing, without needing an explicit standby.
> 
> This makes me wonder about timeouts. Specifically OS/systemd timeouts.
> Say the node being shut down or rebooted holds a resource as a master,
> and it takes a while for the demote to complete, say 100 seconds (less
> than the demote timeout of 120s in this hypothetical scenario).  Will
> the OS/systemd wait until pacemaker exits cleanly on a regular CentOS
> or Debian?

Yes. The pacemaker systemd unit file uses TimeoutStopSec=30min.
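A quick way to confirm the effective value on a given installation:

   systemctl show pacemaker.service --property=TimeoutStopUSec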

> 
> 
>> Explicitly doing standby first would be useful mainly if you want to
>> manually check the results of the takeover before proceeding with the
>> reboot, and/or if you want the node to come back in standby mode next
>> time it joins.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-06-30 Thread Ken Gaillot
On 06/30/2017 12:10 PM, Valentin Vidic wrote:
> On Fri, Mar 31, 2017 at 05:43:02PM -0500, Ken Gaillot wrote:
>> Here's an example of the CIB XML syntax (higher-level tools will likely
>> provide a more convenient interface):
>>
>>  
>>
>>   
> 
> Would it be possible to make this a bit more generic like:
> 
>   
> 
> so we have support for other container engines like rkt?

The challenge is that some properties are docker-specific and other
container engines will have their own specific properties.

We decided to go with a tag for each supported engine -- so if we add
support for rkt, we'll add a <rkt> tag with whatever properties it
needs. Then a <bundle> would need to contain either a <docker> tag or a
<rkt> tag.

We did consider a generic alternative like:

  
 
 
 ...
 
 ...
  

But it was decided that using engine-specific tags would allow for
schema enforcement, and would be more readable.

The <network> and <storage> tags were kept under <bundle> because we
figured those are essential to the concept of a bundle, and any engine
should support some way of mapping those.
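For reference, a rough sketch of what the 1.1.17 docker syntax looks like
(attribute values here are illustrative, not the original example):

   <bundle id="httpd-bundle">
     <!-- engine-specific tag: image and replica count -->
     <docker image="pcmk:httpd" replicas="3"/>
     <!-- engine-agnostic tags kept directly under bundle -->
     <network ip-range-start="192.168.122.131" host-interface="eth0"
              host-netmask="24">
       <port-mapping id="httpd-port" port="80"/>
     </network>
     <storage>
       <storage-mapping id="httpd-root" source-dir="/srv/html"
                        target-dir="/var/www/html" options="rw"/>
     </storage>
     <primitive id="httpd" class="ocf" provider="heartbeat" type="apache"/>
   </bundle>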

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Question about STONITH for VM HA cluster in shared hosts environment

2017-06-29 Thread Ken Gaillot
On 06/29/2017 12:08 PM, Digimer wrote:
> On 29/06/17 12:39 PM, Andrés Pozo Muñoz wrote:
>> Hi all,
>>
>> I am a newbie to Pacemaker and I can't find the perfect solution for my
>> problem (probably I'm missing something), maybe someone can give me some
>> hint :)
>>
>> My scenario is the following: I want to make a HA cluster composed of 2
>> virtual machines running on top of SHARED virtualization hosts. That is,
>> I have a bunch of hosts running multiple VMs, and I would like to create
>> an HA cluster with 2 VMs (Ubuntus running some app) running in
>> (different) hosts.
>>
>> About the resources, I have no problem, I'll configure some VIP and some
>> lsb services in a group.
>>
>> My concern is about the STONITH, I can't find the perfect config:
>> * If I disable STONITH I may have the split brain problem.
>> * If I enable STONITH with external/libvirt fence, I'll have a
>> single point of failure  if the host with the Active VM dies, right?
>>  (Imagine the host running that active VMs dies, the STONITH operation
>> from the other VM will fail and the switch-over will not happen, right?)
>> * I can't use a 'hardware' (ILO/DRAC) fence, because the host is
>> running a lot of VMs, not only the ones in HA cluster :( I can't reboot
>> it because of some failure in our HA.
>>
>> Is there an optimal configuration for such scenario?
>>
>> I think I'd rather live with the split brain problem, but I just want to
>> know if I missed any config option.
>>
>> Thanks in advance!
>>
>> Cheers,
>> Andrés
> 
> You've realized why a production cluster on VMs is generally not
> recommended. :)
> 
> If the project is important enough to make HA, then management needs to
> allocate the budget to get the proper hardware for the effort, I would
> argue. If you want to keep the services in VMs, that's fine, get a pair
> of nodes and make them an HA cluster to protect the VMs as the services
> (we do this all the time).
> 
> With that, then you pair IPMI and switched PDUs for complete coverage
> (IPMI alone isn't enough, because if the host is destroyed, it will take
> the IPMI BMC with it).

To elaborate on this approach, the underlying hosts could be the cluster
nodes, and the VMs could be resources. If you make all the VMs into
resources, then you get HA for all of them. You can also run Pacemaker
Remote in any of the VMs if you want to monitor resources running inside
them (or move resources from one VM to another).

Commenting on your original question, I'd point out that if pacemaker
chooses to fence one of the underlying hosts, it's not responding
normally, so any other VMs on it are likely toast anyway. You may
already be familiar, but you can set a fencing topology so that
pacemaker tries libvirt first, then kills the host only if that fails.
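A rough sketch of such a topology with pcs (device and node names here are
hypothetical): level 1 tries to fence the VM via libvirt, level 2 falls back
to power-cycling the underlying host:

   pcs stonith level add 1 vmnode1 fence_vmnode1_libvirt
   pcs stonith level add 2 vmnode1 fence_host1_ipmi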

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-07-03 Thread Ken Gaillot
On 07/01/2017 06:47 AM, Valentin Vidic wrote:
> On Fri, Jun 30, 2017 at 12:46:29PM -0500, Ken Gaillot wrote:
>> The challenge is that some properties are docker-specific and other
>> container engines will have their own specific properties.
>>
>> We decided to go with a tag for each supported engine -- so if we add
>> support for rkt, we'll add a <rkt> tag with whatever properties it
>> needs. Then a <bundle> would need to contain either a <docker> tag or a
>> <rkt> tag.
>>
>> We did consider a generic alternative like:
>>
>>   
>>  
>>  
>>  ...
>>  
>>  ...
>>   
>>
>> But it was decided that using engine-specific tags would allow for
>> schema enforcement, and would be more readable.
>>
>> The <network> and <storage> tags were kept under <bundle> because we
>> figured those are essential to the concept of a bundle, and any engine
>> should support some way of mapping those.
> 
> Thanks for the explanation, it makes sense :)
> 
> Now I have a working rkt resource agent and would like to test it.
> Can you share the pcmk:httpd image mentioned in the docker example?

Sure, we have a walk-through on the wiki that I was going to announce
after 1.1.17 final is released (hopefully later this week), but now is
good, too :-)

   https://wiki.clusterlabs.org/wiki/Bundle_Walk-Through

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with stonith and starting services

2017-07-03 Thread Ken Gaillot
On 07/03/2017 02:34 AM, Cesar Hernandez wrote:
> Hi
> 
> I have installed a pacemaker cluster with two nodes. The same type of 
> installation has been done many times before and the following error never 
> appeared. The situation is the following:
> 
> both nodes running cluster services
> stop pacemaker on node 1
> stop pacemaker on node 2
> start corosync on node 1
> 
> Then node 1 starts, it sees node2 down, and it fences it, as it was expected. 
> But the problem comes when node 2 is rebooted and starts cluster services: 
> sometimes, it starts the corosync service but the pacemaker service starts 
> and then stops. The syslog shows the following error in these cases:
> 
> Jul  3 09:07:04 node2 pacemakerd[597]:  warning: The crmd process (608) can 
> no longer be respawned, shutting the cluster down.
> Jul  3 09:07:04 node2 pacemakerd[597]:   notice: Shutting down Pacemaker
> 
> Previous messages show some warning messages that I'm not sure they are 
> related with the shutdown:
> 
> 
> Jul  3 09:07:04 node2 stonith-ng[604]:   notice: Operation reboot of node2 by 
> node1 for crmd.2413@node1.608d8118: OK
> Jul  3 09:07:04 node2 crmd[608]: crit: We were allegedly just fenced by 
> node1 for node1!
> Jul  3 09:07:04 node2 corosync[585]:   [pcmk  ] info: pcmk_ipc_exit: Client 
> crmd (conn=0x1471800, async-conn=0x1471800) left
> 
> 
> On node1, all resources become unrunnable and it stays there forever until I 
> start manually pacemaker service on node2. 
> As I said, the same type of installation has been done before on other servers and 
> this never happened. The only difference is that in previous installations I 
> configured corosync with multicast and now I have configured it with unicast (my 
> current network environment doesn't allow multicast), but I think it's not 
> related to that behaviour

Agreed, I don't think it's multicast vs unicast.

I can't see from this what's going wrong. Possibly node1 is trying to
re-fence node2 when it comes back. Check that the fencing resources are
configured correctly, and check whether node1 sees the first fencing
succeed.
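For the latter, the fencer's own record of the operation is a quick check
(node name taken from your logs):

   stonith_admin --history node2 --verbose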

> Cluster software versions:
> corosync-1.4.8
> crmsh-2.1.5
> libqb-0.17.2
> Pacemaker-1.1.14
> resource-agents-3.9.6
> 
> 
> 
> Can you help me?
> 
> Thanks
> 
> Cesar

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Ken Gaillot
On 07/06/2017 08:54 AM, Cesar Hernandez wrote:
> 
>>
>> So, the above log means that node1 decided that node2 needed to be
>> fenced, requested fencing of node2, and received a successful result for
>> the fencing, and yet node2 was not killed.
>>
>> Your fence agent should not return success until node2 has verifiably
>> been stopped. If there is some way to query the AWS API whether node2 is
>> running or not, that would be sufficient (merely checking that the node
>> is not responding to some command such as ping is not sufficient).
> 
> Thanks. But node2 has always been successfully fenced... so this is not the 
> problem

If node2 is getting the notification of its own fencing, it wasn't
successfully fenced. Successful fencing would render it incapacitated
(powered down, or at least cut off from the network and any shared
resources).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] About Corosync up to 16 nodes limit

2017-07-06 Thread Ken Gaillot
On 07/06/2017 03:51 AM, mlb_1 wrote:
> thanks for your solution.
> 
> Can anybody officially reply to this topic ?

Digimer is correct, the Red Hat and SuSE limits are their own chosen
limits for technical support, not enforced by the code. There are no
hard limits in the code, but practically speaking, it is very difficult
to go beyond 32 corosync nodes.

Pacemaker Remote is the currently recommended way to scale a cluster
larger. With Pacemaker Remote, a small number of nodes run the full
cluster stack including corosync and all pacemaker daemons, while the
Pacemaker Remote nodes run only a single pacemaker daemon (the local
resource manager). This allows the number of nodes to scale much higher.
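For example, a host that runs the pacemaker_remote daemon can be integrated
as a remote node with a single resource; a minimal sketch (name and address
are made up):

   pcs resource create remote1 ocf:pacemaker:remote server=192.168.122.50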

As I understand it, the corosync limit is mainly a function of needing
to pass the token around to all nodes in a small amount of time, to
guarantee that each message has been received by every node, and in
order. Therefore the speed and reliability of the network, the nodes'
network interfaces, and the nodes' ability to process network traffic
are the main bottlenecks to larger clusters.

In Pacemaker, bottlenecks I'm aware of are the size of the CIB (which
must be frequently passed between nodes over the network, and compressed
if it is large), the time it takes the policy engine to calculate
necessary actions in a complex cluster (lots of nodes, resources, and
constraints), the time it takes to complete a DC election when a node
leaves or rejoins the cluster, and to a lesser extent some daemon
communication that is less efficient than it could be due to the need to
support rolling upgrades from older versions.

Scalability is a major area of interest for future corosync and
pacemaker development.

> At 2017-07-06 11:45:05, "Digimer"  wrote:
>>I'm not employed by Red Hat, so I can't speak authoritatively.
>>
>>My understanding, however, is that they do not distinguish as corosync
>>on its own doesn't do much. The complexity comes from corosync traffic
>>though, but it gets more of a concern when you add in pacemaker traffic
>>and/or the CIB grows large.
>>
>>Again, there is no hard code limit here, just what is practical. Can I
>>ask how large of a cluster you are planning to build, and what it will
>>be used for?
>>
>>Note also; This is not related to pacemaker remote. You can have very
>>large counts of remote nodes.
>>
>>digimer
>>
>>On 2017-07-05 11:27 PM, mlb_1 wrote:
>>> Is it Red Hat that limits the node count, or corosync's code?
>>> 
>>> 
>>> At 2017-07-06 11:11:39, "Digimer"  wrote:
On 2017-07-05 09:03 PM, mlb_1 wrote:
> Hi:
>   I heard corosync-node's number limit to 16? It's true? And Why?
>  Thanks for anyone's answer.
> 
>
>  
> https://specs.openstack.org/openstack/fuel-specs/specs/6.0/pacemaker-improvements.html
>  
> 
> 
>   * Corosync 2.0 has a lot of improvements that allow to have up to 100
> Controllers. Corosync 1.0 scales up to 10-16 node

There is no hard limit on how many nodes can be in a cluster, but Red
Hat supports up to 16. SUSE supports up to 32, iirc. The problem is that
it gets harder and harder to keep things stable as the number of nodes
grow. There is a lot of coordination that has to happen between the
nodes and it gets ever more complex.

Generally speaking, you don't want large clusters. It is always advised
to break things up it separate smaller clusters whenever possible.
>>
>>
>>-- 
>>Digimer
>>Papers and Projects: https://alteeve.com/w/
>>"I am, somehow, less interested in the weight and convolutions of
>>Einstein’s brain than in the near certainty that people of equal talent
>>have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fwd: Cluster - NFS Share Configuration

2017-07-06 Thread Ken Gaillot
On 07/06/2017 07:24 AM, pradeep s wrote:
> Team,
> 
> I am working on configuring cluster environment for NFS share using
> pacemaker. Below are the resources I have configured.
> 
> Quote:
> Group: nfsgroup
> Resource: my_lvm (class=ocf provider=heartbeat type=LVM)
> Attributes: volgrpname=my_vg exclusive=true
> Operations: start interval=0s timeout=30 (my_lvm-start-interval-0s)
> stop interval=0s timeout=30 (my_lvm-stop-interval-0s)
> monitor interval=10 timeout=30 (my_lvm-monitor-interval-10)
> Resource: nfsshare (class=ocf provider=heartbeat type=Filesystem)
> Attributes: device=/dev/my_vg/my_lv directory=/nfsshare fstype=ext4
> Operations: start interval=0s timeout=60 (nfsshare-start-interval-0s)
> stop interval=0s timeout=60 (nfsshare-stop-interval-0s)
> monitor interval=20 timeout=40 (nfsshare-monitor-interval-20)
> Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
> Attributes: nfs_shared_infodir=/nfsshare/nfsinfo nfs_no_notify=true
> Operations: start interval=0s timeout=40 (nfs-daemon-start-interval-0s)
> stop interval=0s timeout=20s (nfs-daemon-stop-interval-0s)
> monitor interval=10 timeout=20s (nfs-daemon-monitor-interval-10)
> Resource: nfs-root (class=ocf provider=heartbeat type=exportfs)
> Attributes: clientspec=10.199.1.0/255.255.255.0
>  options=rw,sync,no_root_squash
> directory=/nfsshare/exports fsid=0
> Operations: start interval=0s timeout=40 (nfs-root-start-interval-0s)
> stop interval=0s timeout=120 (nfs-root-stop-interval-0s)
> monitor interval=10 timeout=20 (nfs-root-monitor-interval-10)
> Resource: nfs-export1 (class=ocf provider=heartbeat type=exportfs)
> Attributes: clientspec=10.199.1.0/255.255.255.0
>  options=rw,sync,no_root_squash
> directory=/nfsshare/exports/export1 fsid=1
> Operations: start interval=0s timeout=40 (nfs-export1-start-interval-0s)
> stop interval=0s timeout=120 (nfs-export1-stop-interval-0s)
> monitor interval=10 timeout=20 (nfs-export1-monitor-interval-10)
> Resource: nfs-export2 (class=ocf provider=heartbeat type=exportfs)
> Attributes: clientspec=10.199.1.0/255.255.255.0
>  options=rw,sync,no_root_squash
> directory=/nfsshare/exports/export2 fsid=2
> Operations: start interval=0s timeout=40 (nfs-export2-start-interval-0s)
> stop interval=0s timeout=120 (nfs-export2-stop-interval-0s)
> monitor interval=10 timeout=20 (nfs-export2-monitor-interval-10)
> Resource: nfs_ip (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=10.199.1.86 cidr_netmask=24
> Operations: start interval=0s timeout=20s (nfs_ip-start-interval-0s)
> stop interval=0s timeout=20s (nfs_ip-stop-interval-0s)
> monitor interval=10s timeout=20s (nfs_ip-monitor-interval-10s)
> Resource: nfs-notify (class=ocf provider=heartbeat type=nfsnotify)
> Attributes: source_host=10.199.1.86
> Operations: start interval=0s timeout=90 (nfs-notify-start-interval-0s)
> stop interval=0s timeout=90 (nfs-notify-stop-interval-0s)
> monitor interval=30 timeout=90 (nfs-notify-monitor-interval-30)
> 
> 
> PCS Status
> Quote:
> Cluster name: my_cluster
> Stack: corosync
> Current DC: node3.cluster.com  (version
> 1.1.15-11.el7_3.5-e174ec8) - partition with quorum
> Last updated: Wed Jul 5 13:12:48 2017 Last change: Wed Jul 5 13:11:52
> 2017 by root via crm_attribute on node3.cluster.com
> 
> 
> 2 nodes and 10 resources configured
> 
> Online: [ node3.cluster.com  node4.cluster.com
>  ]
> 
> Full list of resources:
> 
> fence-3 (stonith:fence_vmware_soap): Started node4.cluster.com
> 
> fence-4 (stonith:fence_vmware_soap): Started node3.cluster.com
> 
> Resource Group: nfsgroup
> my_lvm (ocf::heartbeat:LVM): Started node3.cluster.com
> 
> nfsshare (ocf::heartbeat:Filesystem): Started node3.cluster.com
> 
> nfs-daemon (ocf::heartbeat:nfsserver): Started node3.cluster.com
> 
> nfs-root (ocf::heartbeat:exportfs): Started node3.cluster.com
> 
> nfs-export1 (ocf::heartbeat:exportfs): Started node3.cluster.com
> 
> nfs-export2 (ocf::heartbeat:exportfs): Started node3.cluster.com
> 
> nfs_ip (ocf::heartbeat:IPaddr2): Started node3.cluster.com
> 
> nfs-notify (ocf::heartbeat:nfsnotify): Started node3.cluster.com
> 
> 
> I followed the redhat link
> 
> to configure.
> 
> Once configured, I could mount the directory from nfs client with no
> issues. However, when I try putting the node into standby, the resources
> do not start up on the secondary node.
> 
> After entering active node to 

Re: [ClusterLabs] Antw: Re: reboot node / cluster standby

2017-07-06 Thread Ken Gaillot
On 07/06/2017 02:21 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> schrieb am 29.06.2017 um 21:15 in 
>>>> Nachricht
> <44ee8b24-fe14-a204-f791-248546c2f...@redhat.com>:
>> On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
>>> On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot <kgail...@redhat.com> wrote:
>>>> On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote:
>>>>> Hi,
>>>>>
>>>>> In order to reboot a Clusternode i would like to set the node to standby
>>>>> first, so a clean takeover for running resources can take in place.
>>>>> Is there a default way i can set in pacemaker, or do i have to setup my
>>>>> own systemd implementation?
>>>>>
>>>>> thank you!
>>>>> regards
>>>>> 
>>>>> env:
>>>>> Pacemaker 1.1.15
>>>>> SLES 12.2
>>>>
>>>> If a node cleanly shuts down or reboots, pacemaker will move all
>>>> resources off it before it exits, so that should happen as you're
>>>> describing, without needing an explicit standby.
>>>
>>> This makes me wonder about timeouts. Specifically OS/systemd timeouts.
>>> Say the node being shut down or rebooted holds a resource as a master,
>>> and it takes a while for the demote to complete, say 100 seconds (less
>>> than the demote timeout of 120s in this hypothetical scenario).  Will
>>> the OS/systemd wait until pacemaker exits cleanly on a regular CentOS
>>> or Debian?
>>
>> Yes. The pacemaker systemd unit file uses TimeoutStopSec=30min.
> 
> From crm ra info ocf:heartbeat:SAPDatabase:
> Operations' defaults (advisory minimum):
> 
> start timeout=1800
> stop  timeout=1800
> statustimeout=60
> monitor   timeout=60 interval=120
> methods   timeout=5
> 
> 
> ;-)
> 
> So your score may vary. The RA probably won't take that long, but we have VMs 
> that need > 6 minutes to shut down. If you shut down 10 such VMs 
> sequentially, you need to be patient (at least)...

Yes, good point -- 30 minutes is just a "good enough for most users"
default value. If someone has unusual requirements, they need to create
a systemd drop-in with a higher TimeoutStopSec.
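For example, a drop-in along these lines (the value is arbitrary), followed
by a systemctl daemon-reload:

   # /etc/systemd/system/pacemaker.service.d/stop-timeout.conf
   [Service]
   TimeoutStopSec=60min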

>>>> Explicitly doing standby first would be useful mainly if you want to
>>>> manually check the results of the takeover before proceeding with the
>>>> reboot, and/or if you want the node to come back in standby mode next
>>>> time it joins.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Ken Gaillot
On 07/04/2017 08:28 AM, Cesar Hernandez wrote:
> 
>>
>> Agreed, I don't think it's multicast vs unicast.
>>
>> I can't see from this what's going wrong. Possibly node1 is trying to
>> re-fence node2 when it comes back. Check that the fencing resources are
>> configured correctly, and check whether node1 sees the first fencing
>> succeed.
> 
> 
> Thanks. Checked fencing resource and it always returns, it's a custom script 
> I used on other installations and it always worked.
> I think the clue are the two messages that appear when it fails:
> 
> Jul  3 09:07:04 node2 pacemakerd[597]:  warning: The crmd process (608) can 
> no longer be respawned, shutting the cluster down.
> Jul  3 09:07:04 node2 crmd[608]: crit: We were allegedly just fenced by 
> node1 for node1!
> 
> Anyone knows what are they related to? Seems not to be much information on 
> the Internet
> 
> Thanks
> Cesar

"We were allegedly just fenced" means that the node just received a
notification from stonithd that another node successfully fenced it.
Clearly, this is a problem, because a node that is truly fenced should
be unable to receive any communications from the cluster. As such, the
cluster services immediately exit and stay down.

So, the above log means that node1 decided that node2 needed to be
fenced, requested fencing of node2, and received a successful result for
the fencing, and yet node2 was not killed.

Your fence agent should not return success until node2 has verifiably
been stopped. If there is some way to query the AWS API whether node2 is
running or not, that would be sufficient (merely checking that the node
is not responding to some command such as ping is not sufficient).
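For example, assuming the nodes are EC2 instances, the agent could block on
the AWS CLI's waiter before reporting success (instance ID is made up):

   aws ec2 stop-instances --instance-ids i-0123456789abcdef0
   # polls until the instance state is "stopped"; only then report success
   aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0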

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_vbox Unable to connect/login to fencing device

2017-07-06 Thread Ken Gaillot
On 07/06/2017 10:13 AM, ArekW wrote:
> Hi,
> 
> It seems that my the fence_vbox is running but there are errors in
> logs every few minutes like:
> 
> Jul  6 12:51:12 nfsnode1 fence_vbox: Unable to connect/login to fencing device
> Jul  6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
> stderr: [ Unable to connect/login to fencing device ]
> Jul  6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
> stderr: [  ]
> Jul  6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
> stderr: [  ]
> 
> Eventually after some time the pcs status shows Failed Actions:
> 
> # pcs status --full
> Cluster name: nfscluster
> Stack: corosync
> Current DC: nfsnode1 (1) (version 1.1.15-11.el7_3.5-e174ec8) -
> partition with quorum
> Last updated: Thu Jul  6 13:02:52 2017  Last change: Thu Jul
> 6 13:00:33 2017 by root via crm_resource on nfsnode1
> 
> 2 nodes and 11 resources configured
> 
> Online: [ nfsnode1 (1) nfsnode2 (2) ]
> 
> Full list of resources:
> 
> Master/Slave Set: StorageClone [Storage]
>  Storage(ocf::linbit:drbd): Master nfsnode1
>  Storage(ocf::linbit:drbd): Master nfsnode2
>  Masters: [ nfsnode1 nfsnode2 ]
> Clone Set: dlm-clone [dlm]
>  dlm(ocf::pacemaker:controld):  Started nfsnode1
>  dlm(ocf::pacemaker:controld):  Started nfsnode2
>  Started: [ nfsnode1 nfsnode2 ]
> vbox-fencing   (stonith:fence_vbox):   Started nfsnode1
> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>  ClusterIP:0(ocf::heartbeat:IPaddr2):   Started nfsnode1
>  ClusterIP:1(ocf::heartbeat:IPaddr2):   Started nfsnode2
> Clone Set: StorageFS-clone [StorageFS]
>  StorageFS  (ocf::heartbeat:Filesystem):Started nfsnode1
>  StorageFS  (ocf::heartbeat:Filesystem):Started nfsnode2
>  Started: [ nfsnode1 nfsnode2 ]
> Clone Set: WebSite-clone [WebSite]
>  WebSite(ocf::heartbeat:apache):Started nfsnode1
>  WebSite(ocf::heartbeat:apache):Started nfsnode2
>  Started: [ nfsnode1 nfsnode2 ]
> 
> Failed Actions:
> * vbox-fencing_start_0 on nfsnode1 'unknown error' (1): call=157,
> status=Error, exitreason='none',
> last-rc-change='Thu Jul  6 13:58:04 2017', queued=0ms, exec=11947ms
> * vbox-fencing_start_0 on nfsnode2 'unknown error' (1): call=57,
> status=Error, exitreason='none',
> last-rc-change='Thu Jul  6 13:58:16 2017', queued=0ms, exec=11953ms
> 
> The fence was created with command:
> pcs -f stonith_cfg stonith create vbox-fencing fence_vbox ip=10.0.2.2
> ipaddr=10.0.2.2 login=AW23321 username=AW23321
> identity_file=/root/.ssh/id_rsa host_os=windows
> pcmk_host_check=static-list pcmk_host_list="centos1 centos2"
> vboxmanage_path="/cygdrive/c/Program\
> Files/Oracle/VirtualBox/VBoxManage" op monitor interval=5
> 
> where centos1 and centos2 are VBox machines names (not hostnames). I
> used duplicated login/username parameters as it is indicated as
> required in stonith description fence_vbox.
> 
> Then I updated the configuration and set:
> 
> pcs stonith update vbox-fencing  pcmk_host_list="nfsnode1 nfsnode2"
> pcs stonith update vbox-fencing
> pcmk_host_map="nfsnode1:centos1;nfsnode2:centos2"
> 
> where nfsnode1 and nfsnode2 are the hostnames
> 
> I'm not sure which config is correct but both show Failed Actions after
> some time.

You only need one of pcmk_host_list or pcmk_host_map. Use pcmk_host_list
if fence_vbox recognizes the node names used by the cluster, or
pcmk_host_map if fence_vbox knows the nodes by other names. In this
case, it looks like you want to tell fence_vbox to use "centos2" when
the cluster wants to fence nfsnode2, so your pcmk_host_map is the right
choice.
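
For example, something along these lines should do it (illustrative only; an
empty value should clear the now-redundant pcmk_host_list, but verify with
"pcs stonith show vbox-fencing" afterwards):

pcs stonith update vbox-fencing pcmk_host_list= \
    pcmk_host_map="nfsnode1:centos1;nfsnode2:centos2" pcmk_host_check=static-list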

> I've successfully tested the fence connection to the VBox host with:
> fence_vbox --ip 10.0.2.2 --username=AW23321
> --identity-file=/root/.ssh/id_rsa --plug=centos2 --host-os=windows
> --action=status --vboxmanage-path="/cygdrive/c/Program\
> Files/Oracle/VirtualBox/VBoxManage"
> 
> Why does the above configuration work as a standalone command but not
> in pcs?
Two main possibilities: you haven't expressed those identical options in
the cluster configuration correctly; or, you have some permissions on
the command line that the cluster doesn't have (maybe SELinux, or file
permissions, or ...).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_vbox Unable to connect/login to fencing device

2017-07-06 Thread Ken Gaillot
On 07/06/2017 10:29 AM, Ken Gaillot wrote:
> On 07/06/2017 10:13 AM, ArekW wrote:
>> Hi,
>>
>> It seems that my the fence_vbox is running but there are errors in
>> logs every few minutes like:
>>
>> Jul  6 12:51:12 nfsnode1 fence_vbox: Unable to connect/login to fencing 
>> device
>> Jul  6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
>> stderr: [ Unable to connect/login to fencing device ]
>> Jul  6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
>> stderr: [  ]
>> Jul  6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220]
>> stderr: [  ]
>>
>> Eventually after some time the pcs status shows Failed Actions:
>>
>> # pcs status --full
>> Cluster name: nfscluster
>> Stack: corosync
>> Current DC: nfsnode1 (1) (version 1.1.15-11.el7_3.5-e174ec8) -
>> partition with quorum
>> Last updated: Thu Jul  6 13:02:52 2017  Last change: Thu Jul
>> 6 13:00:33 2017 by root via crm_resource on nfsnode1
>>
>> 2 nodes and 11 resources configured
>>
>> Online: [ nfsnode1 (1) nfsnode2 (2) ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: StorageClone [Storage]
>>  Storage(ocf::linbit:drbd): Master nfsnode1
>>  Storage(ocf::linbit:drbd): Master nfsnode2
>>  Masters: [ nfsnode1 nfsnode2 ]
>> Clone Set: dlm-clone [dlm]
>>  dlm(ocf::pacemaker:controld):  Started nfsnode1
>>  dlm(ocf::pacemaker:controld):  Started nfsnode2
>>  Started: [ nfsnode1 nfsnode2 ]
>> vbox-fencing   (stonith:fence_vbox):   Started nfsnode1
>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>  ClusterIP:0(ocf::heartbeat:IPaddr2):   Started nfsnode1
>>  ClusterIP:1(ocf::heartbeat:IPaddr2):   Started nfsnode2
>> Clone Set: StorageFS-clone [StorageFS]
>>  StorageFS  (ocf::heartbeat:Filesystem):Started nfsnode1
>>  StorageFS  (ocf::heartbeat:Filesystem):Started nfsnode2
>>  Started: [ nfsnode1 nfsnode2 ]
>> Clone Set: WebSite-clone [WebSite]
>>  WebSite(ocf::heartbeat:apache):Started nfsnode1
>>  WebSite(ocf::heartbeat:apache):Started nfsnode2
>>  Started: [ nfsnode1 nfsnode2 ]
>>
>> Failed Actions:
>> * vbox-fencing_start_0 on nfsnode1 'unknown error' (1): call=157,
>> status=Error, exitreason='none',
>> last-rc-change='Thu Jul  6 13:58:04 2017', queued=0ms, exec=11947ms
>> * vbox-fencing_start_0 on nfsnode2 'unknown error' (1): call=57,
>> status=Error, exitreason='none',
>> last-rc-change='Thu Jul  6 13:58:16 2017', queued=0ms, exec=11953ms
>>
>> The fence was created with command:
>> pcs -f stonith_cfg stonith create vbox-fencing fence_vbox ip=10.0.2.2
>> ipaddr=10.0.2.2 login=AW23321 username=AW23321
>> identity_file=/root/.ssh/id_rsa host_os=windows
>> pcmk_host_check=static-list pcmk_host_list="centos1 centos2"
>> vboxmanage_path="/cygdrive/c/Program\
>> Files/Oracle/VirtualBox/VBoxManage" op monitor interval=5
>>
>> where centos1 and centos2 are VBox machines names (not hostnames). I
>> used duplicated login/username parameters as it is indicated as
>> required in stonith description fence_vbox.
>>
>> Then I updated the configuration and set:
>>
>> pcs stonith update vbox-fencing  pcmk_host_list="nfsnode1 nfsnode2"
>> pcs stonith update vbox-fencing
>> pcmk_host_map="nfsnode1:centos1;nfsnode2:centos2"
>>
>> where nfsnode1 and nfsnode2 are the hostnames
>>
>> I'm not sure which config is correct but both show Failed Actions after
>> some time.
> 
> You only need one of pcmk_host_list or pcmk_host_map. Use pcmk_host_list
> if fence_vbox recognizes the node names used by the cluster, or
> pcmk_host_map if fence_vbox knows the nodes by other names. In this
> case, it looks like you want to tell fence_vbox to use "centos2" when
> the cluster wants to fence nfsnode2, so your pcmk_host_map is the right
> choice.
> 
>> I've successfully tested the fence connection to the VBox host with:
>> fence_vbox --ip 10.0.2.2 --username=AW23321
>> --identity-file=/root/.ssh/id_rsa --plug=centos2 --host-os=windows
>> --action=status --vboxmanage-path="/cygdrive/c/Program\
>> Files/Oracle/VirtualBox/VBoxManage"
>>
>> Why does the above configuration work as a standalone command but not
>> in pcs?
> Two main possibilities: you haven't expressed those identical options in
> the cluster configuration correctly; or, you have some permissions on
> the command line that the cluster doesn't have (maybe SELinux, or file
> permissions, or ...).

Forgot one other possibility: the status shows that the *start* action
is what failed, not a fence action. Check the fence_vbox source code to
see what start does, and try to do that manually step by step.
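
For a fence device, "start" essentially registers the device and runs its
status/monitor check, so a rough manual equivalent (run as root on the node
where the start failed; exact actions depend on the agent version) is:

fence_vbox --ip 10.0.2.2 --username=AW23321 --identity-file=/root/.ssh/id_rsa \
    --host-os=windows \
    --vboxmanage-path="/cygdrive/c/Program\ Files/Oracle/VirtualBox/VBoxManage" \
    --action=monitor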

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Ken Gaillot
On 07/06/2017 09:26 AM, Klaus Wenninger wrote:
> On 07/06/2017 04:20 PM, Cesar Hernandez wrote:
>>> If node2 is getting the notification of its own fencing, it wasn't
>>> successfully fenced. Successful fencing would render it incapacitated
>>> (powered down, or at least cut off from the network and any shared
>>> resources).
>>
>> Maybe I don't understand you, or maybe you don't understand me... ;)
>> This is the syslog of the machine, where you can see that the machine has 
>> rebooted successfully, and as I said, it has been rebooted successfully all 
>> the times:
> 
> It is not just a question if it was rebooted at all.
> Your fence-agent mustn't return positively until this definitely
> has happened and the node is down.
> Otherwise you will see that message and the node will try to
> somehow cope with the fact that obviously the rest of the
> cluster thinks that it is down already.

But the "allegedly fenced" message comes in after the node has rebooted,
so it would seem that everything was in the proper sequence.

It looks like a bug when the fenced node rejoins quickly enough that it
is a member again before its fencing confirmation has been sent. I know
there have been plenty of clusters with nodes that quickly reboot and
slow fencing devices, so that seems unlikely, but I don't see another
explanation.

>> Jul  5 10:41:54 node2 kernel: [0.00] Initializing cgroup subsys 
>> cpuset
>> Jul  5 10:41:54 node2 kernel: [0.00] Initializing cgroup subsys cpu
>> Jul  5 10:41:54 node2 kernel: [0.00] Initializing cgroup subsys 
>> cpuacct
>> Jul  5 10:41:54 node2 kernel: [0.00] Linux version 3.16.0-4-amd64 
>> (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 
>> SMP Debian 3.16.39-1 (2016-12-30)
>> Jul  5 10:41:54 node2 kernel: [0.00] Command line: 
>> BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 
>> root=UUID=711e1ec2-2a36-4405-bf46-44b43cfee42e ro init=/bin/systemd 
>> console=ttyS0 console=hvc0
>> Jul  5 10:41:54 node2 kernel: [0.00] e820: BIOS-provided physical 
>> RAM map:
>> Jul  5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 
>> 0x-0x0009dfff] usable
>> Jul  5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 
>> 0x0009e000-0x0009] reserved
>> Jul  5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 
>> 0x000e-0x000f] reserved
>> Jul  5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 
>> 0x0010-0x3fff] usable
>> Jul  5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 
>> 0xfc00-0x] reserved
>> Jul  5 10:41:54 node2 kernel: [0.00] NX (Execute Disable) 
>> protection: active
>> Jul  5 10:41:54 node2 kernel: [0.00] SMBIOS 2.4 present.
>>
>> ...
>>
>> Jul  5 10:41:54 node2 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 
>> 67
>>
>> ...
>>
>> Jul  5 10:41:54 node2 corosync[585]:   [MAIN  ] Corosync Cluster Engine 
>> ('UNKNOWN'): started and ready to provide service.
>> Jul  5 10:41:54 node2 corosync[585]:   [MAIN  ] Corosync built-in features: 
>> nss
>> Jul  5 10:41:54 node2 corosync[585]:   [MAIN  ] Successfully read main 
>> configuration file '/etc/corosync/corosync.conf'.
>>
>> ...
>>
>> Jul  5 10:41:57 node2 crmd[608]:   notice: Defaulting to uname -n for the 
>> local classic openais (with plugin) node name
>> Jul  5 10:41:57 node2 crmd[608]:   notice: Membership 4308: quorum acquired
>> Jul  5 10:41:57 node2 crmd[608]:   notice: plugin_handle_membership: Node 
>> node2[1108352940] - state is now member (was (null))
>> Jul  5 10:41:57 node2 crmd[608]:   notice: plugin_handle_membership: Node 
>> node11[794540] - state is now member (was (null))
>> Jul  5 10:41:57 node2 crmd[608]:   notice: The local CRM is operational
>> Jul  5 10:41:57 node2 crmd[608]:   notice: State transition S_STARTING -> 
>> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
>> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: Watching for stonith 
>> topology changes
>> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: Membership 4308: quorum 
>> acquired
>> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: plugin_handle_membership: 
>> Node node11[794540] - state is now member (was (null))
>> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: On loss of CCM Quorum: 
>> Ignore
>> Jul  5 10:41:58 node2 stonith-ng[604]:   notice: Added 'st-fence_propio:0' 
>> to the device list (1 active devices)
>> Jul  5 10:41:59 node2 stonith-ng[604]:   notice: Operation reboot of node2 
>> by node11 for crmd.2141@node11.61c3e613: OK
>> Jul  5 10:41:59 node2 crmd[608]: crit: We were allegedly just fenced by 
>> node11 for node11!
>> Jul  5 10:41:59 node2 corosync[585]:   [pcmk  ] info: pcmk_ipc_exit: Client 
>> crmd (conn=0x228d970, async-conn=0x228d970) left
>> Jul  5 10:41:59 node2 pacemakerd[597]:  warning: The crmd process (608) can 
>> no longer be respawned, shutting the cluster 

Re: [ClusterLabs] Introducing the Anvil! Intelligent Availability platform

2017-07-05 Thread Ken Gaillot
Wow! I'm looking forward to the September summit talk.

On 07/05/2017 01:52 AM, Digimer wrote:
> Hi all,
> 
>   I suspect by now, many of you here have heard me talk about the Anvil!
> intelligent availability platform. Today, I am proud to announce that it
> is ready for general use!
> 
> https://github.com/ClusterLabs/striker/releases/tag/v2.0.0
> 
>   I started five years ago with an idea of building an "Availability
> Appliance". A single machine where any part could be failed, removed and
> replaced without needing a maintenance window. A system with no single
> point of failure anywhere wrapped behind a very simple interface.
> 
>   The underlying architecture that provides this redundancy was laid
> down years ago as an early tutorial and has been field tested all over
> North America and around the world in the years since. In that time, the
> Anvil! platform has demonstrated over 99.% availability!
> 
>   Starting back then, the goal was to write the web interface that made
> it easy to use the Anvil! platform. Then, about two years ago, I decided
> that an Anvil! could be much, much more than just an appliance.
> 
>   It could think for itself.
> 
>   Today, I would like to announce version 2.0.0. This releases
> introduces the ScanCore "decision engine". ScanCore can be thought of as
> a sort of "Layer 3" availability platform. Where Corosync provides
> membership and communications, with Pacemaker (and rgmanager) sitting on
> top monitoring applications and handling fault detection and recovery,
> ScanCore sits on top of both, gathering disparate data, analyzing it and
> making "big picture" decisions on how to best protect the hosted servers.
> 
>   Examples;
> 
> 1. All servers are on node 1, and node 1 suffers a cooling fan failure.
> ScanCore compares against node 2's health, waits a period of time in
> case it is a transient fault and the autonomously live-migrates the
> servers to node 2. Later, node 2 suffers a drive failure, degrading the
> underlying RAID array. ScanCore can then compare the relative risks of a
> failed fan versus a degraded RAID array, determine that the failed fan
> is less risky and automatically migrate the servers back to node 1. If a
> hot-spare kicks in and the array returns to an Optimal state, ScanCore
> will again migrate the servers back to node 2. When node 1's fan failure
> is finally repaired, the servers stay on node 2 as there is no benefit
> to migrating as now both nodes are equally healthy.
> 
> 2. Input power is lost to one UPS, but not the second UPS. ScanCore
> knows that good power is available and, so, doesn't react in any way. If
> input power is lost to both UPSes, however, then ScanCore will decide
> that the greatest risk the server availability is no longer unexpected
> component failure, but instead depleting the batteries. Given this, it
> will decide that the best option to protect the hosted servers is to
> shed load and maximize run time. if the power stays out for too long,
> then ScanCore will determine hard off is imminent, and decide to
> gracefully shut down all hosted servers, withdraw and power off. Later,
> when power returns, the Striker dashboards will monitor the charge rate
> of the UPSes and as soon as it is safe to do so, restart the nodes and
> restore full redundancy.
> 
> 3. Similar to case 2, ScanCore can gather temperature data from multiple
> sources and use this data to distinguish localized cooling failures from
> environmental cooling failures, like the loss of an HVAC or AC system.
> If the former case, ScanCore will migrate servers off and, if critical
> temperatures are reached, shut down systems before hardware damage can
> occur. In the later case, ScanCore will decide that minimizing thermal
> output is the best way to protect hosted servers and, so, will shed load
> to accomplish this. If necessary to avoid damage, ScanCore will perform
> a full shut down. Once ScanCore (on the low-powered Striker dashboards)
> determines thermal levels are safe again, it will restart the nodes and
> restore full redundancy.
> 
>   All of this intelligence is of little use, of course, if it is hard to
> build and maintain an Anvil! system. Perhaps the greatest lesson learned
> from our old tutorial was that the barrier to entry had to be reduced
> dramatically.
> 
> https://www.alteeve.com/w/Build_an_m2_Anvil!
> 
>   So, this release also dramatically simplifies how easy it is to go
> from bare iron to provisioned, protected servers. Even with no
> experience in availability at all, a tech should be able to go from iron
> in boxes to provision servers in one or two days. Almost all steps have
> been automated, which serves the core goal of maximum reliability by
> minimizing the chances for human error.
> 
>   This version also introduces the ability to run entirely offline. This
> version of the Anvil! is entirely self-contained with internal
> repositories making it possible to fully manage an Anvil! with no
> 

[ClusterLabs] IPaddr2 cloning inside containers

2017-04-26 Thread Ken Gaillot
FYI, I stumbled across a report of a suspected kernel issue breaking
iptables clusterip inside containers:

https://github.com/lxc/lxd/issues/2773

ocf:heartbeat:IPaddr2 uses clusterip when cloned. I'm guessing no one's
tried something like that yet, but this is a note of caution to anyone
thinking about it.

Pacemaker's new bundle feature doesn't support cloning the IPs it
creates, but that might be an interesting future feature if this issue
is resolved.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with clone ClusterIP

2017-04-26 Thread Ken Gaillot
On 04/26/2017 02:45 AM, Bratislav Petkovic wrote:
> Thank you,
> 
>  
> 
> We use the Cisco Nexus 7000 switches, they support Multicast MAC.
> 
> It is possible that something is not configured correctly.
> 
> In this environment IBM PowerHA SystemMirror 7.1 (which uses Multicast) works
>  without problems.
>  
> 
> Regards,
> 
>  
> 
> Bratislav

I believe SystemMirror uses multicast IP, which is at a higher level
than multicast Ethernet. Multicast Ethernet is much less commonly seen,
so it's often disabled.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] in standby but still running resources..

2017-04-27 Thread Ken Gaillot
On 04/27/2017 08:29 AM, lejeczek wrote:
> .. is this ok?
> 
> hi guys,
> 
> pcs shows no errors after I put the node in standby, but pcs shows resources
> still running on the node I just put into standby.
> Is this normal?
> 
> 0.9.152 @C7.3
> thanks
> P.

That should happen only for as long as it takes to stop the resources
there. If it's an ongoing condition, something is wrong.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby

2017-04-24 Thread Ken Gaillot
Hi all,

Pacemaker 1.1.17 will have a feature that people have occasionally asked
for in the past: the ability to start a node in standby mode.

It will be controlled by an environment variable (set in
/etc/sysconfig/pacemaker, /etc/default/pacemaker, or wherever your
distro puts them):


# By default, nodes will join the cluster in an online state when they first
# start, unless they were previously put into standby mode. If this
variable is
# set to "standby" or "online", it will force this node to join in the
# specified state when starting.
# (experimental; currently ignored for Pacemaker Remote nodes)
# PCMK_node_start_state=default


As described, it will be considered experimental in this release, mainly
because it doesn't work with Pacemaker Remote nodes yet. However, I
don't expect any problems using it with cluster nodes.

Example use cases:

You want fenced nodes to automatically start the cluster after a
reboot, so they contribute to quorum, but not run any resources, so the
problem can be investigated. You would leave
PCMK_node_start_state=standby permanently.

You want to ensure a newly added node joins the cluster without problems
before allowing it to run resources. You would set this to "standby"
when deploying the node, and remove the setting once you're satisfied
with the node, so it can run resources at future reboots.

You want a standby setting to last only until the next boot. You would
set this permanently to "online", and any manual setting of standby mode
would be overwritten at the next boot.
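
For example (the path varies by distro, and the setting takes effect the next
time Pacemaker starts on that node):

echo "PCMK_node_start_state=standby" >> /etc/sysconfig/pacemaker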

Many thanks to developers Alexandra Zhuravleva and Sergey Mishin, who
contributed this feature as part of a project with EMC.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Ken Gaillot
On 04/24/2017 02:33 PM, Lentes, Bernd wrote:
> 
> - On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote:
> 
>>>> primitive prim_vnc_ip_mausdb IPaddr \
>>>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
>>>>meta is-managed=true
>>
>> I don't see allow-migrate on the IP. Is this a modified IPaddr? The
>> stock resource agent doesn't support migrate_from/migrate_to.
> 
> Not modified. I can migrate the resource without the group easily between the 
> nodes. And also if i try to live-migrate the whole group,
> the IP is migrated.

Unfortunately, migration is not live migration ... a resource (the VM)
can't be live-migrated if it depends on another resource (the IP) that
isn't live-migrateable.

If you modify IPaddr to be live-migrateable, it should work. It has to
support migrate_from and migrate_to actions, and advertise them in the
meta-data. It doesn't necessarily have to do anything different from
stop/start, as long as that meets your needs.
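
As a very rough sketch (the function names below are placeholders, not the
stock agent's internals), the copied agent would advertise the actions in its
meta-data:

<action name="migrate_to"   timeout="20s" />
<action name="migrate_from" timeout="20s" />

and map them onto what it already does in its action dispatch, e.g.:

migrate_to)   ip_stop ;;     # nothing to hand over for an IP
migrate_from) ip_start ;;    # just bring it up on the destination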

>>> What i found in the net:
>>> http://lists.clusterlabs.org/pipermail/pacemaker/2011-November/012088.html
>>>
>>> " Yes, migration only works without order-contraints the migrating service
>>> depends on ... and no way to force it."
>>
>> I believe this was true in pacemaker 1.1.11 and earlier.
>>
> 
> Then it should be possible:
> 
> ha-idg-2:~ # rpm -q pacemaker
> pacemaker-1.1.12-11.12
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Question about fence_mpath

2017-04-28 Thread Ken Gaillot
On 04/28/2017 03:37 PM, Chris Adams wrote:
> Once upon a time, Seth Reid  said:
>> This confused me too when I set up my cluster. I found that everything
>> worked better if I didn't specify a device path. I think there was
>> documentation on Redhat that led me to try removing the "device" options.
> 
> fence_mpath won't work without device(s).  However, I figured out my
> problem: I needed to set pcmk_host_check=none (both nodes in my cluster
> can handle fencing).  Then everything seems to work.

You only want pcmk_host_check=none if the fence device can fence either
node. If the device can only fence one node, you want
pcmk_host_list set to that node's name.
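
For instance, with one device per node, something like this (illustrative
only; the key, device path and node names are placeholders):

pcs stonith create fence-nodeA fence_mpath key=1 devices=/dev/mapper/mpatha \
    pcmk_host_list=nodeA meta provides=unfencing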


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-25 Thread Ken Gaillot
On 04/25/2017 09:14 AM, Lentes, Bernd wrote:
> 
> 
> - On Apr 24, 2017, at 11:11 PM, Ken Gaillot kgail...@redhat.com wrote:
> 
>> On 04/24/2017 02:33 PM, Lentes, Bernd wrote:
>>>
>>> ----- On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote:
>>>
>>>>>> primitive prim_vnc_ip_mausdb IPaddr \
>>>>>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
>>>>>>meta is-managed=true
>>>>
>>>> I don't see allow-migrate on the IP. Is this a modified IPaddr? The
>>>> stock resource agent doesn't support migrate_from/migrate_to.
>>>
>>> Not modified. I can migrate the resource without the group easily between 
>>> the
>>> nodes. And also if i try to live-migrate the whole group,
>>> the IP is migrated.
>>
>> Unfortunately, migration is not live migration ... a resource (the VM)
>> can't be live-migrated if it depends on another resource (the IP) that
>> isn't live-migrateable.
>>
>> If you modify IPaddr to be live-migrateable, it should work. It has to
>> support migrate_from and migrate_to actions, and advertise them in the
>> meta-data. It doesn't necessarily have to do anything different from
>> stop/start, as long as that meets your needs.
>>
> 
> Hi Ken,
> 
> that means i have to edit the resource agent ?

Yes, copy it to a new name, and edit that. Best practice is to create
your own subdirectory under /usr/lib/ocf/resource.d and put it there, so
you use it as ocf:<provider>:<agent>.
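
For example (the provider directory and agent name below are arbitrary, and
the copy has to exist on every node):

mkdir -p /usr/lib/ocf/resource.d/local
cp /usr/lib/ocf/resource.d/heartbeat/IPaddr /usr/lib/ocf/resource.d/local/IPaddr
chmod 755 /usr/lib/ocf/resource.d/local/IPaddr
# edit the copy, then reference it as ocf:local:IPaddr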

> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource group vs colocation

2017-04-27 Thread Ken Gaillot
On 04/27/2017 02:02 PM, lejeczek wrote:
> hi everyone
> 
> I have a group and I'm trying to colocate - sounds strange - order with
> the group is not how I want it.
> I was hoping that with colocation set I can reorder the resources - can
> I? Because... something is not working, or my understanding is not getting there.
> I have within a group:
> 
> IP
> mount
> smb
> IP1
> 
> and I colocated sets:
> 
> set IP IP1 sequential=false set mount smb
> 
> and yet smb would not start on IP1. I see resources are still being ordered
> as they are listed.
> 
> Could somebody shed more light on what is wrong and group vs colocation
> subject?
> 
> m. thanks
> L.

A group is a shorthand for colocation and order constraints between its
members. So, you should use either a group, or a colocation set, but not
both with the same members.

If you simply want to reorder the sequence in which the group members
start, just recreate the group, listing them in the order you want. That
is, the first member of the group will be started first, then the second
member, etc.

If you prefer using sets, then don't group the resources -- use separate
colocation and ordering constraints with the sets, as desired.
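
With pcs that would be something like the following (untested sketch; use your
own group and resource names, list the members in whatever order you actually
want, and expect the affected resources to be restarted):

pcs resource ungroup mygroup
pcs resource group add mygroup IP IP1 mount smb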

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] should such a resource set work?

2017-04-28 Thread Ken Gaillot
On 04/28/2017 08:17 AM, lejeczek wrote:
> hi everybody
> 
> I have a set:
> 
> set IP2 IP2 IP2 LVM(exclusive) mountpoint smb smartd sequential=true
  ^^^

Is this a typo?

> setoptions score=INFINITY
> 
> it should work, right?
> 
> yet when I standby a node and I see cluster jumps straight to mountpoint
> and fails:
> 
> Failed Actions:
> * aLocalStorage5mnt_start_0 on nodeA 'not installed' (5): call=918,
> status=complete, exitreason='Couldn't find device
> [/dev/mapper/0-raid10.A]. Expected /dev/??? to exist',
> 
> Where am I making a mistake?
> thanks
> L.

Is this in a location, colocation, or order constraint?

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with clone ClusterIP

2017-04-25 Thread Ken Gaillot
On 04/25/2017 09:32 AM, Bratislav Petkovic wrote:
> I want to make active/active cluster with two physical servers.
> 
> On the servers are installed: oraclelinux-release-7.2-1.0.5.el7.x86_64,
> 
> Pacemaker 1.1.13-10.el7, Corosync Cluster Engine, version '2.3.4',
> 
> pcs 0.9.143. Cluster starts without a problem and I create a resource
> 
> ClusterIP that is in the same subnet as the IP addresses of the servers.
> 
> After creating I access ClusterIP without problems, but I clone ClusterIP
> 
> I can no longer access the this IP.
> 
> I did everything according to instructions from clusterlab.
> 
> Each server has two network cards that are on the teaming with LACP.
> 
>  
> 
> Best regards,
> 
>  
> 
> Bratislav Petkovic

IPaddr2 cloning depends on a special feature called multicast MAC. On
the host side, this is done via iptables' clusterip capability. However
not all Ethernet switches support multicast MAC (or the administrator
disabled it), so that is a possible cause.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-04 Thread Ken Gaillot
First, DRBD logged that there is a "meta parameter misconfigured". It
then reported that error value back to the crmd cluster daemon that
called it, so the crmd logged the error as well, that the result of the
operation was "not configured".

Then (above), when the policy engine reads the current status of the
cluster, it sees that there is a failed operation, so it decides what to
do about the failure.

> The doc says:
> "Some operations are generated by the cluster itself, for example, stopping 
> and starting resources as needed."
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html
>  . Is the doc wrong ?
> What happens when i DON'T configure start/stop operations ? Are they 
> configured automatically ?
> I have several primitives without a configured start/stop operation, but 
> never had any problems with them.

Start and stop are indeed created by the cluster itself. If there are
start and stop operations configured in the cluster configuration, those
are used solely to get the meta-attributes such as timeout, to override
the defaults.
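
In crm shell syntax that looks something like this (values and the
drbd_resource parameter are only examples, not a recommendation):

primitive prim_drbd_idcc_devel ocf:linbit:drbd \
    params drbd_resource=<your_drbd_resource> \
    op start timeout=240 \
    op stop timeout=100 \
    op monitor interval=29 role=Master timeout=30 \
    op monitor interval=31 role=Slave timeout=30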

> failcount is direct INFINITY:
> Aug  1 14:19:33 ha-idg-1 attrd[4690]:   notice: attrd_trigger_update: Sending 
> flush op to all hosts for: fail-count-prim_drbd_idcc_devel (INFINITY)
> Aug  1 14:19:33 ha-idg-1 attrd[4690]:   notice: attrd_perform_update: Sent 
> update 8: fail-count-prim_drbd_idcc_devel=INFINITY

Yes, a few result codes are considered "fatal", or automatically
INFINITY failures. The idea is that if the resource is misconfigured,
that's not going to change by simply re-running the agent.

> After exact 9 minutes the complaints about the not configured stop operation 
> stopped, the complaints about missing clone-max still appears, although both 
> nodes are in standby

I'm not sure why your nodes are in standby, but that should be unrelated
to all of this, unless perhaps you configured on-fail=standby.

> now fail-count is 1 million:
> Aug  1 14:28:33 ha-idg-1 attrd[4690]:   notice: attrd_trigger_update: Sending 
> flush op to all hosts for: fail-count-prim_drbd_idcc_devel (1000000)
> Aug  1 14:28:33 ha-idg-1 attrd[4690]:   notice: attrd_perform_update: Sent 
> update 7076: fail-count-prim_drbd_idcc_devel=1000000

Within Pacemaker, INFINITY = 1000000. I'm not sure why it's logged
differently here, but it's the same value.

> and a complain about monitor operation appeared again:
> Aug  1 14:28:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation 
> prim_drbd_idcc_devel_monitor_6: not configured (node=ha-idg-1, call=6968, 
> rc=6, cib-update=6932, confirmed=false)
> Aug  1 14:28:33 ha-idg-1 attrd[4690]:   notice: attrd_cs_dispatch: Update 
> relayed from ha-idg-2
> 
> crm_mon said:
> Failed actions:
> prim_drbd_idcc_devel_stop_0 on ha-idg-1 'not configured' (6): call=6967, 
> status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 
> 2017', queued=0ms, exec=41ms
> prim_drbd_idcc_devel_monitor_6 on ha-idg-1 'not configured' (6): 
> call=6968, status=complete, exit-reason='none', last-rc-change='Tue Aug  1 
> 14:28:33 2017', queued=0ms, exec=41ms
> prim_drbd_idcc_devel_stop_0 on ha-idg-2 'not configured' (6): call=6963, 
> status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 
> 2017', queued=0ms, exec=40ms
> 
> A big problem was that i have a ClusterMon resource running on each node. It 
> triggered about 2 snmp traps in 193 seconds to my management station, 
> which triggered 2 e-Mails ...
> From where comes this incredible amount of traps ? Nearly all traps said that 
> stop is not configured for the drdb resource. Why complaining so often ? And 
> why stopping after ~20.000 traps ?
> And complaining about not configured monitor operation just 8 times.

I'm not really sure; I haven't used ClusterMon enough to say. If you
have Pacemaker 1.1.15 or later, the alerts feature is preferred to
ClusterMon.
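
A minimal alerts setup would be something like this (pcs syntax shown; the
sample agent path can vary by distro, and crmsh has an equivalent "alert"
configuration section):

cp /usr/share/pacemaker/alerts/alert_snmp.sh.sample /var/lib/pacemaker/alert_snmp.sh
chmod 755 /var/lib/pacemaker/alert_snmp.sh
pcs alert create path=/var/lib/pacemaker/alert_snmp.sh id=snmp_alert
pcs alert recipient add snmp_alert value=192.0.2.10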

> Btw: is there a history like in the bash where i see which crm command i 
> entered at which time ? I know that crm history is mighty, but didn't find 
> that.
> 
> 
> 
> 
> Bernd

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fwd: Multi cluster

2017-08-04 Thread Ken Gaillot
On Fri, 2017-08-04 at 18:35 +0200, Jan Pokorný wrote:
> On 03/08/17 20:37 +0530, sharafraz khan wrote:
> > I am new to clustering so please ignore if my Question sounds silly, i have
> > a requirement were in i need to create cluster for ERP application with
> > apache, VIP component,below is the scenario
> > 
> > We have 5 Sites,
> > 1. DC
> > 2. Site A
> > 3. Site B
> > 4. Site C
> > 5. Site D
> > 
> > Over here we need to configure HA as such that DC would be the primary Node
> > hosting application & be accessed from by all the users in each sites, in
> > case of Failure of DC Node, Site users should automatically be switched to
> > there local ERP server, and not to the Nodes at other sites, so
> > communication would be as below
> > 
> > DC < -- > Site A
> > DC < -- > Site B
> > DC < -- > Site C
> > DC < -- > Site D
> > 
> > Now the challenge is
> > 
> > 1. If i create a cluster between say DC < -- > Site A it won't allow me to
> > create another cluster on DC with other sites

Right, your choices (when using corosync+pacemaker) are one big cluster
with all sites (including the data center), or an independent cluster at
each site connected by booth.

It sounds like your secondary sites don't have any communication between
each other, only to the DC, so that suggests that the "one big cluster"
approach won't work.

For more details on pacemaker+booth, see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139900093104976
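
The heart of a booth setup is one small config shared by all sites plus an
arbitrator, roughly like this (sketch showing just two sites; addresses and
the ticket name are placeholders):

# /etc/booth/booth.conf
transport = UDP
port = 9929
arbitrator = 192.0.2.100
site = 192.0.2.10
site = 192.0.2.20
ticket = "erp-ticket"

Each cluster then runs the ERP resources only while it holds the ticket, via
a ticket constraint.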


> > 2. if i setup all the nodes in single cluster how can i ensure that in case
> > of Node Failure or loss of connectivity to DC node from any site, users
> > from that sites should be switched to Local ERP node and not to nodes on
> > other site.

The details depend on the particular service. Unfortunately I don't have
any experience with ERP, maybe someone else can jump in with tips.

How do users contact the ERP node? Via an IP address, or a list of IP
addresses that will be tried in order, or some other way?

Is the ERP service itself managed by the cluster? If so, what resource
agent are you using? Does the agent support cloning or master/slave
operation?

> > 
> > a urgent response and help would be quite helpful
> 
> From your description, I suppose you are limited to just a single
> machine per site/DC (making the overall picture prone to double
> fault, first DC goes down, then any of the sites goes down, then
> at least the clients of that very site encounter the downtime).
> Otherwise I'd suggest looking at booth project that facilitates
> inter-cluster (back to your "multi cluster") decisions, extending
> upon pacemaker performing the intra-cluster ones.
> 
> Using a single cluster approach, you should certainly be able to
> model your fallback scenario, something like:
> 
> - define a group A (VIP, apache, app), infinity-located with DC
> - define a different group B with the same content, set up as clone
>   B_clone being (-infinity)-located with DC
> - set up ordering "B_clone starts when A stops", of "Mandatory" kind
> 
> Further tweaks may be needed.

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Notification agent and Notification recipients

2017-08-04 Thread Ken Gaillot
On Thu, 2017-08-03 at 12:31 +0530, Sriram wrote:
> 
> Hi Team,
> 
> 
> We have a four node cluster (1 active : 3 standby) in our lab for a
> particular service. If the active node goes down, one of the three
> standby node  becomes active. Now there will be (1 active :  2
> standby : 1 offline).
> 
> 
> Is there any way where this newly elected node sends notification to
> the remaining 2 standby nodes about its new status ?

Hi Sriram,

This depends on how your service is configured in the cluster.

If you have a clone or master/slave resource, then clone notifications
are probably what you want (not alerts, which is the path you were going
down -- alerts are designed to e.g. email a system administrator after
an important event).

For details about clone notifications, see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements

The RA must support the "notify" action, which will be called when a
clone instance is started or stopped. See the similar section later for
master/slave resources for additional information. See the mysql or
pgsql resource agents for examples of notify implementations.
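
In short, notifications must be enabled on the clone, and the agent then
reads what happened from environment variables -- a sketch (pcs syntax;
resource names are placeholders):

pcs resource meta my-clone notify=true

# inside the agent's notify action:
#   $OCF_RESKEY_CRM_meta_notify_type          -> pre or post
#   $OCF_RESKEY_CRM_meta_notify_operation     -> start, stop, promote, ...
#   $OCF_RESKEY_CRM_meta_notify_promote_uname -> node(s) being promoted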

> I was exploring "notification agent" and "notification recipient"
> features, but that doesn't seem to work. /etc/sysconfig/notify.sh
> doesn't get invoked even in the newly elected active node. 

Yep, that's something different altogether -- it's only enabled on RHEL
systems, and solely for backward compatibility with an early
implementation of the alerts interface. The new alerts interface is more
flexible, but it's not designed to send information between cluster
nodes -- it's designed to send information to something external to the
cluster, such as a human, or an SNMP server, or a monitoring system.


> Cluster Properties:
>  cluster-infrastructure: corosync
>  dc-version: 1.1.17-e2e6cdce80
>  default-action-timeout: 240
>  have-watchdog: false
>  no-quorum-policy: ignore
>  notification-agent: /etc/sysconfig/notify.sh
>  notification-recipient: /var/log/notify.log
>  placement-strategy: balanced
>  stonith-enabled: false
>  symmetric-cluster: false
> 
> 
> 
> 
> I m using the following versions of pacemaker and corosync.
> 
> 
> /usr/sbin # ./pacemakerd --version
> Pacemaker 1.1.17
> Written by Andrew Beekhof
> /usr/sbin # ./corosync -v
> Corosync Cluster Engine, version '2.3.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> 
> Can you please suggest if I m doing anything wrong or if there any
> other mechanisms to achieve this ?
> 
> 
> Regards,
> Sriram.
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Ken Gaillot <kgail...@redhat.com>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-07 Thread Ken Gaillot
On Mon, 2017-08-07 at 12:54 +0200, Lentes, Bernd wrote:
> - On Aug 4, 2017, at 10:19 PM, Ken Gaillot kgail...@redhat.com wrote:
> 
> > Unfortunately no -- logging, and troubleshooting in general, is an area
> > we are continually striving to improve, but there are more to-do's than
> > time to do them.
> 
> sad but comprehensible. Is it worth trying to understand the logs or should i 
> keep an eye on
> hb-report or crm history ? I played a bit around with hb_report but it seems 
> it just collects information already available and does not simplify the view 
> on it.

The logs are very useful, but not particularly easy to follow. It takes
some practice and experience, but I think it's worth it if you have to
troubleshoot cluster events often.

It's on the to-do list to create a "Troubleshooting Pacemaker" document
that helps with this and using tools such as crm_simulate.

The first step in understanding the logs is to learn what the pacemaker
daemons are and what they do, and what the DC node is. It starts to make
more sense from there:

   pacemakerd: spawns all other daemons and re-spawns them if they crash
   attrd: manages node attributes
   cib: manages reading/writing the configuration
   lrmd: executes resource agents
   pengine: given a cluster state, determines any actions needed
   crmd: manages cluster membership and carries out the pengine's
decisions by asking the lrmd to perform actions

At any given time, one node's crmd in the cluster (or partition if there
is a network split) is elected as the DC (designated controller). The DC
asks the pengine what needs to be done, then farms out the results to
all the other crmd's, which (if necessary) call their local lrmd to
actually execute the actions.
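
A useful companion when reading those logs is crm_simulate, e.g. (the input
file name is a placeholder):

crm_simulate -sL                      # live CIB: current state plus scores
crm_simulate -S -x pe-input-123.bz2   # replay a saved pengine input file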

> > The "ERROR" message is coming from the DRBD resource agent itself, not
> > pacemaker. Between that message and the two separate monitor operations,
> > it looks like the agent will only run as a master/slave clone.
> 
> Yes. I see it in the RA.
> 
> >> And why does it complain that stop is not configured ?
> > 
> > A confusing error message. It's not complaining that the operations are
> > not configured, it's saying the operations failed because the resource
> > is not properly configured. What "properly configured" means is up to
> > the individual resource agent.
> 
> Aah. And why does it not complain a "failed" start op ?
> Because i have "target-role=stopped" in rsc_defaults ? So it tries not to 
> start but stop the resource initially ?

target-role=Stopped will indeed prevent it from trying to start, which
explains why there's no message for that. It shouldn't try a stop though
unless one is needed, so I'm not sure offhand why the stop was
initiated.

> 
> >> The DC says:
> >> Aug  1 14:19:33 ha-idg-2 pengine[27043]:  warning: unpack_rsc_op_failure:
> >> Processing failed op stop for prim_drbd_idcc_devel on ha-idg-1: not 
> >> configured
> >> (6)
> >> Aug  1 14:19:33 ha-idg-2 pengine[27043]:error: unpack_rsc_op: 
> >> Preventing
> >> prim_drbd_idcc_devel from re-starting anywhere: operation stop failed 'not
> >> configured' (6)
> >> 
> >> Again complaining about a failed stop, saying it's not configured. Or does 
> >> it
> >> complain that the fail of a stop op is not configured ?
> > 
> > Again, it's confusing, but you have various logs of the same event
> > coming from three different places.
> > 
> > First, DRBD logged that there is a "meta parameter misconfigured". It
> > then reported that error value back to the crmd cluster daemon that
> > called it, so the crmd logged the error as well, that the result of the
> > operation was "not configured".
> > 
> > Then (above), when the policy engine reads the current status of the
> > cluster, it sees that there is a failed operation, so it decides what to
> > do about the failure.
> 
> Ok.
>  
> >> The doc says:
> >> "Some operations are generated by the cluster itself, for example, 
> >> stopping and
> >> starting resources as needed."
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html
> >> . Is the doc wrong ?
> >> What happens when i DON'T configure start/stop operations ? Are they 
> >> configured
> >> automatically ?
> >> I have several primitives without a configured start/stop operation, but 
> >> never
> >> had any problems with them.
> > 
> > Start and stop are indeed created by the cluster itself. If there are
> > start and stop operations configured in the cluster configuration, those
> > are used solely to get the meta-attributes such as timeout, to override
> > the defaults.
