Re: [ClusterLabs] Recovering after split-brain

2016-06-20 Thread Ken Gaillot
On 06/20/2016 08:30 AM, Nikhil Utane wrote: > Hi, > > For our solution we are making a conscious choice to not use > quorum/fencing as for us service availability is more important than > having 2 nodes take up the same active role. Split-brain is not an issue > for us (at least i think that way)

Re: [ClusterLabs] Cluster reboot for maintenance

2016-06-20 Thread Ken Gaillot
On 06/20/2016 07:45 AM, ma...@nucleus.it wrote: > Hi, > i have a two node cluster with some vms (pacemaker resources) running on > the two hypervisors: > pacemaker-1.0.10 > corosync-1.3.0 > > I need to do maintenance stuff , so i need to: > - put on maintenance the cluster so the cluster doesn't

Re: [ClusterLabs] crm_resource --cleanup and cluster-recheck-interval

2016-06-23 Thread Ken Gaillot
On 06/15/2016 05:44 AM, Vladislav Bogdanov wrote: > Hi, > > It seems that after recent commit which introduces staggered probes > running 'crm_resource --cleanup' (without --resource) leads to cluster > to finish recheck too long after cleanup was done. What I see: cluster > fires probes for the

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-01-08 Thread Ken Gaillot
ne of the lower-level tools (crm_attribute or attrd_updater) so you don't have a dependency on a higher-level tool that may not always be installed. > Thank You. > > On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane <nikhil.subscri...@gmail.com> > wrote: > >> Thanks to you Ke

Re: [ClusterLabs] Parallel adding of resources

2016-01-08 Thread Ken Gaillot
On 01/08/2016 12:34 AM, Arjun Pandey wrote: > Hi > > I am running a 2 node cluster with this config on centos 6.6 > > Master/Slave Set: foo-master [foo] > Masters: [ messi ] > Stopped: [ronaldo ] > eth1-CP(ocf::pw:IPaddr): Started messi > eth2-UP(ocf::pw:IPaddr):

Re: [ClusterLabs] Automatic Recover for stonith:external/libvirt

2016-01-08 Thread Ken Gaillot
On 01/08/2016 08:56 AM, m...@inwx.de wrote: > Hello List, > > I have here a test environment for checking pacemaker. Sometimes our > kvm-hosts with libvirt have trouble with responding the stonith/libvirt > resource, so I like to configure the service to realize as failed after > three failed

Re: [ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration

2016-01-08 Thread Ken Gaillot
Regards, >> > KIecho >> > >> > On 17.12.2015 08:19:43 Ulrich Windl wrote: >> >> >>> Klechomir <kle...@gmail.com> wrote on 16.12.2015 at 17:30 in >> >> >>> message >> >> >> >> <5671918e.40...@gmail

Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins

2016-01-15 Thread Ken Gaillot
On 01/15/2016 11:08 AM, Ken Gaillot wrote: >> Jan 13 19:33:00 [4291] oranacib: info: >> cib_process_replace: Replacement 0.4.0 from kamet not applied to >> 0.74.1: current epoch is greater than the replacement >> Jan 13 19:33:00 [4291] or

Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins

2016-01-15 Thread Ken Gaillot
On 01/15/2016 05:02 AM, Arjun Pandey wrote: > Based on corosync logs from orana (the node that did the actual > fencing and is the current master node) > > I also tried looking at pengine outputs based on crm_simulate. Up until > the fenced node rejoins, things look good. > > [root@ucc1 orana]#

[ClusterLabs] Pacemaker 1.1.14 released

2016-01-14 Thread Ken Gaillot
and welcome. -- Ken Gaillot <kgail...@redhat.com> ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scrat

Re: [ClusterLabs] Pacemaker and Corosync versions compatibility

2016-06-27 Thread Ken Gaillot
On 06/24/2016 08:46 PM, Maciej Kopczyński wrote: > Hello, > > I've been following a tutorial to set up a simple HA cluster using > Pacemaker and Corosync on CentOS 6.x while I have noticed that in the > original documentation it is stated that: > > "Since |pcs| has the ability to manage all

Re: [ClusterLabs] Pacemaker (remote) component relations

2016-02-08 Thread Ken Gaillot
On 02/08/2016 07:55 AM, Ferenc Wágner wrote: > Hi, > > I'm looking for information about the component interdependencies, > because I'd like to split the Pacemaker packages in Debian properly. > The current idea is to create two daemon packages, pacemaker and > pacemaker-remote, which exclude

Re: [ClusterLabs] Working with 2 VIPs

2016-02-09 Thread Ken Gaillot
On 02/08/2016 04:24 AM, Louis Chanouha wrote: > Hello, > I'm not sure if this mailing list is the proper place to send my request, please > tell > me where I should send it if not :) This is the right place :) > I have a use case that I can't actually run with corosync + pacemaker. > > I have two

Re: [ClusterLabs] "After = syslog.service" it is not working?

2016-02-05 Thread Ken Gaillot
especially to_syslog and syslog_facility. If that isn't an issue, then I would check /etc/rsyslog.conf and /etc/rsyslog.d/* to see if they do anything nonstandard. >> -Original Message- >> From: Ken Gaillot [mailto:kgail...@redhat.com] >> Sent: Friday, January 29

Re: [ClusterLabs] Fwd: dlm not starting

2016-02-05 Thread Ken Gaillot
> am configuring shared storage for 2 nodes (Cent 7) installed > pcs/gfs2-utils/lvm2-cluster when creating resource unable to start dlm > > crm_verify -LV >error: unpack_rsc_op:Preventing dlm from re-starting anywhere: > operation start failed 'not configured' (6) Are you using the

Re: [ClusterLabs] "After = syslog.service" it is not working?

2016-02-11 Thread Ken Gaillot
nfigured not to log via syslog, >> or rsyslog is configured not to send pacemaker logs to /var/log/messages. >> >> Check the value of PCMK_logfacility and PCMK_logpriority in your >> /etc/sysconfig/pacemaker. By default, pacemaker will log via syslog, but if >> these
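The check suggested above can be sketched in Python as follows. This is only an illustration: the parser, the function name `read_sysconfig`, and the sample file contents are assumptions made up for the example, not anything shipped with Pacemaker. It treats the file as simple `KEY=value` lines and reports the two variables named in the message.

```python
def read_sysconfig(text):
    """Parse simple KEY=value lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical contents of /etc/sysconfig/pacemaker, for illustration only.
SAMPLE = """
# PCMK_logfacility=daemon
PCMK_logfacility=none
PCMK_logpriority="notice"
"""

if __name__ == "__main__":
    settings = read_sysconfig(SAMPLE)
    for name in ("PCMK_logfacility", "PCMK_logpriority"):
        # A missing key means the built-in default applies.
        print(name, "=", settings.get(name, "(not set, default applies)"))
```

On a real node the text would come from reading the file itself; the sample string only stands in for it here.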

Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-01-27 Thread Ken Gaillot
On 01/27/2016 02:34 AM, jaspal singla wrote: > Hi Jan, > > Thanks for your replies!! > > I have couple of concerns more to answer, please help! I'm not familiar with rgmanager, so there may be better ways that hopefully someone else can suggest, but here are some ideas off the top of my head:

Re: [ClusterLabs] cron-suitable cluster status check

2016-02-29 Thread Ken Gaillot
On 02/27/2016 03:56 PM, Devin Reade wrote: > Right now in a test cluster on CentOS 7 I'm occasionally seeing > resource monitoring failures and, just today, a failure to start > a fencing agent. While I need to track down those problems, the > issue I want to discuss here is being notified when

Re: [ClusterLabs] Pacemaker shows false status of a resource and doesn't react on OCF_NOT_RUNNING rc.

2016-01-19 Thread Ken Gaillot
14, feel free to open a bug report. It might be worth revisiting to see if there is anything we can do about it. > Thank you, > Kostia > > On Tue, Jan 19, 2016 at 5:17 PM, Bogdan Dobrelya <bdobre...@mirantis.com> > wrote: > >> On 19.01.2016 16:13, Ken Gaillot wrote:

Re: [ClusterLabs] Pacemaker shows false status of a resource and doesn't react on OCF_NOT_RUNNING rc.

2016-01-19 Thread Ken Gaillot
> Jan 19 23:10:39 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c61c 0c clock_step >> -43193.793349 s >> Jan 19 11:10:45 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c614 04 freq_mode >> Jan 19 11:10:45 A2-2U12-302-LS systemd[1]: Time has been changed >> >> I am attaching corosync.log.

Re: [ClusterLabs] Pacemaker shows false status of a resource and doesn't react on OCF_NOT_RUNNING rc.

2016-01-19 Thread Ken Gaillot
ilure, it will of course log that it happened, but if it has what it thinks is a newer result (from 23:42), it won't treat the failure as the current state. Again, that's just a guess at this point, but I think that's what's happening. > Thank you, > Kostia > > On Tue, Jan 19, 2016 at 8:0

Re: [ClusterLabs] Wait until resource is really ready before moving clusterip

2016-01-26 Thread Ken Gaillot
On 01/26/2016 05:06 AM, Joakim Hansson wrote: > Thanks for the help guys. > I ended up patching together my own RA from the Delay and Dummy RA's and > using curl to request the header of solr's ping request handler on > localhost, which made the resource start return a bit more dynamic. > However,

Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-02-18 Thread Ken Gaillot
On 02/18/2016 01:07 PM, Jeremy Matthews wrote: > Hi, > > We're having an issue with our cluster where after a reboot of our system a > location constraint reappears for the ClusterIP. This causes a problem, > because we have a daemon that checks the cluster state and waits until the >

[ClusterLabs] Coming in Pacemaker 1.1.15: graceful Pacemaker Remote node stops

2016-02-19 Thread Ken Gaillot
(in this case pacemaker_remoted) before updating. -- Ken Gaillot <kgail...@redhat.com>

Re: [ClusterLabs] Clone Issue

2016-02-14 Thread Ken Gaillot
On 02/13/2016 08:09 PM, Frank D. Engel, Jr. wrote: > Hi, > > I'm new to the software, and with the list - just started experimenting > with trying to get a cluster working using CentOS 7 and the pcs utility, > and I've made some progress, but I can't quite figure out why I'm seeing > this one

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-09 Thread Ken Gaillot
emote node > selinux is disabled and I specifically opened firewall on 2224, 3121 and > 21064 tcp and 5405 udp > >> On 08 Mar 2016, at 08:51, Ken Gaillot <kgail...@redhat.com> wrote: >> >> On 03/07/2016 09:10 PM, Сергей Филатов wrote: >>> Thanks for an answer.

Re: [ClusterLabs] FLoating IP failing over but not failing back with active/active LDAP (dirsrv)

2016-03-10 Thread Ken Gaillot
topping and starting the service. Your goal is already accomplished by using a clone with master-max=2. With the clone, pacemaker will run the service on both nodes, and with master-max=2, it will be master/master. > -Original Message- > From: Ken Gaillot [mailto:kgail...@redhat.

Re: [ClusterLabs] FLoating IP failing over but not failing back with active/active LDAP (dirsrv)

2016-03-10 Thread Ken Gaillot
On 03/10/2016 08:48 AM, Bernie Jones wrote: > A bit more info.. > > > > If, after I restart the failed dirsrv instance, I then perform a "pcs > resource cleanup dirsrv-daemon" to clear the FAIL messages then the failover > will work OK. > > So it's as if the cleanup is changing the status in

Re: [ClusterLabs] Stonith ignores resource stop errors

2016-03-10 Thread Ken Gaillot
On 03/10/2016 04:42 AM, Klechomir wrote: > Hi List > > I'm testing stonith now (pacemaker 1.1.8), and noticed that it properly kills > a node with stopped pacemaker, but ignores resource stop errors. > > I'm pretty sure that the same version worked properly with stonith before. > Maybe I'm

Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-03-14 Thread Ken Gaillot
ANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS > cause=C_FSA_INTERNAL origin=notify_crmd ] > Feb 21 23:10:43 g5se-dea2b1 stonith-ng[1558]: notice: unpack_config: On > loss of CCM Quorum: Ignore > Feb 21 23:10:43 g5se-dea2b1 cib[1557]

Re: [ClusterLabs] Cluster failover failure with Unresolved dependency

2016-03-18 Thread Ken Gaillot
_stop_0, @operation=stop, > @transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > @transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > @call-id=1337, @last-run=1458124687, @last-rc-change=1458124687, > @exec-time=56 > Mar 16 11:38:07 [7420] HWJ-626.doma

Re: [ClusterLabs] Unable to create HAProxy resource: no such resource agent

2016-03-11 Thread Ken Gaillot
On 03/11/2016 02:18 PM, Matthew Mucker wrote: > I've created a Pacemaker cluster and have created a virtual IP address > resource that works properly. I am now attempting to add HAProxy as a > resource and I'm having problems. > > > - I installed HAProxy on all nodes of the cluster > > - I

Re: [ClusterLabs] Unable to create HAProxy resource: no such resource agent

2016-03-11 Thread Ken Gaillot
On 03/11/2016 03:25 PM, Matthew Mucker wrote: > I found the problem. When I used wget to retrieve the file, I was actually > downloading an HTML error page from my proxy server instead of the intended > file. > > > Oops. :-) I've done that before too ... > >

Re: [ClusterLabs] Problems with pcs/corosync/pacemaker/drbd/vip/nfs

2016-03-15 Thread Ken Gaillot
On 03/14/2016 12:47 PM, Todd Hebert wrote: > Hello, > > I'm working on setting up a test-system that can handle NFS failover. > > The base is CentOS 7. > I'm using ZVOL block devices out of ZFS to back DRBD replicated volumes. > > I have four DRBD resources (r0, r1, r2, r3, which are /dev/drbd1

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-11 Thread Ken Gaillot
te in pacemaker logs >> I don’t have ipv6 address for remote node, but I guess it should be >> listening >> on both >> >> attached pacemaker.log for cluster node >> >> >> >>> On 09 Mar 2016, at 10:23, Ken Gaillot <kgai

Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind

2016-03-19 Thread Ken Gaillot
On 03/15/2016 06:47 PM, Mike Bernhardt wrote: > Not sure if this is a BIND question or a PCS/Corosync question, but > hopefully someone has done this before: > > > > I'm setting up a new CentOS 7 DNS server cluster to replace our very old > CentOS 4 cluster. The old one uses heartbeat which is

Re: [ClusterLabs] Cluster failover failure with Unresolved dependency

2016-03-19 Thread Ken Gaillot
all resources are started on the > "survivor" node. > > Best regards, > Lorand > > > On Wed, Mar 16, 2016 at 4:34 PM, Ken Gaillot <kgail...@redhat.com> wrote: > >> On 03/16/2016 05:49 AM, Lorand Kelemen wrote: >>> Dear Ken, >>> >>

Re: [ClusterLabs] documentation on STONITH with remote nodes?

2016-03-14 Thread Ken Gaillot
On 03/12/2016 05:07 AM, Adam Spiers wrote: > Is there any documentation on how STONITH works on remote nodes? I > couldn't find any on clusterlabs.org, and it's conspicuously missing > from: > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/ > > I'm guessing the

Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2016-03-30 Thread Ken Gaillot
are many tweaks that can make a big difference in performance. I'm not sure how familiar you are with them already. Options depend on what your storage is (local or network, hardware/software/no RAID, etc.) and what your I/O-bound application is (database, etc.), but I'd look closely at cache/buffer

Re: [ClusterLabs] Pacemaker on-fail standby recovery does not start DRBD slave resource

2016-03-30 Thread Ken Gaillot
On 03/30/2016 11:20 AM, Sam Gardner wrote: > I have configured some network resources to automatically standby their node > if the system detects a failure on them. However, the DRBD slave that I have > configured does not automatically restart after the node is "unstandby-ed" > after the

Re: [ClusterLabs] Freezing/Unfreezing in Pacemaker ?

2016-04-07 Thread Ken Gaillot
On 04/07/2016 06:40 AM, jaspal singla wrote: > Hello, > > As we have clusvcadm -U and clusvcadm -Z > to freeze and unfreeze resource in CMAN. Would really appreciate if > someone please give some pointers for freezing/unfreezing a resource in > Pacemaker (pcs) as well. > > Thanks, > Jaspal

Re: [ClusterLabs] DRBD on asymmetric-cluster

2016-04-07 Thread Ken Gaillot
o this manually (only for testing) > > I hope someone can help :( > > Thanks in advance > > On Mon, Apr 4, 2016 at 4:50 PM, Jason Voorhees <jvoorhe...@gmail.com> wrote: >> I started reading "Pacemaker explained" but as it's so depth I didn't >> read

Re: [ClusterLabs] DRBD on asymmetric-cluster

2016-04-07 Thread Ken Gaillot
On 04/07/2016 10:30 AM, Jason Voorhees wrote: >> FYI, commands that "move" a resource do so by adding location >> constraints. The ID of these constraints will start with "cli-". They >> override the normal behavior of the cluster, and stay in effect until >> you explicitly remove them. (With pcs,
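As a rough illustration of the point about "cli-" constraints, the Python sketch below scans a CIB constraints section for location constraints whose ID starts with `cli-`. The XML sample, the resource and node names, and the helper name `find_cli_constraints` are all hypothetical; on a live cluster the XML would come from the cluster's own CIB rather than a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints section; resource and node names are made up.
SAMPLE_CONSTRAINTS = """
<constraints>
  <rsc_location id="cli-prefer-drbd_fs" rsc="drbd_fs" node="node1" score="INFINITY"/>
  <rsc_location id="cli-ban-web-on-node2" rsc="web" node="node2" score="-INFINITY"/>
  <rsc_location id="loc-web-node1" rsc="web" node="node1" score="100"/>
</constraints>
"""


def find_cli_constraints(constraints_xml):
    """Return IDs of location constraints left behind by move/ban commands."""
    root = ET.fromstring(constraints_xml)
    return [
        element.get("id")
        for element in root.iter("rsc_location")
        if element.get("id", "").startswith("cli-")
    ]


if __name__ == "__main__":
    for constraint_id in find_cli_constraints(SAMPLE_CONSTRAINTS):
        print(constraint_id)
```

Listing these IDs makes it easier to spot a leftover constraint that is still pinning a resource after a manual move.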

[ClusterLabs] HA meetup at OpenStack Summit

2016-04-12 Thread Ken Gaillot
(at the venue) or dinner (offsite). It might also be possible to reserve a small (10-person) meeting room, or just meet informally in the expo hall. Anyone interested? Preferences/conflicts? -- Ken Gaillot <kgail...@redhat.com>

Re: [ClusterLabs] attrd: Fix sigsegv on exit if initialization failed

2016-03-19 Thread Ken Gaillot
On 10/12/2015 06:08 AM, Vladislav Bogdanov wrote: > Hi, > > This was caught with 0.17.1 libqb, which didn't play well with long pids. > > commit 180a943846b6d94c27b9b984b039ac0465df64da > Author: Vladislav Bogdanov > Date: Mon Oct 12 11:05:29 2015 + > > attrd:

Re: [ClusterLabs] reproducible split brain

2016-03-19 Thread Ken Gaillot
On 03/16/2016 03:04 PM, Christopher Harvey wrote: > On Wed, Mar 16, 2016, at 04:00 PM, Digimer wrote: >> On 16/03/16 03:59 PM, Christopher Harvey wrote: >>> I am able to create a split brain situation in corosync 1.1.13 using >>> iptables in a 3 node cluster. >>> >>> I have 3 nodes, vmr-132-3,

Re: [ClusterLabs] Cluster goes to unusable state if fencing resource is down

2016-03-20 Thread Ken Gaillot
On 03/18/2016 02:58 AM, Arjun Pandey wrote: > Hi > > I am running a 2 node cluster with this config on centos 6.6 where i > have a multi-state resource foo being run in master/slave mode and a > bunch of floating IP addresses configured. Additionally i have a > collocation constraint for the IP

Re: [ClusterLabs] no clone for pcs-based cluster fencing?

2016-03-21 Thread Ken Gaillot
On 03/20/2016 06:20 PM, Devin Reade wrote: > I'm looking at a new pcs-style two node cluster running on CentOS 7 > (pacemaker 1.1.13, corosync 2.3.4) and crm_mon shows this line > for my fencing resource, that is the resource running on only one of > the two nodes: > >fence_cl2

Re: [ClusterLabs] Antw: Re: no clone for pcs-based cluster fencing?

2016-03-21 Thread Ken Gaillot
On 03/21/2016 09:34 AM, Ulrich Windl wrote: >>>> Ken Gaillot <kgail...@redhat.com> wrote on 21.03.2016 at 15:22 in >>>> message > <56f003b0.4020...@redhat.com>: > > [...] >> It's actually newer pacemaker versions rather than pcs itself

Re: [ClusterLabs] fence_scsi no such device

2016-03-21 Thread Ken Gaillot
On 03/21/2016 08:39 AM, marvin wrote: > > > On 03/15/2016 03:39 PM, Ken Gaillot wrote: >> On 03/15/2016 09:10 AM, marvin wrote: >>> Hi, >>> >>> I'm trying to get fence_scsi working, but i get "no such device" error. >>> It's a two

Re: [ClusterLabs] "No such device" with fence_pve agent

2016-03-22 Thread Ken Gaillot
On 03/22/2016 06:32 AM, Stanislav Kopp wrote: > Hi, > > I have problem with using "fence_pve" agent with pacemaker, the agent > works fine from command line, but if I simulate stonith action or use > "crm node fence ", it doesn't work: > > Mar 22 10:38:06 [675] redis2 stonith-ng: debug: >

Re: [ClusterLabs] "No such device" with fence_pve agent

2016-03-23 Thread Ken Gaillot
On 03/23/2016 06:41 AM, Ferenc Wágner wrote: > Ken Gaillot <kgail...@redhat.com> writes: > >> There is a fence parameter pcmk_host_check that specifies how pacemaker >> determines which fence devices can fence which nodes. The default is >> dynamic-list, which means

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-19 Thread Ken Gaillot
track. >> >> One part of question that is still not answered is on the newly active >> node, how to find out which was the node that went down? >> Anything that gets updated in the status section that can be read and >> figured out? >> >> Thanks. >>

Re: [ClusterLabs] attrd does not clean per-node cache after node removal

2016-03-23 Thread Ken Gaillot
On 03/23/2016 07:35 AM, Vladislav Bogdanov wrote: > Hi! > > It seems like atomic attrd in post-1.1.14 (eb89393) does not > fully clean node cache after node is removed. Is this a regression? Or have you only tried it with this version? > After our QA guys remove node wa-test-server-ha-03 from a

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-08 Thread Ken Gaillot
g, probably configured in /etc/default/pacemaker on Ubuntu). You should see "New remote connection" in the remote node's log when the cluster tries to connect, and "LRMD client connection established" if it's successful. As always, check for firewall and SELinux issues. > >

Re: [ClusterLabs] Connectivity is degraded (Expected=300)

2016-03-02 Thread Ken Gaillot
repositories. > cluster-infrastructure="openais" \ > stonith-enabled="false" \ > no-quorum-policy="ignore" \ > expected-quorum-votes="2" \ > cluster-delay="30s" \ > symmetric-cluster="

Re: [ClusterLabs] ELEMENTARY :: Please Help :: Getting error when building Pacemaker-1.1 from source

2016-03-02 Thread Ken Gaillot
On 03/01/2016 11:24 PM, Sharat Joshi wrote: > Hi List Folk, > > I am very new to Pacemaker and I am trying to build using sources. > After successfully installing libqb and corosync under > /disk1/software/libqb and /disk1/software/corosync and setting > > $ export >

Re: [ClusterLabs] crm_mon change in behaviour PM 1.1.12 -> 1.1.14: crm_mon -XA filters #health.* node attributes

2016-03-03 Thread Ken Gaillot
On 03/03/2016 10:07 AM, Martin Schlegel wrote: > Hello everybody > > > This is my first post on this mailing list and I am only using Pacemaker > since > fall 2015 ... please be gentle :-) and I will do the same. > > > Our cluster is using multiple resource agents that update various node >

Re: [ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Ken Gaillot
On 03/03/2016 06:04 AM, Debabrata Pani wrote: > http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-n > ode-delete.html > > > > Are we missing the deletion of the nodes from the cib ? That documentation is old; crm_node -R does remove the node from the CIB. > Regards, >

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-07 Thread Ken Gaillot
On 03/06/2016 07:43 PM, Сергей Филатов wrote: > Hi, > I’m trying to set up pacemaker_remote resource on ubuntu 14.04 > I followed "remote node walkthrough” guide > (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/#idm140473081667280 > >

Re: [ClusterLabs] Issues with crm_mon or ClusterMon resource agent

2016-03-07 Thread Ken Gaillot
On 03/06/2016 08:36 AM, Debabrata Pani wrote: > Hi, > > I would like to understand if anybody has got this working recently. > > Looks like I have missed something in the description and hence the > problem statement is not clear to the group. > > Can I enable some logs in crm_mon to improve

Re: [ClusterLabs] Connectivity is degraded (Expected=300)

2016-03-01 Thread Ken Gaillot
On 03/01/2016 08:24 AM, Rafał Sanocki wrote: > Hello > Can you tell if that message is correct " > > #crm_mon -A > Online: [ nodeA nodeB ] > failover-ip1(ocf::pacemaker:wall): Started nodeB > Clone Set: my-conn > Started: [ nodeA nodeB ] > Clone Set: my-connp > Started: [ nodeA

Re: [ClusterLabs] DRBD on asymmetric-cluster

2016-04-04 Thread Ken Gaillot
On 04/02/2016 01:16 AM, Jason Voorhees wrote: > Hello guys: > > I've been recently reading "Pacemaker - Clusters from scratch" and > working on a CentOS 7 system with pacemaker 1.1.13, corosync-2.3.4 and > drbd84-utils-8.9.5. > > The PDF instructs how to create a DRBD resource that seems to be >

Re: [ClusterLabs] cloned pingd resource problem

2016-03-30 Thread Ken Gaillot
On 03/30/2016 08:38 AM, fatcha...@gmx.de wrote: > Hi, > > I`m running a two node cluster on a fully updated CentOS 7 > (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64) . I see on one > of our nodes a lot of this in the logfiles: > > Mar 30 12:32:13 localhost crmd[12986]: notice:

Re: [ClusterLabs] service flap as nodes join and leave

2016-04-13 Thread Ken Gaillot
On 04/13/2016 11:23 AM, Christopher Harvey wrote: > I have a 3 node cluster (see the bottom of this email for 'pcs config' > output) with 3 nodes. The MsgBB-Active and AD-Active service both flap > whenever a node joins or leaves the cluster. I trigger the leave and > join with a pacemaker service

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-24 Thread Ken Gaillot
On 04/22/2016 01:13 PM, Dimitri Maziuk wrote: > On 04/22/2016 12:58 PM, Ken Gaillot wrote: > >>> Consider that monitoring - at least as part of the action - >>> should check if what your service is actually providing is >>> working according to some functional

[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 1

2016-04-22 Thread Ken Gaillot
amauchi, Jan Pokorný, Ken Gaillot, Klaus Wenninger, Kristoffer Grönlund, Lars Ellenberg, Michal Koutný, Nakahira Kazutomo, Ruben Kerkhof, and Yusuke Iida. Apologies if I have overlooked anyone. -- Ken Gaillot <kgail...@redhat.com>

Re: [ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ken Gaillot
On 04/22/2016 02:43 AM, Klaus Wenninger wrote: > On 04/22/2016 08:16 AM, Ulrich Windl wrote: >>>>> Ken Gaillot <kgail...@redhat.com> wrote on 21.04.2016 at 19:50 in >>>>> message >> <571912f3.2060...@redhat.com>: >> >> [...]

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-22 Thread Ken Gaillot
On 04/22/2016 08:57 AM, Klaus Wenninger wrote: > On 04/22/2016 03:29 PM, John Gogu wrote: >> Hello community, >> I am facing following situation with a Pacemaker 2 nodes DB cluster >> (3 resources configured into the cluster - 1 MySQL DB resource, 1 >> Apache resource, 1 IP resource ) >> -at

Re: [ClusterLabs] [ClusterLab] : Unable to bring up pacemaker

2016-04-27 Thread Ken Gaillot
On 04/27/2016 11:25 AM, emmanuel segura wrote: > you need to use pcs to do everything, pcs cluster setup and pcs > cluster start, try to use the redhat docs for more information. Agreed -- pcs cluster setup will create a proper corosync.conf for you. Your corosync.conf below uses corosync 1

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-29 Thread Ken Gaillot
On 04/28/2016 10:24 AM, Lars Marowsky-Bree wrote: > On 2016-04-27T12:10:10, Klaus Wenninger wrote: > >>> Having things in ARGV[] is always risky due to them being exposed more >>> easily via ps. Environment variables or stdin appear better. >> What made you assume the

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-25 Thread Ken Gaillot
On 04/25/2016 10:23 AM, Dmitri Maziuk wrote: > On 2016-04-24 16:20, Ken Gaillot wrote: > >> Correct, you would need to customize the RA. > > Well, you wouldn't because your custom RA will be overwritten by the > next RPM update. Correct again :) I should have mentione

Re: [ClusterLabs] operation parallelism

2016-04-25 Thread Ken Gaillot
On 04/22/2016 09:05 AM, Ferenc Wágner wrote: > Hi, > > Are recurring monitor operations constrained by the batch-limit cluster > option? I ask because I'd like to limit the number of parallel start > and stop operations (because they are resource hungry and potentially > take long) without

Re: [ClusterLabs] attrd does not clean per-node cache after node removal

2016-05-19 Thread Ken Gaillot
On 03/23/2016 12:01 PM, Vladislav Bogdanov wrote: > 23.03.2016 19:52, Vladislav Bogdanov wrote: >> 23.03.2016 19:39, Ken Gaillot wrote: >>> On 03/23/2016 07:35 AM, Vladislav Bogdanov wrote: >>>> Hi! >>>> >>>> It seems like atomic attrd in post

Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-23 Thread Ken Gaillot
On 05/20/2016 10:40 AM, Adam Spiers wrote: > Ken Gaillot <kgail...@redhat.com> wrote: >> Just musing a bit ... on-fail + migration-threshold could have been >> designed to be more flexible: >> >> hard-fail-threshold: When an operation fails this many ti

Re: [ClusterLabs] Node attributes

2016-05-19 Thread Ken Gaillot
On 05/18/2016 10:49 PM, H Yavari wrote: > Hi, > > How can I define a constraint for two resources based on one node's > attribute? > > For example resource X and Y are co-located based on node attribute Z. > > > > Regards, > H.Yavari Hi, See

[ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-19 Thread Ken Gaillot
. And with this approach, it should be easier to set the variable for all actions on the resource (demote/stop/start/promote), rather than just the stop. I think the boolean approach fits all the envisioned use cases that have been discussed. Any objections to going that route instead of the count? -- Ken

Re: [ClusterLabs] Issue in resource constraints and fencing - RHEL7 - AWS EC2

2016-05-20 Thread Ken Gaillot
On 05/20/2016 10:02 AM, Pratip Ghosh wrote: > Hi All, > > I am implementing 2 node RedHat (RHEL 7.2) HA cluster on Amazon EC2 > instance. For floating IP I am using a shell script provided by AWS so > that virtual IP float to another instance if any one server failed with > health check. In basic

Re: [ClusterLabs] Resource seems to not obey constraint

2016-05-20 Thread Ken Gaillot
On 05/20/2016 10:29 AM, Leon Botes wrote: > I push the following config. > The iscsi-target fails as it tries to start on iscsiA-node1 > This is because I have no target installed on iscsiA-node1 which is by > design. All services listed here should only start on iscsiA-san1 > iscsiA-san2. > I am

Re: [ClusterLabs] pacemaker and fence_sanlock

2016-05-12 Thread Ken Gaillot
On 05/11/2016 09:14 PM, Da Shi Cao wrote: > Dear all, > > I'm just beginning to use pacemaker+corosync as our HA solution on > Linux, but I got stuck at the stage of configuring fencing. > > Pacemaker 1.1.15, Corosync Cluster Engine, version '2.3.5.46-d245', and > sanlock 3.3.0 (built May 10

Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-12 Thread Ken Gaillot
On 05/12/2016 06:21 AM, Adam Spiers wrote: > Hi Ken, > > Firstly thanks a lot not just for working on this, but also for being > so proactive in discussing the details. A perfect example of > OpenStack's "Open Design" philosophy in action :-) > > Ken Gail

Re: [ClusterLabs] notify action asynchronous ?

2016-05-12 Thread Ken Gaillot
> >>> On Wed, 4 May 2016 09:55:34 -0500, >>> Ken Gaillot <kgail...@redhat.com> wrote: >> ... >>>> There would be no point in the pre-promote notify waiting for the >>>> attribute value to be retrievable, because the cluster isn't going to >>

Re: [ClusterLabs] Q: monitor and probe result codes and consequences

2016-05-12 Thread Ken Gaillot
On 05/12/2016 02:56 AM, Ulrich Windl wrote: > Hi! > > I have a question regarding an RA written by myself and pacemaker > 1.1.12-f47ea56 (SLES11 SP4): > > During "probe" all resources' "monitor" actions are executed (regardless of > any ordering constraints). Therefore my RA considers a

Re: [ClusterLabs] start a resource

2016-05-17 Thread Ken Gaillot
On 05/16/2016 12:22 PM, Dimitri Maziuk wrote: > On 05/13/2016 04:31 PM, Ken Gaillot wrote: > >> That is definitely not a properly functioning cluster. Something >> is going wrong at some level. > > Yeah, well... how do I find out what/where? What happens after "

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-05-17 Thread Ken Gaillot
On 05/17/2016 06:50 AM, Bogdan Dobrelya wrote: > On 05/17/2016 01:17 PM, Adam Spiers wrote: >> Bogdan Dobrelya wrote: >>> On 05/16/2016 09:23 AM, Jan Friesse wrote: > Hi, > > I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is > it

Re: [ClusterLabs] Antw: Re: Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-17 Thread Ken Gaillot
500.0M 329.4M 170.6M 66% /dev/shm > > On another node the same is 115 MB. > > Anyways, I'll monitor the usage to know what size is needed. > > Thank you Ken and Ulrich. > > On Tue, May 17, 2016 at 8:23 PM, Ken Gaillot <kgail...@redhat.com > <

Re: [ClusterLabs] Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-13 Thread Ken Gaillot
On 05/08/2016 11:19 PM, Nikhil Utane wrote: > Moving these questions to a different thread. > > Hi, > > We have limited storage capacity in our system for different folders. > How can I configure to use a different folder for /var/lib/pacemaker? ./configure

Re: [ClusterLabs] Antw: Re: Antw: Re: Q: monitor and probe result codes and consequences

2016-05-13 Thread Ken Gaillot
On 05/13/2016 06:00 AM, Ulrich Windl wrote: >>>> Dejan Muhamedagic <deja...@fastmail.fm> wrote on 13.05.2016 at 12:16 in > message <20160513101626.GA12493@walrus.homenet>: >> Hi, >> >> On Fri, May 13, 2016 at 09:05:54AM +0200, Ulrich Windl w

Re: [ClusterLabs] start a resource

2016-05-13 Thread Ken Gaillot
On 05/06/2016 01:01 PM, Dimitri Maziuk wrote: > On 05/06/2016 12:05 PM, Ian wrote: >> Are you getting any other errors now that you've fixed the >> config? > > It's running now that I did the cluster stop/start, but no: I > wasn't getting any other errors. I did have a symlink resource >

Re: [ClusterLabs] Resource failure-timeout does not reset when resource fails to connect to both nodes

2016-05-13 Thread Ken Gaillot
On 03/28/2016 11:44 AM, Sam Gardner wrote: > I have a simple resource defined: > > [root@ha-d1 ~]# pcs resource show dmz1 > Resource: dmz1 (class=ocf provider=internal type=ip-address) > Attributes: address=172.16.10.192 monitor_link=true > Meta Attrs: migration-threshold=3

Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Ken Gaillot
up003.ring0 >> May 18 10:37:03 apache-up001 crmd[15918]: notice: Initiating action 7: >> monitor scsia_monitor_0 on apache-up002.ring0 >> May 18 10:37:03 apache-up001 crmd[15918]: notice: Initiating action 4: >> monitor scsia_monitor_0 on apache-up001.ring0 (local) >> May 18

Re: [ClusterLabs] Unable to run Pacemaker: pcmk_child_exit

2016-05-05 Thread Ken Gaillot
e respawned, shutting the > cluster down. > May 05 16:15:20 [16294] airv_cu pacemakerd: notice: > pcmk_shutdown_worker: Shutting down Pacemaker > > The log and conf file is attached. > > -Regards > Nikhil > > On Thu, May 5, 2016 at 8:04 PM, Ken Gaill

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-05-02 Thread Ken Gaillot
On 04/25/2016 07:28 AM, Lars Ellenberg wrote: > On Thu, Apr 21, 2016 at 12:50:43PM -0500, Ken Gaillot wrote: >> Hello everybody, >> >> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)! >> >> The most prominent feature will be Klaus Wenninger

Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Ken Gaillot
On 05/04/2016 08:49 AM, Klaus Wenninger wrote: > On 05/04/2016 02:09 PM, Adam Spiers wrote: >> Hi all, >> >> As discussed with Ken and Andrew at the OpenStack summit last week, we >> would like Pacemaker to be extended to export the current failcount as >> an environment variable to OCF RA scripts

Re: [ClusterLabs] ringid interface FAULTY no resource move

2016-05-04 Thread Ken Gaillot
On 05/04/2016 07:14 AM, Rafał Sanocki wrote: > Hello, > I can't find what I did wrong. I have a 2-node cluster: Corosync, Pacemaker, > DRBD. When I unplug the cable, nothing happens. > > Corosync.conf > > # Please read the corosync.conf.5 manual page > totem { > version: 2 >

Re: [ClusterLabs] why and when a call of crm_attribute can be delayed ?

2016-05-04 Thread Ken Gaillot
On 04/25/2016 05:02 AM, Jehan-Guillaume de Rorthais wrote: > Hi all, > > I am facing a strange issue with attrd while doing some testing on a three > node > cluster with the pgsqlms RA [1]. > > pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave > setup on top of pgsqld.

Re: [ClusterLabs] Running several instances of a Corosync/Pacemaker cluster on a node

2016-05-02 Thread Ken Gaillot
On 04/26/2016 03:33 AM, Bogdan Dobrelya wrote: > Is it possible to run several instances of a Corosync/Pacemaker clusters > on a node? Can a node be a member of several clusters, so they could put > resources there? I'm sure it's doable with separate nodes or containers, > but that's not the case.

Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-05-02 Thread Ken Gaillot
On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote: > Hi, > > Just found an issue with node is silently unfenced. > > That is quite large setup (2 cluster nodes and 8 remote ones) with > a plenty of slowly starting resources (lustre filesystem). > > Fencing was initiated due to resource stop

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-05-02 Thread Ken Gaillot
On 04/22/2016 05:55 PM, Adam Spiers wrote: > Ken Gaillot <kgail...@redhat.com> wrote: >> On 04/21/2016 06:09 PM, Adam Spiers wrote: >>> Ken Gaillot <kgail...@redhat.com> wrote: >>>> Hello everybody, >>>> >>>> The

Re: [ClusterLabs] Moving Related Servers

2016-04-19 Thread Ken Gaillot
first. Is there a technical reason App 3 can work only with App 1? Is it possible for service "X" to stay running on both App 3 and App 4 all the time? If so, this becomes easier. > > Sorry for heavy description. > > > --
