Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: And what exactly is your problem? Real life example. Database resource depends on storage resource(s). There are multiple filesystems/volumes with database files. Database admin needs to increase available

Re: [ClusterLabs] circumstances under which resources become unmanaged

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 20:46, N, Ravikiran wrote: Hi All, I have a resource added to pacemaker called 'cmsd' whose state is getting to 'unmanaged FAILED' state. Apart from manually changing the resource to unmanaged using pcs resource unmanage cmsd , I'm trying to understand under what all

Re: [ClusterLabs] Ordering constraint restart second resource group

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 19:35, John Gogu wrote: Hello, in my cluster configuration I have following situation: resource_group_A ip1 ip2 resource_group_B apache1 ordering constraint resource_group_A then resource_group_B symetrical=true When I add a new resource from group_A, resources

Re: [ClusterLabs] MySQL resource causes error 0_monitor_20000.

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 9:15 AM, Kiwamu Okabe kiw...@debian.or.jp wrote: Hi Andrei, On Tue, Aug 18, 2015 at 2:24 PM, Andrei Borzenkov arvidj...@gmail.com wrote: I made master-master replication on Pacemaker. But it causes error 0_monitor_2. It's not an error, it is just operation name

Re: [ClusterLabs] MySQL resource causes error 0_monitor_20000.

2015-08-17 Thread Andrei Borzenkov
Sent from iPhone On 18 Aug 2015, at 7:19, Kiwamu Okabe kiw...@gmail.com wrote: Hi all, I made master-master replication on Pacemaker. But it causes error 0_monitor_2. It's not an error, it is just operation name. If one of them boots Heartbeat and another doesn't, the

Re: [ClusterLabs] Antw: Re: MySQL resource causes error 0_monitor_20000.

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 3:34 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Kiwamu Okabe kiw...@debian.or.jp wrote on 18.08.2015 at 11:48 in message CAEvX6dky8=_w6l2nhndfbowux+ol7ktaa44salru7a9-xed...@mail.gmail.com: Hi Andrei, On Tue, Aug 18, 2015 at 6:28 PM, Andrei

Re: [ClusterLabs] MySQL resource causes error 0_monitor_20000.

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 11:57 AM, Kiwamu Okabe kiw...@debian.or.jp wrote: Hi, On Tue, Aug 18, 2015 at 5:07 PM, Kiwamu Okabe kiw...@debian.or.jp wrote: ``` 2015-08-18 16:50:38 7081 [ERROR] Slave I/O: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids;

Re: [ClusterLabs] Pacemaker tries to demote resource that isn't running and returns OCF_FAILED_MASTER

2015-08-20 Thread Andrei Borzenkov
21.08.2015 00:35, Brian Campbell wrote: I have a master/slave resource (with a custom resource agent) which, if it uncleanly shut down, will return OCF_FAILED_MASTER on the next monitor operation. This seems to be what

Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-16 Thread Andrei Borzenkov
17.08.2015 02:26, Andrew Beekhof wrote: On 13 Aug 2015, at 7:33 pm, Andrei Borzenkov arvidj...@gmail.com wrote: On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: And what exactly is your problem? Real life example. Database resource depends on storage

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Andrei Borzenkov
24.08.2015 12:35, Tom Yates wrote: I've got a failover firewall pair where the external interface is ADSL; that is, PPPoE. i've defined the service thus: primitive ExternalIP lsb:hb-adsl-helper \ op monitor interval=60s and in addition written a noddy script

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Andrei Borzenkov
24.08.2015 13:32, Tom Yates wrote: On Mon, 24 Aug 2015, Andrei Borzenkov wrote: 24.08.2015 12:35, Tom Yates wrote: I've got a failover firewall pair where the external interface is ADSL; that is, PPPoE. i've defined the service thus: If stop operation failed resource state is undefined

Re: [ClusterLabs] systemd: xxxx.service start request repeated too quickly

2015-08-04 Thread Andrei Borzenkov
On Tue, Aug 4, 2015 at 4:57 PM, Juha Heinanen j...@tutpro.com wrote: Andrei Borzenkov writes: Not sure I really understand the question. If service cannot run anyway, you can simply remove it from configuration. You can set target state to stopped. You can unmanage it. It all depends on what

Re: [ClusterLabs] starting of resources

2015-08-11 Thread Andrei Borzenkov
On Tue, Aug 11, 2015 at 9:44 AM, Vijay Partha vijaysarath...@gmail.com wrote: Hi, Can we statically add resources to the nodes. I mean before the pacemaker is started can we add resources to the nodes like you dont require to make use of pcs resource create. Is this possible? You better

Re: [ClusterLabs] How to cluster a service with multiple possibilities

2015-07-25 Thread Andrei Borzenkov
On Fri, 24 Jul 2015 14:43:37 + David Gersic dger...@niu.edu wrote: I have a process (OpenSLP slpd) that I'd like to cluster. Unfortunately, this process provides multiple services, depending what it finds in its configuration file on startup. I need to have the process running on all of

Re: [ClusterLabs] Resource cannot run anywhere

2015-07-21 Thread Andrei Borzenkov
On Mon, Jul 20, 2015 at 4:40 PM, Leonhardt,Christian christian.leonha...@dako.de wrote: Hello everyone, I already posted this issue at the Debian HA maintainers list (http://lists.alioth.debian.org/pipermail/debian-ha-maintainers/2015-July/004325.html). Unfortunately the problem still exist

Re: [ClusterLabs] Resource location node preference

2015-10-22 Thread Andrei Borzenkov
22.10.2015 18:25, Vallevand, Mark K wrote: Suppose I have a resource defined with a preference of node1 over node2. The resource is running on node1. Node1 goes away. Now the resource is running on node2. Node1 comes back and joins the cluster. Will the resource relocate to Node1?

Re: [ClusterLabs] ORACLE 12 and SLES HAE (Sles 11sp3)

2015-10-28 Thread Andrei Borzenkov
On Wed, Oct 28, 2015 at 11:45 AM, Cristiano Coltro wrote: > Hi, > most of the SLES 11 sp3 with HAE are migrating Oracle Db. > The migration will be from Oracle 11 to Oracle 12 > > They have verified that the Oracles cluster resources actually supports > - Oracle 10.2 and 11.2

Re: [ClusterLabs] required nodes for quorum policy

2015-11-09 Thread Andrei Borzenkov
On Tue, Nov 10, 2015 at 1:20 AM, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding the policy to check for cluster quorum for > corosync+pacemaker. > > As far as I know at present it is always (excpected_votes)/2 + 1. Seems like > "qdiskd" has an
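For reference, the majority rule mentioned in the question above (expected votes divided by two, plus one, using integer division) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not corosync code: the function name `quorum_threshold` is hypothetical, and special cases such as a two-node configuration are deliberately left out.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Return the smallest vote count that is a strict majority.

    Illustrative helper only; the name is made up for this sketch.
    """
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    # Integer division drops the remainder, so 3 -> 2 and 4 -> 3.
    return expected_votes // 2 + 1


if __name__ == "__main__":
    for votes in (2, 3, 4, 5):
        print(votes, "expected votes ->", quorum_threshold(votes), "needed")
```

Under this rule a two-node cluster needs both votes, which is why two-node setups usually need extra handling.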

Re: [ClusterLabs] restarting resources

2015-11-03 Thread Andrei Borzenkov
On Mon, Nov 2, 2015 at 7:59 PM, - - wrote: > Hi, >I need to be able to restart a resource (e.g apache) whenever a > configuration > file is updated. I have been using the 'crm resource restart ' command to to > it, > which does restart the resource BUT also restarts my

Re: [ClusterLabs] Antw: Monitoring Op for LVM - Excessive Logging

2015-10-09 Thread Andrei Borzenkov
09.10.2015 19:40, Jorge Fábregas wrote: On 10/09/2015 09:06 AM, Ulrich Windl wrote: Did you try daemon_options="-d0"? (in clvmd resource) I've just found this: http://pacemaker.oss.clusterlabs.narkive.com/C5BaFych/ocf-lvm2-clvmd-resource-agent ...so apparently SUSE changed the resource

Re: [ClusterLabs] 3 nodes cluster on Centos 7

2015-07-07 Thread Andrei Borzenkov
On Tue, Jul 7, 2015 at 12:34 PM, Michael Schwartzkopff m...@sys4.de wrote: The cluster has 3 nodes : - 1 virtual machine (machine1). This machine is supposed to be high-available - 2 physical machines identical (machine2 and 3) It's not going to work. If host where this VM is running

Re: [ClusterLabs] a newbie --question

2015-09-15 Thread Andrei Borzenkov
On Tue, Sep 15, 2015 at 4:38 PM, wrote: > Hi, > > Thanks for reply. > The problem is Compute resource, the appY and appZ can't run on same Server. > > It is possible ? > Yes; set location constraint that appY cannot run on the same node as appZ (and vice versa).

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha wrote: > Hi, > > I want to know how to disable failover. If a node undergoes a failover the > resources running on the node should not be started on the other node in the > cluster. How can this be achieved. > What exactly

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
01.10.2015 19:09, Vijay Partha wrote: i want pacemaker to monitor the resources running on each node and at the same time restart it. It should run on the same node. Then create single node cluster. Why do you add second node if you do not want to use it? On Thu, Oct 1, 2015 at 9:17 PM,

Re: [ClusterLabs] Antw: Re: Antw: Re: design of a two-node cluster

2015-12-08 Thread Andrei Borzenkov
On Tue, Dec 8, 2015 at 12:01 PM, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 08.12.2015 at 09:01 in > message > <CAA91j0Un+1EN6xRLM=dm6ck+usdzmpnyyjtha9d+btrzfcg...@mail.gmail.com>: >>

Re: [ClusterLabs] master/slave resource agent without demote

2015-11-24 Thread Andrei Borzenkov
On Tue, Nov 24, 2015 at 5:19 PM, Waldemar Brodkorb wrote: > Hi, > > we are using a derivate of the Tomcat OCF script. > Our web application needs to be promoted (via a wget call). > But our application is not able to demote in a clean way, so > we need to stop and then

Re: [ClusterLabs] start service after filesystemressource

2015-11-20 Thread Andrei Borzenkov
20.11.2015 16:38, haseni...@gmx.de wrote: Hi, I want to start several services after the drbd resource and the filesystem is available. This is my current configuration: node $id="184548773" host-1 \ attributes standby="on" node $id="184548774" host-2 \ attributes

Re: [ClusterLabs] Recovering after split-brain

2016-06-21 Thread Andrei Borzenkov
21.06.2016 20:05, Dimitri Maziuk wrote: > On 06/21/2016 11:47 AM, Digimer wrote: > >> If you don't need to coordinate services/access, you don't need HA. >> >> If you do need to coordinate services/access, you need fencing. > > So what you're saying is we *cannot* run a pacemaker cluster without

Re: [ClusterLabs] restarting pacemakerd

2016-06-19 Thread Andrei Borzenkov
18.06.2016 22:04, Dmitri Maziuk wrote: > On 2016-06-18 05:15, Ferenc Wágner wrote: > ... >> On the other hand, one could argue that restarting failed services >> should be the default behavior of systemd (or any init system). Still, >> it is not. > > As an off-topic snide comment, I never

Re: [ClusterLabs] getting "Totem is unable to form a cluster" error

2016-04-08 Thread Andrei Borzenkov
08.04.2016 17:51, Jan Friesse wrote: >> On 04/08/16 13:01, Jan Friesse wrote: >> >> pacemaker 1.1.12-11.12 >> >> openais 1.1.4-5.24.5 >> >> corosync 1.4.7-0.23.5 >> >> >> >> Its a two node active/passive cluster and we just upgraded the >> SLES 11 >> >> SP 3 to SLES 11 SP 4(nothing else)

Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind (Ken Gaillot)

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 9:35 PM, Mike Bernhardt wrote: > I guess I have to say "never mind!" I don't know what the problem was > yesterday, but it loads just fine today, even when the named config and the > virtual ip don't match! But for your edamacation, ifconfig does NOT

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg wrote: > On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote: >> >> And some more about fencing: >> >> >> >> 3. What's the difference in cluster behavior between >> >>- stonith-enabled=FALSE (9.3.2: how often

Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-22 Thread Andrei Borzenkov
22.07.2016 09:52, Ulrich Windl wrote: > That could be. Should there be a node list to configure, or can't the agent > find out itself (for SBD)? > It apparently does it already gethosts) echo `sbd -d $sbd_device list | cut -f2 | sort | uniq` exit 0
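The shell fragment quoted above takes the second tab-separated field of every line printed by the `sbd ... list` command and removes duplicates. A rough Python equivalent of that text processing is sketched below. It does not call `sbd`; the sample input is made up for illustration, and the function name `unique_second_fields` is hypothetical.

```python
from typing import List


def unique_second_fields(text: str) -> List[str]:
    """Mimic `cut -f2 | sort | uniq` on tab-separated text.

    Hypothetical helper for illustration; sorting here is by code point,
    whereas sort(1) follows the current locale.
    """
    fields = []
    for line in text.splitlines():
        parts = line.split("\t")
        # cut -f2 prints the whole line when it contains no delimiter.
        fields.append(parts[1] if len(parts) > 1 else line)
    return sorted(set(fields))


# Made-up sample resembling "slot<TAB>node<TAB>message" lines.
sample = "0\tnode1\tclear\t\n1\tnode2\tclear\t\n2\tnode1\tclear\t\n"
print(" ".join(unique_second_fields(sample)))
```

The result is the de-duplicated list of node names, which is what the agent echoes as its host list.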

Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Andrei Borzenkov
22.07.2016 17:43, Jason A Ramsey wrote: > Additionally (and this is just a failing on my part), I’m > unclear as to where the resource agent is fed the value for > “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one > is permitted to supply with “pcs resource create…” > It is

Re: [ClusterLabs] Antw: Re: Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-12 Thread Andrei Borzenkov
11.07.2016 09:33, Ulrich Windl wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 09.07.2016 at 10:17 in > message <5780b30a.3000...@gmail.com>: >> 08.07.2016 09:11, Ulrich Windl wrote: >>>>>> "Carlos Xavier"

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-25 Thread Andrei Borzenkov
On Mon, Jul 25, 2016 at 9:07 AM, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 22.07.2016 at 17:14 in > message <4f17c57b-7458-2ec8-cd74-3daaf9c89...@gmail.com>: >> 22.07.2016 09:52, Ulrich

Re: [ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues

2016-07-22 Thread Andrei Borzenkov
23.07.2016 00:07, TEG AMJG wrote: ... > Master: kamailioetcclone > Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 > notify=true > Resource: kamailioetc (class=ocf provider=linbit type=drbd) >Attributes: drbd_resource=kamailioetc >Operations: start interval=0s

Re: [ClusterLabs] Previous DC fenced prior to integration

2016-07-22 Thread Andrei Borzenkov
23.07.2016 01:37, Nate Clark wrote: > Hello, > > I am running pacemaker 1.1.13 with corosync and think I may have > encountered a start up timing issue on a two node cluster. I didn't > notice anything in the changelog for 14 or 15 that looked similar to > this or open bugs. > > The rough out

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-21 Thread Andrei Borzenkov
22.07.2016 00:38, Klaus Wenninger wrote: > On 07/21/2016 06:40 PM, Andrei Borzenkov wrote: >> 19.07.2016 18:24, Klaus Wenninger wrote: >>> On 07/19/2016 04:17 PM, Ken Gaillot wrote: >>>> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote: >>>>> On Tue

Re: [ClusterLabs] external/libvirt source code

2016-08-02 Thread Andrei Borzenkov
On Tue, Aug 2, 2016 at 4:58 PM, Maciej Kopczyński wrote: > Hello, > > Sorry if it is a trivial question, but I am facing a wall here. I am trying > to configure fencing on cluster running Hyper-V. I need to modify source > code for external/libvirt plugin, but I have no idea

Re: [ClusterLabs] Can Pacemaker monitor geographical separated servers

2016-08-10 Thread Andrei Borzenkov
On Tue, Aug 9, 2016 at 9:40 PM, bhargav M.P wrote: > Hi All, > I have deployment where we have two Linux servers that are geographically > separated and they are across different subnets . I want the server to work > in Active/Standby mode . I would like to use

Re: [ClusterLabs] Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-21 Thread Andrei Borzenkov
21.07.2016 09:49, Ulrich Windl wrote: Ken Gaillot wrote on 19.07.2016 at 16:17 in message > : > > [...] >> You're right -- if not told otherwise, Pacemaker will query the device >> for the target list. In this

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-19 Thread Andrei Borzenkov
On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot wrote: ... >> >> primitive p_ston_pg1 stonith:external/ipmi \ >> params hostname=pg1 ipaddr=10.148.128.35 userid=root >> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" >> passwd_method=file interface=lan

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-19 Thread Andrei Borzenkov
19.07.2016 18:24, Klaus Wenninger wrote: > On 07/19/2016 04:17 PM, Ken Gaillot wrote: >> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote: >>> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot <kgail...@redhat.com> wrote: >>> ... >>>>> primitive p_ston_pg

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Andrei Borzenkov
On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel wrote: >> > [...] >> > >> > primitive p_ston_pg1 stonith:external/ipmi \ >> > params hostname=pg1 ipaddr=10.148.128.35 userid=root >> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" >> >

Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Andrei Borzenkov
20.07.2016 18:08, Jason A Ramsey wrote: > I have been struggling getting a HA iSCSI Target cluster in place for > literally weeks. I cannot, for whatever reason, get pacemaker to create an > iSCSILogicalUnit resource. The error message that I’m seeing leads me to > believe that I’m missing

Re: [ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-09 Thread Andrei Borzenkov
08.07.2016 09:11, Ulrich Windl wrote: "Carlos Xavier" wrote on 07.07.2016 at 18:57 in > message <00e901d1d870$ae418000$0ac48000$@com.br>: >> Thank you for the fast reply >> >>> >>> have you configured the stonith and drbd stonith handler? >>> >> >> Yes.

Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Andrei Borzenkov
05.08.2016 02:33, Digimer wrote: > On 04/08/16 07:21 PM, Dan Swartzendruber wrote: >> On 2016-08-04 19:03, Digimer wrote: >>> On 04/08/16 06:56 PM, Dan Swartzendruber wrote: I'm setting up an HA NFS server to serve up storage to a couple of vsphere hosts. I have a virtual IP, and it

Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-05 Thread Andrei Borzenkov
On Fri, Aug 5, 2016 at 7:08 AM, Digimer <li...@alteeve.ca> wrote: > On 04/08/16 11:44 PM, Andrei Borzenkov wrote: >> 05.08.2016 02:33, Digimer wrote: >>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote: >>>> On 2016-08-04 19:03, Digimer wrote: >>>>

Re: [ClusterLabs] restart of one instance of a clone resource causes restart of dependent resources

2017-02-20 Thread Andrei Borzenkov
06.02.2017 13:02, Daniel wrote: > Hi All, > > I'm having issues with a ordering constraint with a clone resource in > pacemaker v1.1.14. > - I have a resourceA-clone (running on 2 nodes: node1 and node2). > - then I have 2 other resources: resourceB1 (allowed to run on node1 only) > and

Re: [ClusterLabs] Mysql slave did not start replication after failure, and read-only IP also remained active on the much outdated slave

2016-08-22 Thread Andrei Borzenkov
On Mon, Aug 22, 2016 at 12:18 PM, Attila Megyeri wrote: > Dear community, > > > > A few days ago we had an issue in our Mysql M/S replication cluster. > > We have a one R/W Master, and a one RO Slave setup. RO VIP is supposed to be > running on the slave if it is not

Re: [ClusterLabs] using IPMI for fencing - configuring IPMI with ipmitool - HELP

2017-02-28 Thread Andrei Borzenkov
28.02.2017 20:39, Lentes, Bernd wrote: > Hi, > > i have a HP server ML 350 G9 with an ILO4 card. The riloe stonith agent does > not work, i read in a book the recommendation to use the ipmi resource agent > instead. > I'm trying to configure the respective ILO adapter with ipmitool. Why do

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-09 Thread Andrei Borzenkov
10.10.2016 00:42, Eric Robinson wrote: > Digimer, thanks for your thoughts. Booth is one of the solutions I > looked at, but I don't like it because it is complex and difficult to > implement HA is complex. There is no way around it. > (and perhaps costly in terms of AWS services or something >

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-21 Thread Andrei Borzenkov
14.10.2016 10:39, Vladislav Bogdanov wrote: > > use of utilization (balanced strategy) has one caveat: resources are > not moved just because of utilization of one node is less, when nodes > have the same allocation score for the resource. So, after the > simultaneous outage of two nodes in a

Re: [ClusterLabs] Help crm_master with score 0

2016-10-21 Thread Andrei Borzenkov
20.10.2016 09:20, K Aravind wrote: > Hi all > Small doubt > Let's say there are two node cluster without fencing say node 1 , node 2 > Where node 1 = active > Node 2 = passive > Now if node 1 is down > So node 2 promote is called ..however if the score given is 0 via > crm_master -l

Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm?

2016-11-15 Thread Andrei Borzenkov
16.11.2016 02:48, Eric Robinson wrote: > mode 1. No special switch configuration. spanning tree not enabled. I > have 100+ Linux servers, all of which use bonding. The network has > been stable for 10 years. No changes recently. However, this is the > second time that we have seen high latency and

Re: [ClusterLabs] Unexpected Resource movement after failover

2016-10-13 Thread Andrei Borzenkov
On Thu, Oct 13, 2016 at 4:59 PM, Nikhil Utane wrote: > Hi, > > I have 5 nodes and 4 resources configured. > I have configured constraint such that no two resources can be co-located. > I brought down a node (which happened to be DC). I was expecting the > resource on

Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-11 Thread Andrei Borzenkov
On Tue, Oct 11, 2016 at 9:18 AM, Ulrich Windl wrote: > > My point is this: For a resource that can only exclusively run on one node, > it's important that the other node is down before taking action. But for cLVM > and OCFS2 the resources can run concurrently

Re: [ClusterLabs] Error performing operation: Argument list too long

2016-12-06 Thread Andrei Borzenkov
06.12.2016 20:41, Jan Pokorný wrote: > On 06/12/16 09:44 -0600, Ken Gaillot wrote: >> On 12/05/2016 02:29 PM, Shane Lawrence wrote: >>> I'm experiencing a strange issue with pacemaker. It is unable to check >>> the status of a systemd resource. >>> >>> systemctl shows that the service crashed:

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Andrei Borzenkov
22.04.2017 11:31, Klaus Wenninger wrote: >>> I wonder how SBD fits into this discussion. It is marketed as stonith >>> agent, but it is based on committing suicide so relies on well-behaving >>> nodes. Which we by definition cannot trust to behave well, otherwise >>> we'd not need stonith in

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Andrei Borzenkov
18.04.2017 10:47, Ulrich Windl wrote: ... >> >> Now let me come back to quorum vs. stonith; >> >> Said simply; Quorum is a tool for when everything is working. Fencing is >> a tool for when things go wrong. > > I'd say: Quorum is the tool to decide who'll be alive and who's going to die, > and

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Andrei Borzenkov
22.04.2017 23:33, Dmitri Maziuk wrote: > On 4/22/2017 12:02 PM, Digimer wrote: > >> Having SBD properly configured is *massively* safer than no fencing at >> all. So for people where other fence methods are not available for >> whatever reason, SBD is the way to go. > > Now you're talking. IMO

Re: [ClusterLabs] Antw: Re: Antw: Re: 2-Node Cluster Pointless?

2017-04-24 Thread Andrei Borzenkov
24.04.2017 09:15, Ulrich Windl wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 22.04.2017 at 09:05 in > message <ede2cdd3-7020-9f59-90ad-c3b4a0c9e...@gmail.com>: >> 18.04.2017 10:47, Ulrich Windl wrote: >> ... >>>> >>>

Re: [ClusterLabs] SAP HANA resource start problem

2017-05-14 Thread Andrei Borzenkov
12.05.2017 13:30, Muhammad Sharfuddin wrote: > is there a bug in SAP HANA resource ? crm_mon shows that cluster started > the resource and keep the HANA resource in slave state, while in actual > cluster doesn't start the resources, we found following events in the logs: > SAP HANA agent

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Andrei Borzenkov
20.06.2017 02:15, Digimer wrote: > On 19/06/17 06:59 PM, Ferenc Wágner wrote: >> Digimer writes: >> >>> So we have a tool that watches for changes to clvmd by running >>> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally >>> cause trouble. >> >> What kind of

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-27 Thread Andrei Borzenkov
Sent from iPhone > On 27 Nov 2017, at 14:36, Ferenc Wágner <wf...@niif.hu> wrote: > > Andrei Borzenkov <arvidj...@gmail.com> writes: > >> 25.11.2017 10:05, Andrei Borzenkov wrote: >> >>> In one of guides suggested procedure to simulate

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-28 Thread Andrei Borzenkov
28.11.2017 13:01, Jan Pokorný wrote: > On 27/11/17 17:43 +0300, Andrei Borzenkov wrote: >> Sent from iPhone >> >>> On 27 Nov 2017, at 14:36, Ferenc Wágner <wf...@niif.hu> wrote: >>> >>> Andrei Borzenkov <arvidj...@gmail.com> wri

Re: [ClusterLabs] cluster with two ESX server

2017-11-28 Thread Andrei Borzenkov
28.11.2017 10:45, Ramann, Björn wrote: > hi@all, > > in my configuration, the 1st Node run on ESX1, the second run on ESX2. Now > I'm looking for a way to configure the cluster fence/stonith with two ESX > server - is this possible? if you have shared storage, SBD may be an option. > > I try

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-26 Thread Andrei Borzenkov
25.11.2017 10:05, Andrei Borzenkov wrote: > In one of guides suggested procedure to simulate split brain was to kill > corosync process. It actually worked on one cluster, but on another > corosync process was restarted after being killed without cluster > noticing anything. Except a

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-26 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote: >> >> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly >> just fenced by sapprod01p for sapprod01p >> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: warning: The crmd >> process (3151) can no longer be respawned, >> Nov 22 16:04:56

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >> corosync and forcing STONITH pace

Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Andrei Borzenkov
29.11.2017 20:14, Klaus Wenninger wrote: > On 11/28/2017 07:41 PM, Andrei Borzenkov wrote: >> 28.11.2017 10:45, Ramann, Björn wrote: >>> hi@all, >>> >>> in my configuration, the 1st Node run on ESX1, the second run on ESX2. Now >>> I'm looking for

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
30.11.2017 16:11, Klaus Wenninger wrote: > On 11/30/2017 01:41 PM, Ulrich Windl wrote: >> >>>>> "Gao,Yan" <y...@suse.com> wrote on 30.11.2017 at 11:48 in message >> <e71afccc-06e3-97dd-c66a-1b4bac550...@suse.com>: >>> On

[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with VM on VSphere using shared VMDK as SBD. During basic tests by killing corosync and forcing STONITH pacemaker was not started after reboot. In logs I see during boot Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were

[ClusterLabs] SBD stonith in 2 node cluster - how to make it prefer one side of cluster?

2017-11-24 Thread Andrei Borzenkov
Wrapping my head around how pcmk_delay_max works, my understanding is - on startup pacemaker always starts one instance of stonith/sbd; it probably randomly selects node for it. I suppose this initial start is delayed by random number within pcmk_delay_max. - when cluster is partitioned,

[ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-24 Thread Andrei Borzenkov
In one of guides suggested procedure to simulate split brain was to kill corosync process. It actually worked on one cluster, but on another corosync process was restarted after being killed without cluster noticing anything. Except after several attempts pacemaker died with stopping resources ...

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 12:42 AM, Jan Pokorný <jpoko...@redhat.com> wrote: > On 29/11/17 22:00 +0100, Jan Pokorný wrote: >> On 28/11/17 22:35 +0300, Andrei Borzenkov wrote: >>> 28.11.2017 13:01, Jan Pokorný wrote: >>>> On 27/11/17 17:43 +0300, Andrei Borzen

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: > > The same scenario is why a single node can't have quorum at start-up in > a cluster with "two_node" set. Both nodes have to see each other at > least once before they can assume it's safe to do anything. > Unless we set

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:48 PM, Gao,Yan <y...@suse.com> wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:39 PM, Gao,Yan <y...@suse.com> wrote: > On 11/30/2017 09:14 AM, Andrei Borzenkov wrote: >> >> On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot <kgail...@redhat.com> wrote: >>> >>> >>> The same scenario is why a sing

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Andrei Borzenkov
01.12.2017 22:36, Gao,Yan wrote: > On 11/30/2017 06:48 PM, Andrei Borzenkov wrote: >> 30.11.2017 16:11, Klaus Wenninger wrote: >>> On 11/30/2017 01:41 PM, Ulrich Windl wrote: >>>> >>>>>>> "Gao,Yan" <y...@suse.com> wrote on 30.

Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Andrei Borzenkov
04.12.2017 18:47, Tomas Jelinek wrote: > On 4.12.2017 at 16:02 Kristoffer Grönlund wrote: >> Tomas Jelinek writes: >> * how is it shutting down the cluster when issuing "pcs cluster stop --all"? >>> >>> First, it sends a request to each node to stop

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Andrei Borzenkov
04.12.2017 14:48, Gao,Yan wrote: > On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >> 30.11.2017 13:48, Gao,Yan wrote: >>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >>>> VM o

Re: [ClusterLabs] Wrong sbd.service dependencies

2017-12-17 Thread Andrei Borzenkov
17.12.2017 15:20, Gao,Yan wrote: > On 2017/12/16 16:59, Andrei Borzenkov wrote: >> 04.12.2017 21:55, Andrei Borzenkov wrote: >> ... >>>>> >>>>> I tried it (on openSUSE Tumbleweed which is what I have at hand, it >>>>> has >>>>

Re: [ClusterLabs] pacemaker pingd with ms drbd = double masters short time when disconnected networks.

2017-12-16 Thread Andrei Borzenkov
15.12.2017 14:08, Прокопов Павел wrote: ... >     stonith-enabled=false \ >     no-quorum-policy=ignore \ ... > > Why pp-pacemaker2 first become a master? It breaks drdb. > Because you told it to behave this way. You told your cluster that neither stonith nor quorum are required; so each node

[ClusterLabs] Wrong sbd.service dependencies (was: Re: pacemaker with sbd fails to start if node reboots too fast)

2017-12-16 Thread Andrei Borzenkov
04.12.2017 21:55, Andrei Borzenkov wrote: ... >>> >>> I tried it (on openSUSE Tumbleweed which is what I have at hand, it has >>> SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch >>> disk at all. >> It simply waits tha

Re: [ClusterLabs] Issue with DRBD + a systemd resource

2017-12-14 Thread Andrei Borzenkov
14.12.2017 19:25, Jan Pokorný wrote: > On 14/12/17 10:49 -0500, Julien Semaan wrote: >> Great success! >> >> Adding the following line to /usr/lib/systemd/system/pacemaker.service did >> it: >> After=dbus.service > > Note, this is not a proper way for overriding the systemd unit files, > which is

Re: [ClusterLabs] Issue with DRBD + a systemd resource

2017-12-13 Thread Andrei Borzenkov
Sent from iPhone > On 13 Dec 2017, at 22:53, Julien Semaan wrote: > > Hello, > > It's my first post on this mailing list so excuse any rookie mistake I may make > in this thread. > > We currently have clusters deployed using corosync/pacemaker that manage DRBD > +

Re: [ClusterLabs] Colocation rule with vip and ms master

2017-11-10 Thread Andrei Borzenkov
26.10.2017 21:15, Norberto Lopes wrote: > Hi everyone, > > Could someone give me a bit more in-depth explanation of the semantical > differences between the following: > > (assume postgresMS is a master/slave resource for postgresql) > (ignore for a moment that the first rule could put the vip

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Andrei Borzenkov
05.12.2017 12:59, Gao,Yan wrote: > On 12/04/2017 07:55 PM, Andrei Borzenkov wrote: >> 04.12.2017 14:48, Gao,Yan wrote: >>> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >>>> 30.11.2017 13:48, Gao,Yan wrote: >>>>> On 11/22/2017 08:01 PM, Andrei Borz

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Andrei Borzenkov
05.12.2017 13:34, Gao,Yan wrote: > On 12/05/2017 08:57 AM, Dejan Muhamedagic wrote: >> On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote: >>> 04.12.2017 14:48, Gao,Yan wrote: >>>> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >>>>> 30.

Re: [ClusterLabs] interesting blog on Pacemaker-related outage

2017-12-07 Thread Andrei Borzenkov
07.12.2017 15:13, Adam Spiers wrote: > https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/ > > > It's a great write-up, although a little frustrating that it is still > not fully understood why a -inf colocation failed whereas a +inf > succeeded.  (I actually

Re: [ClusterLabs] Should pacemaker pursue its own and corosync's instant resurrection if either dies? (Was: Is corosync supposed to be restarted if it dies?)

2017-12-02 Thread Andrei Borzenkov
02.12.2017 16:30, Jan Pokorný wrote: > > In race-condition free situation, such a BindsTo-incurred stopping (or > at least scheduled to since 235?) of the service is then not a subject > of auto-restarting, from what I've observed, and documentation agrees: > > Restart= [...] When the death of

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-02 Thread Andrei Borzenkov
30.11.2017 13:48, Gao,Yan wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >> corosync and forcing STONITH pacemaker was not

Re: [ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

2017-12-06 Thread Andrei Borzenkov
07.12.2017 00:28, Klaus Wenninger wrote: > On 12/06/2017 08:03 PM, Ken Gaillot wrote: >> On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote: >>> I assumed that with corosync 2.x quorum is maintained by corosync and >>> pacemaker simply gets yes/no. Apparent

[ClusterLabs] VMware guest disk configuration for SBD

2017-10-21 Thread Andrei Borzenkov
I'm looking for pointers to documentation (or, if possible, support statements) for setting up a pacemaker cluster across physical ESX hosts using SBD as the STONITH agent. There are a lot of options for how one may set up a virtual disk in VMware, and I'm unsure which one to choose. Configuration is based on

Re: [ClusterLabs] Two-node cluster fencing

2018-05-13 Thread Andrei Borzenkov
12.05.2018 07:31, Confidential Company wrote: > Hi, > > This is my setup: > > 1. I have two VMware ESXi hosts with one virtual machine (RHEL 7.4) on each. > 2. On my physical machine, I have four vmnic --> vmnic 0,1 for uplink going > to switchA and switchB --> vmnic 2,3 for heartbeat corosync

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Andrei Borzenkov
16.05.2018 20:01, Casey & Gina wrote: >> On May 16, 2018, at 10:43 AM, Casey & Gina wrote: >> >> Thank you and Andrei for the advice... >> >>> the pcs alternative commands are: >>> >>> pcs stonith create vfencing external/vcenter \ >>> VI_SERVER=10.1.1.1

Re: [ClusterLabs] DLM fencing

2018-05-23 Thread Andrei Borzenkov
24.05.2018 02:57, Jason Gauthier wrote: > I'm fairly new to clustering under Linux. I basically have one shared > storage resource right now, using dlm, and gfs2. > I'm using fibre channel and when both of my nodes are up (2 node cluster) > dlm and gfs2 seem to be operating perfectly. > If I

Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Andrei Borzenkov
On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger wrote: > On 05/25/2018 07:31 AM, 井上 和徳 wrote: >> Hi, >> >> I am checking the watchdog function of SBD (without shared block-device). >> In a two-node cluster, if one cluster is stopped, watchdog is triggered on >> the
