Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Andrei Borzenkov
host_list. It is totally unrelated to where it is active currently. > >> On 11 Jul 2018, at 19:10, Andrei Borzenkov wrote: >> >> 11.07.2018 19:44, Salvatore D'angelo wrote: >>> Hi all, >>> >>> in my cluster doing crm_mon -1ARrf I noticed my STONI

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Andrei Borzenkov
11.07.2018 19:44, Salvatore D'angelo wrote: > Hi all, > > in my cluster doing crm_mon -1ARrf I noticed my STONITH resources are not > correctly located: The actual location of stonith resources does not really matter in up-to-date pacemaker. It only determines where the resource will be monitored;

Re: [ClusterLabs] Problem with pacemaker init.d script

2018-07-11 Thread Andrei Borzenkov
11.07.2018 18:08, Salvatore D'angelo wrote: > Hi All, > > After I successfully upgraded Pacemaker from 1.1.14 to 1.1.18 and corosync > from 2.3.35 to 2.4.4 on Ubuntu 14.04 I am trying to repeat the same scenario > on Ubuntu 16.04. 16.04 uses systemd; you need to create a systemd unit. I do

Re: [ClusterLabs] What triggers fencing?

2018-07-10 Thread Andrei Borzenkov
11.07.2018 05:45, Confidential Company wrote: > Not true, the faster node will kill the slower node first. It is > possible that through misconfiguration, both could die, but it's rare > and easily avoided with a 'delay="15"' set on the fence config for the > node you want to win. > > Don't use a

Re: [ClusterLabs] cronjobs only on active node

2018-07-10 Thread Andrei Borzenkov
On Tue, Jul 10, 2018 at 12:49 PM, Stefan K wrote: > Hello, > > is it somehow possible to have a cronjob active only on the active node? Define "active" node.

Re: [ClusterLabs] Pacemaker not restarting Resource on same node

2018-06-28 Thread Andrei Borzenkov
28.06.2018 18:35, Dileep V Nair wrote: > > > Hi, > > I have a cluster with DB2 running in HADR mode. I have used the db2 > resource agent. My problem is whenever DB2 fails on primary it is migrating > to the secondary node. Ideally it should restart thrice (Migration > Threshold set to 3)

Re: [ClusterLabs] difference between external/ipmi and fence_ipmilan

2018-06-27 Thread Andrei Borzenkov
On Wed, Jun 27, 2018 at 12:30 PM, Stefan K wrote: > Hi Kristoffer, > > ok I see, but why maintain both? Let a coin decide which one, but I think its > not very helpful to maintain both. > Do you volunteer to implement automatic conversion to new agents including parameters adjustment? > best

Re: [ClusterLabs] VM failure during shutdown

2018-06-26 Thread Andrei Borzenkov
26.06.2018 19:36, Ken Gaillot wrote: > > One problem is that you are creating the VM, and then later adding > constraints about what the cluster can do with it. Therefore there is a > time in between where the cluster can start it without any constraint. > The solution is to make your changes all

Re: [ClusterLabs] corosync doesn't start any resource

2018-06-23 Thread Andrei Borzenkov
ds to deadlock. I am not intimately familiar with pcs, I assume this would be "pcs constraint order set ... symmetrical=false". > best regards > Stefan > >> Sent: Friday, 22 June 2018 at 06:57 >> From: "Andrei Borzenkov" >> To: users

Re: [ClusterLabs] corosync doesn't start any resource

2018-06-21 Thread Andrei Borzenkov
21.06.2018 16:04, Stefan Krueger wrote: > Hi Ken, > >> Can you attach the pe-input file listed just above here? > done ;) > > And thank you for your patience! > You deleted all context, which makes it hard to answer. This is not a web forum where users can simply scroll up to see the previous reply.

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Andrei Borzenkov
21.06.2018 01:12, Casey & Gina wrote: > Please forgive me, I had inadvertently had stonith-enabled=false when > I thought I had it true. The fencing/rebooting is now working. > However, in light of what you brought up earlier, how do I set a > delay preference different for one of the two hosts

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Andrei Borzenkov
21.06.2018 00:50, Digimer wrote: > On 2018-06-20 05:46 PM, Jehan-Guillaume de Rorthais wrote: >> On Wed, 20 Jun 2018 17:24:41 -0400 >> Digimer wrote: >> >>> Make sure quorum is disabled. Quorum doesn't work on 2-node clusters. >> >> It does with the "two_node" parameter enabled in

Re: [ClusterLabs] questions about fence_scsi

2018-06-15 Thread Andrei Borzenkov
On Fri, Jun 15, 2018 at 11:18 AM, Andrei Borzenkov wrote: > On Fri, Jun 15, 2018 at 10:14 AM, Stefan Krueger wrote: >> Hello, >> >> so far as I understand I can use fence_scsi on a two node cluster, if the >> fence running on one cluster the other cluster has no

Re: [ClusterLabs] questions about fence_scsi

2018-06-15 Thread Andrei Borzenkov
On Fri, Jun 15, 2018 at 10:14 AM, Stefan Krueger wrote: > Hello, > > so far as I understand I can use fence_scsi on a two node cluster, if the > fence running on one cluster the other cluster has no access to this devices, > correct? If I parse this sentence correctly - no, that's not correct

Re: [ClusterLabs] resource agent Route active on multiple nodes

2018-06-06 Thread Andrei Borzenkov
On Wed, Jun 6, 2018 at 12:28 PM, Florent Barra wrote: > Hi, > I want to create a simple cluster where only the second interface is > managed. > I create my resources in the following ways: > > pcs resource create ping-gateway ocf:pacemaker:ping name=ping-counter > host_list=10.22.5.254 --clone >

Re: [ClusterLabs] Pengine always trying to start the resource on the standby node.

2018-06-05 Thread Andrei Borzenkov
06.06.2018 04:27, Albert Weng wrote: > Hi All, > > I have created active/passive pacemaker cluster on RHEL 7. > > Here are my environment: > clustera : 192.168.11.1 (passive) > clusterb : 192.168.11.2 (master) > clustera-ilo4 : 192.168.11.10 > clusterb-ilo4 : 192.168.11.11 > > cluster resource

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-06-04 Thread Andrei Borzenkov
04.06.2018 18:53, Casey & Gina wrote: >> There are different code paths when RA is called automatically by >> resource manager and when RA is called manually by crm_resource. The >> latter did not export this environment variable until 1.1.17. So >> documentation is correct in that you do not need

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-06-01 Thread Andrei Borzenkov
On Fri, Jun 1, 2018 at 12:22 AM, Casey & Gina wrote: > >> pacemaker is too old. The error most likely comes from missing >> OCF_RESKEY_crm_feature_set which is exported by crm_resource starting >> with 1.1.17. I am not that familiar with debian packaging, but I'd >> expect resource-agents-paf

Re: [ClusterLabs] Why would a standby node be fenced? (was: How to set up fencing/stonith)

2018-05-31 Thread Andrei Borzenkov
31.05.2018 22:18, Jehan-Guillaume de Rorthais wrote: > Sorry for getting back to you so late. > > On Fri, 25 May 2018 11:58:59 -0600 > Casey & Gina wrote: > >>> On May 25, 2018, at 7:01 AM, Casey Allen Shobe >>> wrote: Actually, why is Pacemaker fencing the standby node just because a

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-31 Thread Andrei Borzenkov
31.05.2018 19:20, Casey & Gina wrote: >> There is no "master node" in pacemaker. There is master/slave >> resource so at the best it is "node on which specific resource has >> master role". And we have no way to know on which node your >> resource had master role when you did it. Please be

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-30 Thread Andrei Borzenkov
31.05.2018 01:30, Casey & Gina wrote: >> In this case, the agent is returning "master (failed)", which does not >> mean that it previously failed when it was master -- it means it is >> currently running as master, in a failed condition. > > Well, it surely is NOT running. So the likely problem

Re: [ClusterLabs] Live migrate a VM in a cluster group

2018-05-29 Thread Andrei Borzenkov
28.05.2018 14:44, Jason Gauthier wrote: > On Mon, May 28, 2018 at 12:03 AM, Andrei Borzenkov > wrote: >> 28.05.2018 05:50, Jason Gauthier wrote: >>> Greetings, >>> >>> I've set up a cluster intended for VMs. I created a VM, and have >>> been

Re: [ClusterLabs] Questions about SBD behavior

2018-05-28 Thread Andrei Borzenkov
On Mon, May 28, 2018 at 10:47 AM, Klaus Wenninger <kwenn...@redhat.com> wrote: > On 05/28/2018 09:43 AM, Klaus Wenninger wrote: >> On 05/26/2018 07:23 AM, Andrei Borzenkov wrote: >>> 25.05.2018 14:44, Klaus Wenninger wrote: >>>> On 05/25/2018 12:44 PM, Andrei B

Re: [ClusterLabs] Live migrate a VM in a cluster group

2018-05-27 Thread Andrei Borzenkov
28.05.2018 05:50, Jason Gauthier wrote: > Greetings, > > I've set up a cluster intended for VMs. I created a VM, and have > been pretty pleased with migrating it back and forth between the two > nodes. Now, I am also using the VNC console, which requires listening > on port 59xx. Of course,

Re: [ClusterLabs] pacemaker selfsigned certificate - how to replace it

2018-05-26 Thread Andrei Borzenkov
24.05.2018 12:50, Duray Pascal wrote: > Dear, > > We are using pacemaker in order to configure a kvm cluster > > Our security has detected that we are using on servers an invalid certificate > (self signed) and has asked us to solve the problem > Can you please tell me how I can solve this

Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Andrei Borzenkov
25.05.2018 14:44, Klaus Wenninger wrote: > On 05/25/2018 12:44 PM, Andrei Borzenkov wrote: >> On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger <kwenn...@redhat.com> >> wrote: >>> On 05/25/2018 07:31 AM, 井上 和徳 wrote: >>>> Hi, >>>> >>

Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Andrei Borzenkov
On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger wrote: > On 05/25/2018 07:31 AM, 井上 和徳 wrote: >> Hi, >> >> I am checking the watchdog function of SBD (without shared block-device). >> In a two-node cluster, if one cluster is stopped, watchdog is triggered on >> the

Re: [ClusterLabs] DLM fencing

2018-05-23 Thread Andrei Borzenkov
24.05.2018 02:57, Jason Gauthier wrote: > I'm fairly new to clustering under Linux. I've basically have one shared > storage resource right now, using dlm, and gfs2. > I'm using fibre channel and when both of my nodes are up (2 node cluster) > dlm and gfs2 seem to be operating perfectly. > If I

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Andrei Borzenkov
16.05.2018 23:33, Casey & Gina wrote: >> Is there an apt repository which provides more recent versions? > > I'm guessing no, based on trying fruitlessly to search for one. > >> Is there a way to use the version that Ubuntu provides (0.9.149) to >> accomplish the desired result? > > Barring

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Andrei Borzenkov
16.05.2018 19:43, Casey & Gina wrote: > Thank you and Andrei for the advice... > >> the pcs alternative commands are: >> >> pcs stonith create vfencing external/vcenter \ VI_SERVER=10.1.1.1 >> VI_CREDSTORE=/etc/vicredentials.xml \ >> HOSTLIST="hostname1=vmname1;hostname2=vmname2" RESETPOWERON=0

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Andrei Borzenkov
16.05.2018 20:01, Casey & Gina wrote: >> On May 16, 2018, at 10:43 AM, Casey & Gina wrote: >> >> Thank you and Andrei for the advice... >> >>> the pcs alternative commands are: >>> >>> pcs stonith create vfencing external/vcenter \ >>> VI_SERVER=10.1.1.1

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-15 Thread Andrei Borzenkov
16.05.2018 06:52, Casey & Gina wrote: > Hi, I'm trying to figure out how to get fencing/stonith going with > pacemaker. > > As far as I understand it, they are both part of the same thing - > setting up stonith means setting up fencing. If I'm mistaken on > that, please let me know. > They are

Re: [ClusterLabs] Two-node cluster fencing

2018-05-13 Thread Andrei Borzenkov
12.05.2018 07:31, Confidential Company wrote: > Hi, > > This is my setup: > > 1. I have Two vMware-ESXI hosts with one virtual machine (RHEL 7.4) on each. > 2. On my physical machine, I have four vmnic --> vmnic 0,1 for uplink going > to switchA and switchB --> vmnic 2,3 for heartbeat corosync

Re: [ClusterLabs] 答复: 答复: Could not start only one node in pacemaker

2018-05-02 Thread Andrei Borzenkov
02.05.2018 15:37, Jehan-Guillaume de Rorthais wrote: > On Wed, 2 May 2018 05:24:23 + > 范国腾 wrote: > >> Andrei, >> >> We use the following command to create the cluster: >> >> pcs cluster auth node1 node2 node3 node4 -u hacluster; >> pcs cluster setup --name

Re: [ClusterLabs] 答复: Could not start only one node in pacemaker

2018-05-01 Thread Andrei Borzenkov
cient number of) other nodes to appear, because doing anything else will clearly violate the quorum requirement. Quorum relies on the fact that out-of-quorum nodes will not do anything. > Thanks > > > -Original Message- From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of > Andrei Borzenko

Re: [ClusterLabs] Could not start only one node in pacemaker

2018-05-01 Thread Andrei Borzenkov
02.05.2018 05:52, 范国腾 wrote: > Hi, > The cluster has three nodes: one is master and two are slave. Now we run “pcs > cluster stop --all” to stop all of the nodes. Then we run “pcs cluster start” > in the master node. We find it not able to started. The cause is that the > stonith resource could

Re: [ClusterLabs] Failure of preferred node in a 2 node cluster

2018-04-29 Thread Andrei Borzenkov
29.04.2018 04:19, Wei Shan wrote: > Hi, > > I'm using Redhat Cluster Suite 7 with watchdog timer based fence agent. I > understand this is a really bad setup but this is what the end-user wants. > > ATB => auto_tie_breaker > > "When the auto_tie_breaker is used in even-number member clusters,

Re: [ClusterLabs] 答复: No slave is promoted to be master

2018-04-16 Thread Andrei Borzenkov
Sent from iPhone > On 17 Apr 2018, at 7:16, 范国腾 wrote: > > I check the status again. It is not that it is not promoted, but it is promoted about 15 > minutes after the cluster starts. > > I try in three labs and the results are same: The promotion happens 15 > minutes

Re: [ClusterLabs] How can I prevent multiple start of IPaddr 2 in an environment using fence_mpath?

2018-04-05 Thread Andrei Borzenkov
06.04.2018 07:30, 飯田 雄介 wrote: > Hi, all > I am testing the environment using fence_mpath with the following settings. > > === > Stack: corosync > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with quorum > Last updated: Fri Apr 6 13:16:20 2018 > Last change: Thu Mar

Re: [ClusterLabs] How to cancel a fencing request?

2018-04-04 Thread Andrei Borzenkov
04.04.2018 01:35, Ken Gaillot wrote: > On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote: ... >> -inf constraints like that should effectively prevent stonith-actions from being executed on that nodes. >>> >>> It shouldn't ... >>> >>> Pacemaker respects

Re: [ClusterLabs] How to setup a simple master/slave cluster in two nodes without stonith resource

2018-04-02 Thread Andrei Borzenkov
03.04.2018 05:07, 范国腾 wrote: > Hello, > > I want to setup a cluster in two nodes. One is master and the other is slave. > I don’t need the fencing device because my internal network is stable. I use > the following command to create the resource, but all of the two nodes are > slave and

Re: [ClusterLabs] How to cancel a fencing request?

2018-04-01 Thread Andrei Borzenkov
31.03.2018 23:29, Jehan-Guillaume de Rorthais wrote: > Hi all, > > I experienced a problem in a two node cluster. It has one FA per node and > location constraints to avoid the node each of them are supposed to interrupt. > If you mean stonith resource - for all I know location it does not

Re: [ClusterLabs] Possible circular constraints

2018-03-31 Thread Andrei Borzenkov
31.03.2018 06:10, David Parker wrote: > Hello, > > I have a two-node cluster running MySQL with DRBD. I recently tried to > fail over from the first node (mysql1) to the second (mysql2) but it failed > because the promotion of DRBD on mysql2 failed. After the failure, I found > this in the

Re: [ClusterLabs] Antw: Re: crm shell 2.1.2 manual bug?

2018-03-28 Thread Andrei Borzenkov
28.03.2018 15:12, Ulrich Windl wrote: > > I was hoping "colocation ... ( B C D ) A" to be a shortcut of "colocation ... > B A", "colocation ... C A", "colocation ... D A"... > Well, my understanding is that "colocation { B C D } { A }" should do exactly that.

Re: [ClusterLabs] crm shell 2.1.2 manual bug?

2018-03-28 Thread Andrei Borzenkov
28.03.2018 13:25, Ulrich Windl wrote: > Hi! > > For crmsh-2.1.2+git132.gbc9fde0-18.2 I think there's a bug in the manual > describing resource sets: > >sequential >If true, the resources in the set do not depend on each other > internally. Setting sequential to true implies

[ClusterLabs] Displaying "original" resources location scores?

2018-03-27 Thread Andrei Borzenkov
"Original" for lack of a better word. To explain - "crm_simulate -s" will show final scores as determined by possible colocation constraints. E.g. if B is colocated with A, scores for A will be adjusted by scores for B. In large or dynamic configuration with multiple constraints it may not be exactly

Re: [ClusterLabs] Colocation constraint for grouping all master-mode stateful resources with important stateless resources

2018-03-25 Thread Andrei Borzenkov
25.03.2018 10:21, Alberto Mijares wrote: > On Sat, Mar 24, 2018 at 2:16 PM, Andrei Borzenkov <arvidj...@gmail.com> wrote: >> 23.03.2018 20:42, Sam Gardner wrote: >>> Thanks, Ken. >>> >>> I just want all master-mode resources to be running wh

Re: [ClusterLabs] Colocation constraint for grouping all master-mode stateful resources with important stateless resources

2018-03-24 Thread Andrei Borzenkov
23.03.2018 20:42, Sam Gardner wrote: > Thanks, Ken. > > I just want all master-mode resources to be running wherever DRBDFS is > running (essentially). If the cluster detects that any of the master-mode > resources can't run on the current node (but can run on the other per > ethmon), all

Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-14 Thread Andrei Borzenkov
On Wed, Mar 14, 2018 at 10:35 AM, Muhammad Sharfuddin wrote: > Hi Andrei, >>Somehow I miss corosync configuration in this thread. Do you know >>wait-for-all is set (how?) or you just assume it? >> > solution found, I was not using "wait_for_all" option, I was assuming

Re: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

2018-03-14 Thread Andrei Borzenkov
On Wed, Mar 14, 2018 at 12:40 AM, Andreas M. Iwanowski wrote: > Dear folks, > > We are currently trying to set up a multimaster cluster and use a cloned > ocf_heartbeat_IPaddr2 resource to share the IP address. > > We have, however, run into a problem that, when a cluster

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-09 Thread Andrei Borzenkov
09.03.2018 19:55, Muhammad Sharfuddin wrote: > Hi, > > This two node cluster starts resources when both nodes are online but > does not start the ocfs2 resources > > when one node is offline. e.g if I gracefully stop the cluster resources > then stop the pacemaker service on > > either node,

Re: [ClusterLabs] 答复: 答复: How to create the stonith resource in virtualbox

2018-02-26 Thread Andrei Borzenkov
On Mon, Feb 26, 2018 at 12:22 PM, 范国腾 wrote: > Hi Marek and all, > > > > I use the following command to create a stonith resource in > virtualbox(centos7)which has no /dev/mapper/fence: > > pcs stonith create scsi-stonith-device fence_scsi devices=/dev/mapper/fence >

Re: [ClusterLabs] 答复: 答复: How to configure to make each slave resource has one VIP

2018-02-24 Thread Andrei Borzenkov
25.02.2018 05:24, 范国腾 wrote: > Hello, > > If all of the slave nodes crash, all of the slave vips could not work. > > Do we have any way to make all of the slave VIPs binds to the master node if > there is no slave nodes in the system? > > the user client will not know the system has problem

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Andrei Borzenkov
til I fix it > manually. > Is there any way to modify the expiry time to clean itself? > Yes, as mentioned, set the failure-timeout resource meta-attribute. > 22 February 2018 12:28, "Andrei Borzenkov" <arvidj...@gmail.com> > wrote: > >> Stonith resource sta
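A minimal sketch of how the failure-timeout meta-attribute mentioned above could be set with Pacemaker's own crm_resource tool. The helper name, the resource id "my-stonith" and the value 60s are placeholders chosen for illustration, not taken from the thread. The function only builds the command line, so it can be reviewed before it is run on a cluster node.

```shell
#!/bin/sh
# Sketch: build (not run) a crm_resource call that sets failure-timeout
# as a meta-attribute on a resource.
# $1 = resource id (hypothetical), $2 = timeout value such as 60s.
failure_timeout_cmd() {
    printf 'crm_resource --resource %s --meta --set-parameter failure-timeout --parameter-value %s\n' "$1" "$2"
}

# Print the command for a hypothetical stonith resource.
failure_timeout_cmd my-stonith 60s
```

Note that an expired failure is only cleaned up when the cluster next recalculates its state, so the failure may stay visible somewhat longer than the configured value.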

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Andrei Borzenkov
Stonith resource state should have no impact on actual stonith operation. It only reflects whether monitor was successful or not and serves as warning to administrator that something may be wrong. It should automatically clear itself after failure-timeout has expired. On Thu, Feb 22, 2018 at 1:58

Re: [ClusterLabs] How to create the stonith resource in virtualbox

2018-02-08 Thread Andrei Borzenkov
On Thu, Feb 8, 2018 at 5:51 AM, 范国腾 wrote: > Hello, > > I setup the pacemaker cluster using virtualbox. There are three nodes. The OS > is centos7, the /dev/sdb is the shared storage(three nodes use the same disk > file). > > (1) At first, I create the stonith using this

Re: [ClusterLabs] Two-Node Failover IP-Address and Gateway

2018-01-22 Thread Andrei Borzenkov
200.123.160/29 dev bad proto kernel scope link src 100.200.123.165 > 172.18.0.0/16 dev tun0 proto kernel scope link src 172.18.0.1 > 172.30.40.0/24 dev managed proto kernel scope link src 172.30.40.252 > root@fw-managed-02:~# ping 8.8.8.8 > PING 8.8.8.8 (8.8.8.8) 56(8

Re: [ClusterLabs] Two-Node Failover IP-Address and Gateway

2018-01-22 Thread Andrei Borzenkov
22.01.2018 20:54, brainheadz wrote: > Hello, > > I've got 2 public IP's and 2 Hosts. > > Each IP is assigned to one host. The interfaces are not configured by the > system, I am using pacemaker to do this. > > fw-managed-01: 100.200.123.166/29 > fw-managed-02: 100.200.123.165/29 > > gateway:

Re: [ClusterLabs] pengine bug? Recovery after monitor failure: Restart of DRBD does not restart Filesystem -- unless explicit order start before promote on DRBD

2018-01-13 Thread Andrei Borzenkov
12.01.2018 01:15, Lars Ellenberg wrote: > > To understand some weird behavior we observed, > I dumbed down a production config to three dummy resources, > while keeping some descriptive resource ids (ip, drbd, fs). > > For some reason, the constraints are: > stuff, more stuff, IP -> DRBD -> FS

Re: [ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?

2018-01-11 Thread Andrei Borzenkov
11.01.2018 19:21, Ken Gaillot wrote: > On Thu, 2018-01-11 at 01:16 +0100, Jehan-Guillaume de Rorthais wrote: >> On Wed, 10 Jan 2018 12:23:59 -0600 >> Ken Gaillot wrote: >> ... >>> My question is: has anyone used or tested this, or is anyone >>> interested >>> in this? We

Re: [ClusterLabs] Antw: Re: Antw: Changes coming in Pacemaker 2.0.0

2018-01-11 Thread Andrei Borzenkov
On Thu, Jan 11, 2018 at 2:52 PM, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote: > > >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 11.01.2018 at 12:41 in > message > <caa91j0waoqg46434gvvz_8yw5ve09fqhshqp-vqjo7uv6fe...@mail.gmail.com>:

Re: [ClusterLabs] Antw: Changes coming in Pacemaker 2.0.0

2018-01-11 Thread Andrei Borzenkov
On Thu, Jan 11, 2018 at 10:54 AM, Ulrich Windl wrote: > Hi! > > On the tool changes, I'd prefer --move and --un-move as pair over --move and > --clear ("clear" is less expressive IMHO). --un-move is really wrong semantically. You do not "unmove" resource - you

Re: [ClusterLabs] crmsh resource failcount does not appear to work

2018-01-01 Thread Andrei Borzenkov
02.01.2018 06:48, Ken Gaillot wrote: > On Wed, 2017-12-27 at 14:03 +0300, Andrei Borzenkov wrote: >> On Wed, Dec 27, 2017 at 11:40 AM, Kristoffer Grönlund >> <deceive...@gmail.com> wrote: >>> >>> Andrei Borzenkov <arvidj...@gmail.com> writes:

Re: [ClusterLabs] Pacemaker Master restarts when Slave is added to the cluster

2017-12-27 Thread Andrei Borzenkov
Usual suspect - interleave=false on clone resource. On Wed, Dec 27, 2017 at 10:49 AM, 范国腾 wrote: > Hello, > > > > In my test environment, I meet one issue about the pacemaker: when a new > node is added in the cluster, the master node restart. This issue will lead > to the

Re: [ClusterLabs] crmsh resource failcount does not appear to work

2017-12-27 Thread Andrei Borzenkov
On Wed, Dec 27, 2017 at 11:40 AM, Kristoffer Grönlund <deceive...@gmail.com> wrote: > > Andrei Borzenkov <arvidj...@gmail.com> writes: > > > As far as I can tell, pacemaker acts on failcount attributes qualified > > by operation name, while crm sets/queries unqual

[ClusterLabs] crmsh resource failcount does not appear to work

2017-12-24 Thread Andrei Borzenkov
As far as I can tell, pacemaker acts on failcount attributes qualified by operation name, while crm sets/queries unqualified attribute; I do not see any syntax to set fail-count for specific operation in crmsh. ha1:~ # rpm -q crmsh crmsh-4.0.0+git.1511604050.816cb0f5-1.1.noarch ha1:~ # crm_mon

Re: [ClusterLabs] Wrong sbd.service dependencies

2017-12-17 Thread Andrei Borzenkov
17.12.2017 15:20, Gao,Yan wrote: > On 2017/12/16 16:59, Andrei Borzenkov wrote: >> 04.12.2017 21:55, Andrei Borzenkov wrote: >> ... >>>>> >>>>> I tried it (on openSUSE Tumbleweed which is what I have at hand, it >>>>> has >>>

[ClusterLabs] Wrong sbd.service dependencies (was: Re: pacemaker with sbd fails to start if node reboots too fast)

2017-12-16 Thread Andrei Borzenkov
04.12.2017 21:55, Andrei Borzenkov wrote: ... >>> >>> I tried it (on openSUSE Tumbleweed which is what I have at hand, it has >>> SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch >>> disk at all. >> It simply waits tha

Re: [ClusterLabs] pacemaker pingd with ms drbd = double masters short time when disconnected networks.

2017-12-16 Thread Andrei Borzenkov
15.12.2017 14:08, Прокопов Павел wrote: ... >     stonith-enabled=false \ >     no-quorum-policy=ignore \ ... > > Why pp-pacemaker2 first become a master? It breaks DRBD. > Because you told it to behave this way. You told your cluster that neither stonith nor quorum are required; so each node

Re: [ClusterLabs] Issue with DRBD + a systemd resource

2017-12-14 Thread Andrei Borzenkov
14.12.2017 19:25, Jan Pokorný wrote: > On 14/12/17 10:49 -0500, Julien Semaan wrote: >> Great success! >> >> Adding the following line to /usr/lib/systemd/system/pacemaker.service did >> it: >> After=dbus.service > > Note, this is not a proper way for overriding the systemd unit files, > which is
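The reply above is cut off before it names the recommended alternative, so the following is only a sketch of the generic systemd mechanism for such overrides: a drop-in file under /etc/systemd/system. It is not necessarily what was suggested in the thread. The function name, the drop-in file name "10-after-dbus.conf" and the optional directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: add an ordering dependency through a drop-in file instead of
# editing the packaged unit under /usr/lib/systemd/system, which a
# package update would overwrite.
# $1 = unit directory (optional, defaults to /etc/systemd/system).
write_pacemaker_dropin() {
    unit_dir="${1:-/etc/systemd/system}"
    dropin_dir="$unit_dir/pacemaker.service.d"
    mkdir -p "$dropin_dir" || return 1
    # After= is a list-type setting, so this adds to the packaged
    # unit's ordering dependencies rather than replacing them.
    cat > "$dropin_dir/10-after-dbus.conf" <<'EOF'
[Unit]
After=dbus.service
EOF
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_pacemaker_dropin && systemctl daemon-reload
```

Running "systemctl edit pacemaker.service" creates an equivalent drop-in interactively.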

Re: [ClusterLabs] Issue with DRBD + a systemd resource

2017-12-13 Thread Andrei Borzenkov
Sent from iPhone > On 13 Dec 2017, at 22:53, Julien Semaan wrote: > > Hello, > > Its my first post on this mailing list so excuse any rookie mistake I may do > in this thread. > > We currently have clusters deployed using corosync/pacemaker that manage DRBD > +

Re: [ClusterLabs] interesting blog on Pacemaker-related outage

2017-12-07 Thread Andrei Borzenkov
07.12.2017 15:13, Adam Spiers wrote: > https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/ > > > It's a great write-up, although a little frustrating that it is still > not fully understood why a -inf colocation failed whereas a +inf > succeeded.  (I actually

Re: [ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

2017-12-06 Thread Andrei Borzenkov
07.12.2017 00:28, Klaus Wenninger wrote: > On 12/06/2017 08:03 PM, Ken Gaillot wrote: >> On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote: >>> I assumed that with corosync 2.x quorum is maintained by corosync and >>> pacemaker simply gets yes/no. Apparent

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Andrei Borzenkov
05.12.2017 13:34, Gao,Yan wrote: > On 12/05/2017 08:57 AM, Dejan Muhamedagic wrote: >> On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote: >>> 04.12.2017 14:48, Gao,Yan wrote: >>>> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >>>>> 30.

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Andrei Borzenkov
05.12.2017 12:59, Gao,Yan wrote: > On 12/04/2017 07:55 PM, Andrei Borzenkov wrote: >> 04.12.2017 14:48, Gao,Yan wrote: >>> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >>>> 30.11.2017 13:48, Gao,Yan wrote: >>>>> On 11/22/2017 08:01 PM, Andrei Borz

Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Andrei Borzenkov
04.12.2017 18:47, Tomas Jelinek wrote: > On 4.12.2017 at 16:02 Kristoffer Grönlund wrote: >> Tomas Jelinek writes: >> * how is it shutting down the cluster when issuing "pcs cluster stop --all"? >>> >>> First, it sends a request to each node to stop

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Andrei Borzenkov
04.12.2017 14:48, Gao,Yan wrote: > On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >> 30.11.2017 13:48, Gao,Yan wrote: >>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >>>> VM o

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-02 Thread Andrei Borzenkov
30.11.2017 13:48, Gao,Yan wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >> corosync and forcing STONITH pacemaker was not

Re: [ClusterLabs] Should pacemaker pursue its own and corosync's instant resurrection if either dies? (Was: Is corosync supposed to be restarted if it dies?)

2017-12-02 Thread Andrei Borzenkov
02.12.2017 16:30, Jan Pokorný wrote: > > In race-condition free situation, such a BindsTo-incurred stopping (or > at least scheduled to since 235?) of the service is then not a subject > of auto-restarting, from what I've observed, and documentation agrees: > > Restart= [...] When the death of

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Andrei Borzenkov
01.12.2017 22:36, Gao,Yan wrote: > On 11/30/2017 06:48 PM, Andrei Borzenkov wrote: >> 30.11.2017 16:11, Klaus Wenninger wrote: >>> On 11/30/2017 01:41 PM, Ulrich Windl wrote: >>>> >>>>>>> "Gao,Yan" <y...@suse.com> wrote on 30.

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
30.11.2017 16:11, Klaus Wenninger wrote: > On 11/30/2017 01:41 PM, Ulrich Windl wrote: >> >>>>> "Gao,Yan" <y...@suse.com> wrote on 30.11.2017 at 11:48 in message >> <e71afccc-06e3-97dd-c66a-1b4bac550...@suse.com>: >>> On

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:48 PM, Gao,Yan <y...@suse.com> wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:39 PM, Gao,Yan <y...@suse.com> wrote: > On 11/30/2017 09:14 AM, Andrei Borzenkov wrote: >> >> On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot <kgail...@redhat.com> wrote: >>> >>> >>> The same scenario is why a sing

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: > > The same scenario is why a single node can't have quorum at start-up in > a cluster with "two_node" set. Both nodes have to see each other at > least once before they can assume it's safe to do anything. > Unless we set
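A minimal sketch of the votequorum settings under discussion, assuming corosync 2.x. According to votequorum(5), two_node: 1 enables wait_for_all by default, which is the start-up behaviour described in the quoted text: a freshly started node does not become quorate until it has seen its peer at least once. The same man page notes that wait_for_all can be overridden by explicitly setting it to 0. The helper function name is hypothetical, and it only prints the fragment.

```shell
#!/bin/sh
# Sketch: print the quorum section of corosync.conf for a two-node cluster.
# wait_for_all is implied by two_node: 1 and is written out here only to
# make the start-up behaviour explicit.
print_two_node_quorum() {
    cat <<'EOF'
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 1
}
EOF
}

print_two_node_quorum
```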

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 12:42 AM, Jan Pokorný <jpoko...@redhat.com> wrote: > On 29/11/17 22:00 +0100, Jan Pokorný wrote: >> On 28/11/17 22:35 +0300, Andrei Borzenkov wrote: >>> 28.11.2017 13:01, Jan Pokorný пишет: >>>> On 27/11/17 17:43 +0300, Andrei Borzen

Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Andrei Borzenkov
29.11.2017 20:14, Klaus Wenninger wrote: > On 11/28/2017 07:41 PM, Andrei Borzenkov wrote: >> 28.11.2017 10:45, Ramann, Björn wrote: >>> hi@all, >>> >>> in my configuration, the 1st node runs on ESX1, the second runs on ESX2. Now >>> I'm looking for

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-28 Thread Andrei Borzenkov
28.11.2017 13:01, Jan Pokorný wrote: > On 27/11/17 17:43 +0300, Andrei Borzenkov wrote: >> Sent from iPhone >> >>> On 27 Nov 2017, at 14:36, Ferenc Wágner <wf...@niif.hu> wrote: >>> >>> Andrei Borzenkov <arvidj...@gmail.com> wri

Re: [ClusterLabs] cluster with two ESX server

2017-11-28 Thread Andrei Borzenkov
28.11.2017 10:45, Ramann, Björn wrote: > hi@all, > > in my configuration, the 1st node runs on ESX1, the second runs on ESX2. Now > I'm looking for a way to configure the cluster fence/stonith with two ESX > servers - is this possible? If you have shared storage, SBD may be an option. > > I try

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-27 Thread Andrei Borzenkov
Sent from iPhone > On 27 Nov 2017, at 14:36, Ferenc Wágner <wf...@niif.hu> wrote: > > Andrei Borzenkov <arvidj...@gmail.com> writes: > >> 25.11.2017 10:05, Andrei Borzenkov wrote: >> >>> In one of the guides, the suggested procedure to simulate

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-26 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote: >> >> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly >> just fenced by sapprod01p for sapprod01p >> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: warning: The crmd >> process (3151) can no longer be respawned, >> Nov 22 16:04:56

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-26 Thread Andrei Borzenkov
25.11.2017 10:05, Andrei Borzenkov wrote: > In one of the guides, the suggested procedure to simulate split brain was to kill > the corosync process. It actually worked on one cluster, but on another > the corosync process was restarted after being killed, without the cluster > noticing anything. Except a

[ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-24 Thread Andrei Borzenkov
In one of the guides, the suggested procedure to simulate split brain was to kill the corosync process. It actually worked on one cluster, but on another the corosync process was restarted after being killed, without the cluster noticing anything. Except that after several attempts pacemaker died with stopping resources ...

[ClusterLabs] SBD stonith in 2 node cluster - how to make it prefer one side of cluster?

2017-11-24 Thread Andrei Borzenkov
Wrapping my head around how pcmk_delay_max works, my understanding is - on startup pacemaker always starts one instance of stonith/sbd; it probably randomly selects a node for it. I suppose this initial start is delayed by a random amount within pcmk_delay_max. - when cluster is partitioned,

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >> corosync and forcing STONITH pace

[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with VM on VSphere using shared VMDK as SBD. During basic tests by killing corosync and forcing STONITH pacemaker was not started after reboot. In logs I see during boot Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were

Re: [ClusterLabs] Colocation rule with vip and ms master

2017-11-10 Thread Andrei Borzenkov
26.10.2017 21:15, Norberto Lopes wrote: > Hi everyone, > > Could someone give me a bit more in-depth explanation of the semantical > differences between the following: > > (assume postgresMS is a master/slave resource for postgresql) > (ignore for a moment that the first rule could put the vip

[ClusterLabs] VMware guest disk configuration for SBD

2017-10-21 Thread Andrei Borzenkov
I'm looking for pointers to documentation (or if possible support statements) for setting up a pacemaker cluster across physical ESX hosts using SBD as STONITH agent. There are a lot of options for how one may set up a virtual disk in VMware, and I'm unsure which one to choose. Configuration is based on

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Andrei Borzenkov
20.06.2017 02:15, Digimer wrote: > On 19/06/17 06:59 PM, Ferenc Wágner wrote: >> Digimer writes: >> >>> So we have a tool that watches for changes to clvmd by running >>> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally >>> causes trouble. >> >> What kind of

Re: [ClusterLabs] SAP HANA resource start problem

2017-05-14 Thread Andrei Borzenkov
12.05.2017 13:30, Muhammad Sharfuddin wrote: > Is there a bug in the SAP HANA resource? crm_mon shows that the cluster started > the resource and keeps the HANA resource in slave state, while in actuality the > cluster doesn't start the resources; we found the following events in the logs: > SAP HANA agent

Re: [ClusterLabs] Antw: Re: Antw: Re: 2-Node Cluster Pointless?

2017-04-24 Thread Andrei Borzenkov
24.04.2017 09:15, Ulrich Windl wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 22.04.2017 at 09:05 in > message <ede2cdd3-7020-9f59-90ad-c3b4a0c9e...@gmail.com>: >> 18.04.2017 10:47, Ulrich Windl wrote: >> ... >>>> >>>
