Re: [ClusterLabs] Reusing resource set in multiple constraints

2019-08-03 Thread Andrei Borzenkov
29.07.2019 22:07, Ken Gaillot writes: > On Sat, 2019-07-27 at 11:04 +0300, Andrei Borzenkov wrote: >> Is it possible to have a single definition of a resource set that is >> later >> referenced in order and location constraints? All syntax in >> documentation or crmsh presumes inline set definition in location or order statements.

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-07-29 Thread Andrei Borzenkov
On Mon, Jul 29, 2019 at 9:52 AM Jan Friesse wrote: > > Andrei > > Andrei Borzenkov wrote: > > corosync.service sets StopWhenUnneeded=yes which normally stops it when > > This was the case only for very limited time (v 3.0.1) and it's removed > now (v 3.0.2) because i

[ClusterLabs] Node reset on shutdown by SBD watchdog with corosync-qdevice

2019-07-28 Thread Andrei Borzenkov
In two node cluster + qnetd I consistently see the node that is being shut down last being reset during shutdown. I.e. - shutdown the first node - OK - shutdown the second node - reset As far as I understand what happens is - during shutdown pacemaker.service is stopped first. In above

[ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-07-28 Thread Andrei Borzenkov
corosync.service sets StopWhenUnneeded=yes which normally stops it when pacemaker is shut down. Unfortunately, corosync-qdevice.service declares Requires=corosync.service and corosync-qdevice.service itself is *not* stopped when pacemaker.service is stopped. Which means corosync.service remains
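A hypothetical systemd drop-in (not from the thread) sketching one way to make corosync-qdevice stop together with pacemaker, so that corosync.service can then be stopped as unneeded; the file path and the use of PartOf= here are assumptions, not the shipped unit configuration:

```ini
# Hypothetical drop-in: /etc/systemd/system/corosync-qdevice.service.d/stop-with-pacemaker.conf
# PartOf= propagates stop/restart of pacemaker.service to corosync-qdevice.
# Once corosync-qdevice is stopped, corosync.service (StopWhenUnneeded=yes)
# has no remaining started dependents and can be stopped as unneeded.
[Unit]
PartOf=pacemaker.service
```

After adding such a drop-in, `systemctl daemon-reload` would be needed for it to take effect.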

Re: [ClusterLabs] Reusing resource set in multiple constraints

2019-07-28 Thread Andrei Borzenkov
27.07.2019 11:04, Andrei Borzenkov writes: > Is it possible to have a single definition of a resource set that is later > referenced in order and location constraints? All syntax in > documentation or crmsh presumes inline set definition in location or > order statement. > > In this

[ClusterLabs] Reusing resource set in multiple constraints

2019-07-27 Thread Andrei Borzenkov
Is it possible to have single definition of resource set that is later references in order and location constraints? All syntax in documentation or crmsh presumes inline set definition in location or order statement. In this particular case there will be set of filesystems that need to be
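For illustration, a hand-written CIB fragment (not from the thread) showing the inline set syntax the post refers to; resource and constraint ids are made up, and whether `id-ref` is accepted for reusing a `resource_set` across constraints is exactly the open question of the thread:

```xml
<!-- Inline resource_set in an order constraint; all ids are hypothetical. -->
<rsc_order id="o-fs-ordered">
  <resource_set id="fs-set" sequential="true">
    <resource_ref id="fs-a"/>
    <resource_ref id="fs-b"/>
    <resource_ref id="fs-c"/>
  </resource_set>
</rsc_order>
<!-- Hypothetical reuse via the CIB's generic id-ref mechanism: -->
<rsc_colocation id="c-fs-together" score="INFINITY">
  <resource_set id-ref="fs-set"/>
</rsc_colocation>
```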

Re: [ClusterLabs] Feedback wanted: Node reaction to fabric fencing

2019-07-25 Thread Andrei Borzenkov
On Thu, Jul 25, 2019 at 3:20 AM Ondrej wrote: > > Is there any plan on getting this also into 1.1 branch? > If yes, then I would be for just introducing the configuration option in > 1.1.x with default to 'stop'. > +1 for back porting it from someone who just recently hit this (puzzling)

Re: [ClusterLabs] Antw: Interacting with Pacemaker from my code

2019-07-16 Thread Andrei Borzenkov
On Tue, Jul 16, 2019 at 11:01 AM Nishant Nakate wrote: > >> > >> > I will give you a quick overview of the system. There would be 3 nodes >> > configured in a cluster. One would act as a leader and others as >> > followers. Our system would be actively running on all the three nodes and >> >

Re: [ClusterLabs] Antw: Interacting with Pacemaker from my code

2019-07-16 Thread Andrei Borzenkov
On Tue, Jul 16, 2019 at 9:48 AM Nishant Nakate wrote: > > > On Tue, Jul 16, 2019 at 11:33 AM Ulrich Windl > wrote: >> >> >>> Nishant Nakate wrote on 16.07.2019 at 05:37 >> >>> in >> message >> : >> > Hi All, >> > >> > I am new to this community and HA tools. Need some guidance on my

Re: [ClusterLabs] [EXTERNAL] Re: "node is unclean" leads to gratuitous reboot

2019-07-11 Thread Andrei Borzenkov
On Thu, Jul 11, 2019 at 12:58 PM Lars Ellenberg wrote: > > On Wed, Jul 10, 2019 at 06:15:56PM +, Michael Powell wrote: > > Thanks to you and Andrei for your responses. In our particular > > situation, we want to be able to operate with either node in > > stand-alone mode, or with both nodes

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Andrei Borzenkov
On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais wrote: > > > > Jul 09 09:16:32 [2679] postgres1 lrmd:debug: > > > child_kill_helper: Kill pid 12735's group Jul 09 09:16:34 [2679] > > > postgres1 lrmd: warning: child_timeout_callback: > > > PGSQL_monitor_15000

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-10 Thread Andrei Borzenkov
On Wed, Jul 10, 2019 at 12:42 PM Jehan-Guillaume de Rorthais wrote: > > > P.S. crm_resource is called by resource agent (pgsqlms). And it shows > > result of original resource probing which makes it confusing. At least > > it explains where these log entries come from. > > Not sure to understand

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-07-09 Thread Andrei Borzenkov
09.07.2019 13:08, Danka Ivanović writes: > Hi, I didn't manage to start master with postgres, even if I increased start > timeout. I checked executable paths and start options. > When cluster is running with manually started master and slave started over > pacemaker, everything works ok. Today we

Re: [ClusterLabs] "node is unclean" leads to gratuitous reboot

2019-07-09 Thread Andrei Borzenkov
On Tue, Jul 9, 2019 at 3:54 PM Michael Powell < michael.pow...@harmonicinc.com> wrote: > I have a two-node cluster with a problem. If I start Corosync/Pacemaker > on one node, and then delay startup on the 2nd node (which is otherwise > up and running), the 2nd node will be rebooted very soon

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-03 Thread Andrei Borzenkov
On Wed, Jul 3, 2019 at 12:59 AM Ken Gaillot wrote: > > On Mon, 2019-07-01 at 23:30 +, Harvey Shepherd wrote: > > > The "transition summary" is just a resource-by-resource list, not > > > the > > > order things will be done. The "executing cluster transition" > > > section > > > is the order

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Andrei Borzenkov
02.07.2019 2:30, Harvey Shepherd writes: >> The "transition summary" is just a resource-by-resource list, not the >> order things will be done. The "executing cluster transition" section >> is the order things are being done. > > Thanks Ken. I think that's where the problem is originating. If you

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-29 Thread Andrei Borzenkov
28.06.2019 9:45, Andrei Borzenkov writes: > On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd > wrote: >> >> Hi All, >> >> >> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one master/slave >> resource (I'll refer to it as the king resource

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-29 Thread Andrei Borzenkov
29.06.2019 8:05, Harvey Shepherd writes: > There is an ordering constraint - everything must be started after the king > resource. But even if this constraint didn't exist I don't see that it should > logically make any difference due to all the non-clone resources being > colocated with the

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Andrei Borzenkov
On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd wrote: > > Hi All, > > > I'm running Pacemaker 2.0.2 on a two node cluster. It runs one master/slave > resource (I'll refer to it as the king resource) and about 20 other resources > which are a mixture of: > > > - resources that only run on the

Re: [ClusterLabs] EXTERNAL: Re: Pacemaker not reacting as I would expect when two resources fail at the same time

2019-06-08 Thread Andrei Borzenkov
08.06.2019 5:12, Harvey Shepherd writes: > Thank you for your advice Ken. Sorry for the delayed reply - I was trying out > a few things and trying to capture extra info. The changes that you suggested > make sense, and I have incorporated them into my config. However, the > original issue

Re: [ClusterLabs] Antw: Re: Antw: Re: Q: ocf:pacemaker:NodeUtilization monitor

2019-06-03 Thread Andrei Borzenkov
03.06.2019 9:09, Ulrich Windl writes: > 118 if [ -x $xentool ]; then > 119 $xentool info | awk >>> '/total_memory/{printf("%d\n",$3);exit(0)}' > 120 else > 121 ocf_log warn "Can only set hv_memory for Xen hypervisor" > 122 echo "0"

Re: [ClusterLabs] Antw: Re: Q: ocf:pacemaker:NodeUtilization monitor

2019-05-29 Thread Andrei Borzenkov
29.05.2019 11:12, Ulrich Windl writes: Jan Pokorný wrote on 28.05.2019 at 16:31 in > message > <20190528143145.ga29...@redhat.com>: >> On 27/05/19 08:28 +0200, Ulrich Windl wrote: >>> I configured ocf:pacemaker:NodeUtilization more or less for fun, and I >> realized that the cluster

Re: [ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

2019-05-21 Thread Andrei Borzenkov
21.05.2019 0:46, Ken Gaillot writes: >> >>> From what's described here, the op-restart-digest is changing every >>> time, which means something is going wrong in the hash comparison >>> (since the definition is not really changing). >>> >>> The log that stands out to me is: >>> >>> trace May 18

Re: [ClusterLabs] Constant stop/start of resource in spite of interval=0

2019-05-18 Thread Andrei Borzenkov
18.05.2019 18:34, Kadlecsik József writes: > Hello, > > We have a resource agent which creates IP tunnels. In spite of the > configuration setting > > primitive tunnel-eduroam ocf:local:tunnel \ > params > op start timeout=120s interval=0 \ > op stop timeout=300s
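A minimal crmsh sketch of the kind of primitive quoted above; the agent name and the start/stop timeouts come from the post, while the monitor op and the placeholder for the elided params are assumptions:

```
primitive tunnel-eduroam ocf:local:tunnel \
    params ... \  # the actual tunnel parameters are elided in the post
    op start timeout=120s interval=0 \
    op stop timeout=300s interval=0 \
    op monitor timeout=60s interval=30s  # hypothetical monitor op
```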

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-05-12 Thread Andrei Borzenkov
30.04.2019 9:53, Digimer writes: > On 2019-04-30 12:07 a.m., Andrei Borzenkov wrote: >> As soon as majority of nodes are stopped, the remaining nodes are out of >> quorum and watchdog reboot kicks in. >> >> What is the correct procedure to ensure nodes are stopped

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-06 Thread Andrei Borzenkov
On Mon, May 6, 2019 at 8:30 AM Arkadiy Kulev wrote: > > Andrei, > > I just went through the docs > (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-failure-migration.html) > and it says that the option "failure-timeout" is responsible for retrying a > failed

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
> On Sun, May 5, 2019 at 11:05 PM Andrei Borzenkov > wrote: > >> 05.05.2019 18:43, Arkadiy Kulev writes: >>> Dear Andrei, >>> >>> I'm sorry for the screenshot, this is the only thing that I have left >> after >>> the crash. >>>

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
prerequisite was successful stop of resource. > Sincerely, > Ark. > > e...@ethaniel.com > > > On Sun, May 5, 2019 at 9:46 PM Andrei Borzenkov wrote: > >> 05.05.2019 16:14, Arkadiy Kulev writes: >>> Hello! >>> >>> I run pacemaker on 2 active/active

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
05.05.2019 16:14, Arkadiy Kulev writes: > Hello! > > I run pacemaker on 2 active/active hosts which balance the load of 2 public > IP addresses. > A few days ago we ran a very CPU/network intensive process on one of the 2 > hosts and Pacemaker failed. > > I've attached a screenshot of the

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-05-04 Thread Andrei Borzenkov
30.04.2019 19:47, Олег Самойлов writes: > > >> On Apr 30, 2019, at 19:38, Andrei Borzenkov >> wrote: >> >> 30.04.2019 19:34, Олег Самойлов writes: >>> >>>> No. I simply want reliable way to shutdown the whole cluster >>>> (for

[ClusterLabs] corosync-qdevice[3772]: Heuristics worker waitpid failed (10): No child processes

2019-05-04 Thread Andrei Borzenkov
While testing corosync-qdevice I repeatedly got the above message. The reason seems to be startup sequence in corosync-qdevice. Consider: ● corosync-qdevice.service - Corosync Qdevice daemon Loaded: loaded (/etc/systemd/system/corosync-qdevice.service; disabled; vendor preset: disabled)

Re: [ClusterLabs] crm_mon output to html-file - is there a way to manipulate the html-file ?

2019-05-03 Thread Andrei Borzenkov
03.05.2019 20:18, Lentes, Bernd writes: > Hi, > > on my cluster nodes i established a systemd service which starts crm_mon > which writes cluster information into a html-file so i can see the state > of my cluster in a webbrowser. > crm_mon is started that way: > /usr/sbin/crm_mon -d -i 10 -h

Re: [ClusterLabs] Timeout stopping corosync-qdevice service

2019-04-30 Thread Andrei Borzenkov
30.04.2019 9:51, Jan Friesse writes: > >> Now, corosync-qdevice gets SIGTERM as "signal to terminate", but it >> installs SIGTERM handler that does not exit and only closes some socket. >> May be this should trigger termination of main loop, but somehow it does >> not. > > Yep, this is exactly

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Andrei Borzenkov
30.04.2019 19:34, Олег Самойлов writes: > >> No. I simply want reliable way to shutdown the whole cluster (for >> maintenance). > > Official way is `pcs cluster stop --all`. pcs is just one of multiple high level tools. I am interested in plumbing, not porcelain. > But it’s not always worked as

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Andrei Borzenkov
about dynamic cluster expansion; the question is about normal static cluster with fixed number of nodes that needs to be shut down. >> On Apr 30, 2019, at 7:07, Andrei Borzenkov wrote: >> >> As soon as majority of nodes are stopped, the remaining nodes are out of >> quorum

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Andrei Borzenkov
30.04.2019 9:53, Digimer writes: > On 2019-04-30 12:07 a.m., Andrei Borzenkov wrote: >> As soon as majority of nodes are stopped, the remaining nodes are out of >> quorum and watchdog reboot kicks in. >> >> What is the correct procedure to ensure nodes are stopped

[ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-29 Thread Andrei Borzenkov
As soon as majority of nodes are stopped, the remaining nodes are out of quorum and watchdog reboot kicks in. What is the correct procedure to ensure nodes are stopped in clean way? Short of disabling stonith-watchdog-timeout before stopping cluster ...
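A sketch of the workaround the post itself hints at; it assumes that setting stonith-watchdog-timeout to 0 disables the watchdog self-reset, and the exact command sequence is illustrative, not a recommendation from the thread:

```shell
# Hypothetical sequence, run from one node before a full-cluster shutdown:
# disable the watchdog timeout so losing quorum does not trigger a self-reset
crm_attribute --type crm_config --name stonith-watchdog-timeout --update 0

# then stop the cluster stack on all nodes, e.g. with a high-level tool:
pcs cluster stop --all

# on the next cluster start, restore the previous stonith-watchdog-timeout value
```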

Re: [ClusterLabs] Timeout stopping corosync-qdevice service

2019-04-29 Thread Andrei Borzenkov
29.04.2019 14:32, Jan Friesse writes: > Andrei, > >> I setup qdevice in openSUSE Tumbleweed and while it works as expected I > > Is it corosync-qdevice or corosync-qnetd daemon? > corosync-qdevice >> cannot stop it - it always results in timeout and service finally gets >> killed by systemd.

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-04-29 Thread Andrei Borzenkov
29.04.2019 18:05, Ken Gaillot writes: >> >>> Why doesn't it check OCF_RESKEY_CRM_meta_notify? >> >> I was just not aware of this env variable. Sadly, it is not >> documented >> anywhere :( > > It's not a Pacemaker-created value like the other notify variables -- > all user-specified

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-04-27 Thread Andrei Borzenkov
27.04.2019 1:04, Danka Ivanović writes: > Hi, here is a complete cluster configuration: > > node 1: master > node 2: secondary > primitive AWSVIP awsvip \ > params secondary_private_ip=10.x.x.x api_delay=5 > primitive PGSQL pgsqlms \ > params pgdata="/var/lib/postgresql/9.5/main" >

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-20 Thread Andrei Borzenkov
20.04.2019 22:29, Lentes, Bernd writes: > > > - On Apr 18, 2019 at 16:21 kgaillot kgail...@redhat.com wrote: > >> >> Simply stopping pacemaker and corosync by whatever mechanism your >> distribution uses (e.g. systemctl) should be sufficient. > > That works. But strangely is that after a

Re: [ClusterLabs] SBD as watchdog daemon

2019-04-14 Thread Andrei Borzenkov
12.04.2019 15:30, Олег Самойлов writes: > >> On Apr 11, 2019, at 20:00, Klaus Wenninger >> wrote: >> >> On 4/11/19 5:27 PM, Олег Самойлов wrote: >>> Hi all. I am developing HA PostgreSQL cluster for 2 or 3 >>> datacenters. In case of DataCenter failure (blackout) the fencing >>> will not

Re: [ClusterLabs] How to reduce SBD watchdog timeout?

2019-04-07 Thread Andrei Borzenkov
03.04.2019 13:04, Klaus Wenninger writes: > On 4/3/19 9:47 AM, Andrei Borzenkov wrote: >> On Tue, Apr 2, 2019 at 8:49 PM Digimer wrote: >>> It's worth noting that SBD fencing is "better than nothing", but slow. >>> IPMI and/or PDU fencing completes a lot faster

Re: [ClusterLabs] Antw: Re: Issue with DB2 HADR cluster

2019-04-03 Thread Andrei Borzenkov
On Wed, Apr 3, 2019 at 10:26 AM Valentin Vidic wrote: > > On Wed, Apr 03, 2019 at 09:13:58AM +0200, Ulrich Windl wrote: > > I'm surprised: Once sbd writes the fence command, it usually takes > > less than 3 seconds until the victim is dead. If you power off a > > server, the PDU still may have

Re: [ClusterLabs] Antw: Re: Issue with DB2 HADR cluster

2019-04-03 Thread Andrei Borzenkov
On Wed, Apr 3, 2019 at 10:14 AM Ulrich Windl wrote: > > >>> Digimer wrote on 02.04.2019 at 19:49 in message > <6c6302f4-844b-240d-8d0e-727dddf36...@alteeve.ca>: > > [...] > > It's worth noting that SBD fencing is "better than nothing", but slow. > > IPMI and/or PDU fencing completes a lot

Re: [ClusterLabs] Issue with DB2 HADR cluster

2019-04-02 Thread Andrei Borzenkov
02.04.2019 19:32, Dileep V Nair writes: > > > Hi, > > I have a two node DB2 Cluster with pacemaker and HADR. When I issue a > reboot -f on the node where Primary Database is running, I expect the > Standby database to be promoted as Primary. But what is happening is > pacemaker waits for

Re: [ClusterLabs] Colocation constraint moving resource

2019-03-26 Thread Andrei Borzenkov
26.03.2019 17:14, Ken Gaillot writes: > On Tue, 2019-03-26 at 14:11 +0100, Thomas Singleton wrote: >> Dear all >> >> I am encountering an issue with colocation constraints. >> >> I have created a 4 nodes cluster (3 "main" and 1 "spare") with 3 >> resources and I wish to have each resource run only

Re: [ClusterLabs] Unable to restart resources

2019-03-26 Thread Andrei Borzenkov
26.03.2019 18:33, JCA writes: > Making some progress with Pacemaker/DRBD, but still trying to grasp some of > the basics of this framework. Here is my current situation: > > I have a two-node cluster, pmk1 and pmk2, with resources ClusterIP and > DrbdFS. In what follows, commands preceded by

Re: [ClusterLabs] Apache graceful restart not supported by heartbeat apache control script

2019-03-25 Thread Andrei Borzenkov
25.03.2019 20:42, Cole Miller writes: > Hi users@clusterlabs.org, > > My current project at work is a two node cluster running apache and > virtual IPs on CentOS 7. I found in my testing that apache when run > by corosync does not have a reload or graceful restart. Before the > cluster, when

Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Andrei Borzenkov
On Fri, Mar 22, 2019 at 1:08 PM Jan Pokorný wrote: > > Also a Friday's idea: > Perhaps we should crank up "how to ask" manual for this list Yet another one? http://www.catb.org/~esr/faqs/smart-questions.html

Re: [ClusterLabs] Interface confusion

2019-03-16 Thread Andrei Borzenkov
The stonith agent is not prohibited to run by (co-)location rules. My understanding is that this node is selected by DC in partition. > Thank you! > > Sat, 16.03.2019, 05:37 Andrei Borzenkov > wrote: >> 16.03.2019 1:16, Adam Budziński writes: >>> Hi Tomas, >>

Re: [ClusterLabs] Interface confusion

2019-03-15 Thread Andrei Borzenkov
16.03.2019 1:16, Adam Budziński writes: > Hi Tomas, > > Ok but how then pacemaker or the fence agent knows which route to take to > reach the vCenter? They do not know or care at all. It is up to your underlying operating system and its routing tables. > Btw. Do I have to add the stonith

Re: [ClusterLabs] Two mode cluster VMware drbd

2019-03-12 Thread Andrei Borzenkov
12.03.2019 18:10, Adam Budziński writes: > Hello, > > > > I’m planning to setup a two node (active-passive) HA cluster consisting of > pacemaker, corosync and DRBD. The two nodes will run on VMware VM’s and > connect to a single DB server (unfortunately for various reasons not > included in the

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-26 Thread Andrei Borzenkov
26.02.2019 18:05, Ken Gaillot writes: > On Tue, 2019-02-26 at 06:55 +0300, Andrei Borzenkov wrote: >> 26.02.2019 1:08, Ken Gaillot writes: >>> On Mon, 2019-02-25 at 23:00 +0300, Andrei Borzenkov wrote: >>>> 25.02.2019 22:36, Andrei Borzenkov writes: >>>>

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
25.02.2019 23:13, Ken Gaillot writes: > On Mon, 2019-02-25 at 14:20 +0530, Samarth Jain wrote: >> Hi, >> >> >> We have a bunch of resources running in master slave configuration >> with one master and one slave instance running at any given time. >> >> What we observe is, that for any two given

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
26.02.2019 1:08, Ken Gaillot writes: > On Mon, 2019-02-25 at 23:00 +0300, Andrei Borzenkov wrote: >> 25.02.2019 22:36, Andrei Borzenkov writes: >>> >>>> Could you please help me understand: >>>> 1. Why doesn't pacemaker process the failure of Stateful_Test_2

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
25.02.2019 22:36, Andrei Borzenkov writes: > >> Could you please help me understand: >> 1. Why doesn't pacemaker process the failure of Stateful_Test_2 resource >> immediately after first failure? > I'm still not sure why. > I vaguely remember something about seq

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
25.02.2019 11:50, Samarth Jain writes: > Hi, > > > We have a bunch of resources running in master slave configuration with one > master and one slave instance running at any given time. > > What we observe is, that for any two given resources at a time, if say > resource Stateful_Test_1 is in

Re: [ClusterLabs] NFS4 share not working

2019-02-22 Thread Andrei Borzenkov
23.02.2019 2:57, solarflow99 writes: > I'm trying to have my NFS share exported via pacemaker and now it doesn't > seem to be working, it also kills off nfs-mountd. It looks like the rbd > device could have something to do with it, the nfsroot doesn't get > exported, but there's no indication why:

Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
20.02.2019 21:51, Eric Robinson writes: > > The following should show OK in a fixed font like Consolas, but the following > setup is supposed to be possible, and is even referenced in the ClusterLabs > documentation. > > > > > > +--+ > > | mysql001 +--+ > >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
18.02.2019 18:53, Ken Gaillot writes: > On Sun, 2019-02-17 at 20:33 +0300, Andrei Borzenkov wrote: >> 17.02.2019 0:33, Andrei Borzenkov writes: >>> 17.02.2019 0:03, Eric Robinson writes: >>>> Here are the relevant corosync logs. >>>> >>>> It

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Andrei Borzenkov
19.02.2019 23:06, Eric Robinson writes: ... > Bottom line is, how do we configure the cluster in such a way that > there are no cascading circumstances when a MySQL resource fails? > Basically, if a MySQL resource fails, it fails. We'll deal with that > on an ad-hoc basis. I don't want the whole

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-17 Thread Andrei Borzenkov
17.02.2019 0:44, Eric Robinson writes: > Thanks for the feedback, Andrei. > > I only want cluster failover to occur if the filesystem or drbd resources > fail, or if the cluster messaging layer detects a complete node failure. Is > there a way to tell Pacemaker not to trigger a cluster failover

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-17 Thread Andrei Borzenkov
17.02.2019 0:33, Andrei Borzenkov writes: > 17.02.2019 0:03, Eric Robinson writes: >> Here are the relevant corosync logs. >> >> It appears that the stop action for resource p_mysql_002 failed, and that >> caused a cascading series of service changes. However, I don't

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
t away from its current node. In this particular case it may be argued that pacemaker reaction is unjustified. Administrator explicitly set target state to "stop" (otherwise pacemaker would not attempt to stop it) so it is unclear why it tries to restart it on other node. >> -O

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
17.02.2019 0:03, Eric Robinson writes: > Here are the relevant corosync logs. > > It appears that the stop action for resource p_mysql_002 failed, and that > caused a cascading series of service changes. However, I don't understand > why, since no other resources are dependent on p_mysql_002. >

Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Andrei Borzenkov
13.02.2019 15:50, Maciej S writes: > Can you describe at least one situation when it could happen? > I see situations where data on two masters can diverge but I can't find the > one where data gets corrupted. If diverged data in two databases that are supposed to be exact copy of each other is

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Andrei Borzenkov
24.01.2019 18:01, Lentes, Bernd writes: > - On Jan 23, 2019, at 3:20 PM, Klaus Wenninger kwenn...@redhat.com wrote: >>> I have corosync-2.3.6-9.13.1.x86_64. >>> Where can i configure this value ? >> >> speaking of two_node & wait_for_all? >> That is configured in the quorum-section of

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Andrei Borzenkov
23.01.2019 17:20, Klaus Wenninger writes: > > And yes dynamic-configuration of two_node should be possible - > remember that I had to implement that communication with > corosync into sbd for clusters that are expanded node-by-node > using pcs. > 'corosync-cfgtool -R' to reload the config. >
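For reference, a minimal corosync.conf quorum section of the kind being discussed (values illustrative; with recent corosync, two_node: 1 implies wait_for_all unless it is explicitly overridden):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    # implied by two_node: 1, shown here for clarity
    wait_for_all: 1
}
```

After editing, the configuration can be reloaded with `corosync-cfgtool -R`, as mentioned above.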

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-22 Thread Andrei Borzenkov
22.01.2019 20:00, Ken Gaillot writes: > On Tue, 2019-01-22 at 16:52 +0100, Lentes, Bernd wrote: >> Hi, >> >> we have a new UPS which has enough charge to provide our 2-node >> cluster with the periphery (SAN, switches ...) for a resonable time. >> I'm currently thinking of the shutdown- and

Re: [ClusterLabs] Trying to Understanding crm-fence-peer.sh

2019-01-16 Thread Andrei Borzenkov
16.01.2019 19:49, Bryan K. Walton writes: > On Wed, Jan 16, 2019 at 04:53:32PM +0100, Lars Ellenberg wrote: >> >> To clarify: crm-fence-peer.sh is an *example implementation* >> (even though an elaborate one) of a DRBD fencing policy handler, >> which uses pacemaker location constraints on the

Re: [ClusterLabs] 3 node cluster to 2 with quorum device

2019-01-05 Thread Andrei Borzenkov
06.01.2019 8:16, Jason Pfingstmann writes: > I am new to corosync and pacemaker, having only used heartbeat in the > past (which is barely even comparable, now that I’m in the middle of > this). I’m working on a system for RDQM (IBM’s MQ software, > clustering solution) and it uses corosync with

Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

2018-12-21 Thread Andrei Borzenkov
21.12.2018 12:09, Klaus Wenninger writes: > On 12/21/2018 08:15 AM, Fulong Wang wrote: >> Hello Experts, >> >> I'm new to this mailing list. >> Please kindly forgive me if this mail has disturbed you! >> >> Our company is currently evaluating the usage of the SuSE HAE on x86 >> platform. >> When simulating

Re: [ClusterLabs] Configure a resource to only run a single instance at all times

2018-10-31 Thread Andrei Borzenkov
e that i can refer to? > Or will it be possible to configure the cluster to perform a probe in a given > interval? Will appreciate some guidance on this. Thanks > > > On Mon., Oct. 29, 2018, 13:20 Andrei Borzenkov, wrote: >> >> 29.10.2018 20:04, jm2109...@gmail.com

Re: [ClusterLabs] Configure a resource to only run a single instance at all times

2018-10-29 Thread Andrei Borzenkov
29.10.2018 20:04, jm2109...@gmail.com writes: > Hi Guys, > > I'm a new user of pacemaker clustering software and I've just configured a > cluster with a single systemd resource. I have the following cluster and > resource configurations below. Failover works perfectly between the two > nodes

Re: [ClusterLabs] Floating IP active in both nodes

2018-10-26 Thread Andrei Borzenkov
26.10.2018 11:14, Gabriel Buades writes: > Dear cluster labs team. > > I previously configured a two nodes cluster with replicated maria Db. > To use one database as the active, and the other one as failover, I > configured a cluster using heartbeat: > > root@logpmgid01v:~$ sudo crm configure

Re: [ClusterLabs] Need help to enable hot switch of iSCSI (tgtd) under two node Pacemaker + DRBD 9.0 under CentOS 7.5 in ESXi 6.5 Environment

2018-10-18 Thread Andrei Borzenkov
16.10.2018 15:29, LiFeng Zhang writes: > Hi, all dear friends, > > i need your help to enable the hot switch of iSCSI under a > Pacemaker/Corosync Cluster, which has a iSCSI Device based on a two node > DRBD Replication. > > I've got the Pacemaker/Corosync cluster working, DRBD replication also >

Re: [ClusterLabs] Questions about pacemaker/ mysql resource agent behaviour when network fail

2018-10-10 Thread Andrei Borzenkov
10.10.2018 13:18, Simon Bomm writes: > On Sat, Oct 6, 2018 at 06:13, Andrei Borzenkov > wrote: > >> 05.10.2018 15:00, Simon Bomm writes: >>> Hi all, >>> >>> Using pacemaker 1.1.18-11 and mysql resource agent ( >>> >> https://github.c

Re: [ClusterLabs] Questions about pacemaker/ mysql resource agent behaviour when network fail

2018-10-05 Thread Andrei Borzenkov
05.10.2018 15:00, Simon Bomm writes: > Hi all, > > Using pacemaker 1.1.18-11 and mysql resource agent ( > https://github.com/ClusterLabs/resource-agents/blob/RHEL6/heartbeat/mysql), > I run into an unwanted behaviour. My point of view of course, maybe it's > expected to be as it is that's why I

Re: [ClusterLabs] Colocation by Node

2018-10-02 Thread Andrei Borzenkov
02.10.2018 23:49, Ken Gaillot writes: ... Is a configuration like this possible? Without creating two primitives for 'ocf:esos:scst' and ditching the clone rule? Or is the >>> >>> No, there's no way to constrain against a particular clone >>> instance. >> >> Hmm ... >> >> commit

Re: [ClusterLabs] Colocation by Node

2018-10-02 Thread Andrei Borzenkov
01.10.2018 18:09, Marc Smith writes: > Hi, > > I'm looking for the correct constraint setup to use for the following > resource configuration: > --snip-- > node 1: tgtnode2.parodyne.com > node 2: tgtnode1.parodyne.com > primitive p_iscsi_tgtnode1 iscsi \ > params portal=172.16.0.12

Re: [ClusterLabs] Colocation by Node

2018-10-02 Thread Andrei Borzenkov
01.10.2018 18:23, Ken Gaillot writes: > On Mon, 2018-10-01 at 11:09 -0400, Marc Smith wrote: >> Hi, >> >> I'm looking for the correct constraint setup to use for the following >> resource configuration: >> --snip-- >> node 1: tgtnode2.parodyne.com >> node 2: tgtnode1.parodyne.com >> primitive

Re: [ClusterLabs] Colocation dependencies (dislikes)

2018-09-23 Thread Andrei Borzenkov
23.09.2018 13:07, Ian Underhill writes: > I'm trying to design a resource layout that has different "dislikes" > colocation scores between the various resources within the cluster. > > 1) When I start to have multiple colocation dependencies from a single > resource, strange behaviour starts to

Re: [ClusterLabs] Encrypted passwords for Resource Agent Scripts

2018-09-21 Thread Andrei Borzenkov
21.09.2018 16:31, Dileep V Nair writes: > > > Hi, > > I have written heartbeat resource agent scripts for Oracle and > Sybase. Both the scripts take user passwords as parameters. Is there a way > to do some encryption for the passwords so that the plain text passwords > are not visible
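A common workaround, sketched below under the assumption that the agent runs as root: keep the secret in a root-only file and pass only the file path as the resource parameter, so the password itself never appears in the CIB. Paths and variable names are illustrative.

```shell
# Illustrative only: store the secret in a mode-0600 file and have the
# resource agent read it at start, instead of taking the password as a
# parameter (parameters end up world-readable in the CIB).
PWFILE=$(mktemp)
printf 's3cr3t' > "$PWFILE"
chmod 600 "$PWFILE"            # readable by the owner (root) only
DB_PASSWORD=$(cat "$PWFILE")   # what the agent would do internally
echo "password length: ${#DB_PASSWORD}"
rm -f "$PWFILE"
```

This does not encrypt anything; it just moves the secret out of the cluster configuration and relies on filesystem permissions.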

Re: [ClusterLabs] is SFEX valid for Pacemaker on VMware with fence_vmware_soap?

2018-09-14 Thread Andrei Borzenkov
14.09.2018 05:38, Satoshi Suzuki writes: > Hello, > please let me ask if SFEX is valid as the disk exclusive access control for > Pacemaker clusters on a VMware environment. > > My client is planning to configure Pacemaker HA clusters on several > VMware vSphere 6.5 hosts. > Each of the HA clusters

Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-10 Thread Andrei Borzenkov
07.09.2018 23:07, Dan Ragle writes: > On an active-active two node cluster with DRBD, dlm, filesystem mounts, > a Web Server, and some crons I can't figure out how to have the crons > jump from node to node in the correct order. Specifically, I have two > crontabs (managed via symlink

Re: [ClusterLabs] Complex Pacemaker resource depedency

2018-09-10 Thread Andrei Borzenkov
10.09.2018 22:46, Vassilis Aretakis writes: > Hi All, > > I have a Pacemaker cluster that fails over IPs across 5 nodes. I want to add a > default route on each node, when that node has at least one of the > resources running. > > > The resources are: > >  vip1-19    (ocf::heartbeat:IPaddr2):   
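One way to model this is to manage the default route itself as a resource and attract it to whichever node holds a VIP. A crm shell sketch, with the gateway address and VIP resource names assumed for illustration:

```
# Hypothetical sketch: the route follows the VIPs via small positive
# colocation scores, so holding any one VIP is enough to attract it.
primitive p_defroute ocf:heartbeat:Route \
    params destination="default" gateway="192.168.100.1"
colocation c_route_vip1 100: p_defroute vip1
# ...one such colocation per VIP resource
```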

Re: [ClusterLabs] 2 node cluster dlm/clvm trouble

2018-09-06 Thread Andrei Borzenkov
06.09.2018 17:36, Patrick Whitney writes: > Good Morning Everyone, > > I'm hoping someone with more experience with corosync and pacemaker can see > what I am doing wrong. > > I've got a test setup of 2 nodes, with dlm and clvm setup as clones, and > using fence_scsi as my fencing agent. > >

Re: [ClusterLabs] SAN, pacemaker, KVM: live-migration with ext3 ?

2018-09-05 Thread Andrei Borzenkov
05.09.2018 19:13, Lentes, Bernd writes: > Hi guys, > > just to be sure. I thought (maybe I'm wrong) that having a VM on a shared > storage (FC SAN), e.g. in a raw file on an ext3 fs on that SAN allows > live-migration because pacemaker takes care that the ext3 fs is at any time > only mounted

Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-22 Thread Andrei Borzenkov
22.08.2018 15:53, David Tolosa writes: > Hello, > I'm going crazy over this problem, which I expect to resolve here with > your help, guys: > > I have 2 nodes with the Corosync redundant ring feature. > > Each node has 2 similarly connected/configured NICs. Both nodes are > connected to each other by
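For context, a redundant ring in corosync 2.x is declared with two totem interfaces. A corosync.conf sketch; the network addresses and rrp_mode value are placeholders:

```
# corosync.conf sketch for two rings (corosync 2.x RRP syntax);
# bindnetaddr values are illustrative.
totem {
    version: 2
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
    }
}
```

Note that with some corosync 2.x versions a ring marked FAULTY is not re-enabled automatically; `corosync-cfgtool -s` shows the ring status and `corosync-cfgtool -r` re-enables failed rings, which is often the missing step when a ring does not recover after a node comes back.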

Re: [ClusterLabs] Pacemaker ordering constraints and resource failures

2018-08-08 Thread Andrei Borzenkov
08.08.2018 16:59, Ken Gaillot writes: > On Wed, 2018-08-08 at 07:36 +0300, Andrei Borzenkov wrote: >> 06.08.2018 20:07, Devin A. Bougie пишет: >>> What is the best way to make sure pacemaker doesn’t attempt to >>> recover or restart a resource if a resource it de

Re: [ClusterLabs] Pacemaker ordering constraints and resource failures

2018-08-07 Thread Andrei Borzenkov
08.08.2018 07:36, Andrei Borzenkov writes: > 06.08.2018 20:07, Devin A. Bougie пишет: >> What is the best way to make sure pacemaker doesn’t attempt to recover or >> restart a resource if a resource it depends on is not started? >> >> For example, we have two dummy

Re: [ClusterLabs] Pacemaker ordering constraints and resource failures

2018-08-07 Thread Andrei Borzenkov
06.08.2018 20:07, Devin A. Bougie writes: > What is the best way to make sure pacemaker doesn’t attempt to recover or > restart a resource if a resource it depends on is not started? > > For example, we have two dummy resources that simply sleep - master_sleep and > slave_sleep. We then have a
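The usual pattern for this dependency is a mandatory order plus an infinite colocation. A crm shell sketch using the resource names from the thread (the constraint IDs are made up):

```
# A mandatory order means slave_sleep is only started after
# master_sleep, and is stopped when master_sleep stops; the infinite
# colocation also keeps it off any node where master_sleep is not
# running.
order o_sleep Mandatory: master_sleep slave_sleep
colocation c_sleep inf: slave_sleep master_sleep
```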

Re: [ClusterLabs] Antw: Re: Why Won't Resources Move?

2018-08-02 Thread Andrei Borzenkov
Sent from my iPhone > On Aug 2, 2018, at 9:27, Ulrich Windl > wrote: > > Hi! > > I'm not familiar with Redhat, but is this normal?: > >>> corosync: active/disabled >>> pacemaker: active/disabled > Some administrators prefer starting the cluster stack manually, so it may be intentional.

Re: [ClusterLabs] Fence agent executing thousands of API calls per hour

2018-07-30 Thread Andrei Borzenkov
Sent from my iPhone > On Jul 31, 2018, at 2:47, Casey & Gina wrote: > > I've set up a number of clusters in a VMware environment, and am using the > fence_vmware_rest agent for fencing (from fence-agents 4.2.1), as follows: > > Stonith Devices: > Resource: vmware_fence (class=stonith
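The recurring stonith monitor operation is what polls the VMware API, so its interval directly sets the call rate. A pcs sketch using the device name from the quoted config; the interval value is illustrative:

```shell
# Hypothetical sketch: lengthen the recurring monitor on the fence
# device so the cluster polls the VMware API far less often.
pcs stonith update vmware_fence op monitor interval=600s
```

With many clusters sharing one vCenter, even a modest per-device interval multiplies into a large aggregate API load, which matches the symptom in the subject.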

Re: [ClusterLabs] Antw: Trying to prevent instantaneous Failover / Failback at standby reconnect

2018-07-25 Thread Andrei Borzenkov
On Wed, Jul 25, 2018 at 8:54 AM, Ulrich Windl wrote: > Hi! > > You seem to have no STONITH device configured. And you create a classical > split-brain scenario. > > Another problem is this: "Another DC detected" > That is expected. We have no STONITH and ignore the out-of-quorum situation (and

Re: [ClusterLabs] Trying to prevent instantaneous Failover / Failback at standby reconnect

2018-07-24 Thread Andrei Borzenkov
24.07.2018 20:59, O'Donovan, Garret writes: > Hello, and thank you for adding me to the list. > > We are using Pacemaker in a two-node hot-warm redundancy configuration. Both > nodes run ocf:pacemaker:ping (cloned) to monitor a ping group of devices. > The nodes share a virtual IP using
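Two knobs usually control this behaviour: the `dampen` parameter of the ping agent delays pingd attribute changes so short connectivity blips are ignored, and resource stickiness resists an immediate failback. A crm shell sketch; the VIP resource name and all values are assumed:

```
# Hypothetical sketch: dampen delays reactions to connectivity
# changes; stickiness keeps the VIP where it is after a failover.
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.168.100.1" dampen="30s" multiplier="1000" \
    op monitor interval="10s"
clone cl_ping p_ping
rsc_defaults resource-stickiness="100"
location l_vip_connected p_vip \
    rule -inf: not_defined pingd or pingd lte 0
```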

Re: [ClusterLabs] Weird Fencing Behavior

2018-07-17 Thread Andrei Borzenkov
18.07.2018 04:21, Confidential Company writes: >>> Hi, >>> >>> On my two-node active/passive setup, I configured fencing via >>> fence_vmware_soap. I configured pcmk_delay=0 on both nodes so I >> expected >>> that both nodes will be stonithed simultaneously. >>> >>> On my test scenario, Node1 has
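In a two-node cluster with no delay, both nodes race to fence each other after a split, and the outcome looks random. The conventional fix is a static delay on one device so one node always shoots first. A pcs sketch with hypothetical device names:

```shell
# Hypothetical sketch: the node fenced by the un-delayed device wins
# the race, so the cluster deterministically keeps one survivor
# instead of both nodes shooting at once.
pcs stonith update fence_node1 pcmk_delay_base=10s
pcs stonith update fence_node2 pcmk_delay_base=0s
```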

Re: [ClusterLabs] Weird Fencing Behavior?

2018-07-17 Thread Andrei Borzenkov
On Tue, Jul 17, 2018 at 10:58 AM, Confidential Company wrote: > Hi, > > On my two-node active/passive setup, I configured fencing via > fence_vmware_soap. I configured pcmk_delay=0 on both nodes so I expected > that both nodes will be stonithed simultaneously. > > On my test scenario, Node1 has

Re: [ClusterLabs] Problem with pacemaker init.d script

2018-07-11 Thread Andrei Borzenkov
not sure if Ubuntu is >>> one). >>> when I run “make install” anything is created for systemd env. >> >> With Ubuntu 16, you should use "systemctl enable pacemaker" instead of >> update-rc.d. >> >> The pacemaker configure script should have detected
