Re: [ClusterLabs] After Startup, Pacemaker Gasps and Dies

2016-07-25 Thread Eric Robinson
aned files. -- Eric Robinson -Original Message- From: Ken Gaillot [mailto:kgail...@redhat.com] Sent: Monday, July 25, 2016 7:52 AM To: users@clusterlabs.org Cc: Eric Robinson <eric.robin...@psmnv.com> Subject: Re: [ClusterLabs] After Startup, Pacemaker Gasps and Dies On 07/23/201

[ClusterLabs] subscribe

2016-07-23 Thread Eric Robinson
-- Eric Robinson Chief Information Officer Physician Select Management, LLC 775.885.2211 x 112 ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http

[ClusterLabs] subscribe

2016-07-23 Thread Eric Robinson
___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org

Re: [ClusterLabs] Antw: Re: Antw: Colocations and Orders Syntax Changed?

2017-01-31 Thread Eric Robinson
Indeed. My mistake. -- Eric Robinson -Original Message- From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de] Sent: Friday, January 20, 2017 4:25 AM To: users@clusterlabs.org Subject: [ClusterLabs] Antw: Re: Antw: Colocations and Orders Syntax Changed? >>> Eric

Re: [ClusterLabs] Antw: Colocations and Orders Syntax Changed?

2017-01-20 Thread Eric Robinson
Thanks for the input. I usually just do a 'crm config show > myfile.xml.date_time' and then read it back in if I need to. -- Eric Robinson > -Original Message- > From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de] > Sent: Thursday, January 19, 2017 12:04 AM

[ClusterLabs] Colocations and Orders Syntax Changed?

2017-01-18 Thread Eric Robinson
o they can each be started and stopped without hurting anything. -- Eric Robinson

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-09 Thread Eric Robinson
and refuses to join the cluster, notifying operators. Later, operators manually resolve the split brain. There is no perfect solution, of course, but it seems to me that this simple approach provides a level of availability beyond what you would normally get with a 2-node cluster. What am I missin

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link outages). >> Thanks for the clarification. So what's the easiest way to ensure that the >> cluster waits a >> desired timeout before

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
Basically, when we turn off a switch, I want to keep the cluster from failing over before Linux bonding has had a chance to recover. I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link

[ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-09 Thread Eric Robinson
but I have not seen this approach anywhere. Maybe there's a good reason for that because it simply won't work? The arbitration solutions I have seen all rely on a third machine that plays a complex role in arbitration. Thoughts? -- Eric Robinson __

[ClusterLabs] Establishing Timeouts

2016-10-09 Thread Eric Robinson
failovers. In other words, if there is a link or switch failure, I want to make sure that the cluster allows plenty of time for link communication to recover before deciding that a node has actually died. -- Eric Robinson

[ClusterLabs] Easy Linux Bonding Question?

2016-10-10 Thread Eric Robinson
control the delay with arp_ip_target? -- Eric Robinson

[ClusterLabs] Trying this question again re: arp_interval

2016-10-14 Thread Eric Robinson
Does anyone know how many arp_intervals must pass without a reply before the bonding driver downs the primary NIC? Just one? -- Eric Robinson

[ClusterLabs] Can Bonding Cause a Broadcast Storm?

2016-11-15 Thread Eric Robinson
. -- Eric Robinson

Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm?

2016-11-15 Thread Eric Robinson
tering welcomed Subject: Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm? What bonding mode are you using? Some modes require additional configuration from the switch to avoid flooding. Also, is spanning tree enabled on the switches? On Tue, Nov 15, 2016 at 1:26 PM Eric Robinson <e

Re: [ClusterLabs] Antw: Establishing Timeouts

2016-10-10 Thread Eric Robinson
> AFAIK, if _all_ ARP targets did not respond _once_ the link will be > considered down It would be great if someone could confirm that. > after "Down Delay". I guess you want to use multiple (and the correct ones) > ARP IP targets... Yes, I use multiple targets, and arp_all_targets=any.

Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-10 Thread Eric Robinson
bd0 and p_vip_clust19 are getting the Master designation. -- Eric Robinson

Re: [ClusterLabs] Fraud Detection Check?

2017-04-13 Thread Eric Robinson
e that much. I just want to make sure people in the list are not getting alerts that my mails are fraudulent. -- Eric Robinson

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-16 Thread Eric Robinson
> -Original Message- > From: Digimer [mailto:li...@alteeve.ca] > Sent: Sunday, April 16, 2017 11:17 AM > To: Cluster Labs - All topics related to open-source clustering welcomed > <users@clusterlabs.org>; Eric Robinson <eric.robin...@psmnv.com> > Subject: Re

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Eric Robinson
> In shared-nothing cluster "split brain" means whichever MAC address > is in ARP cache of the border router is the one that gets the traffic. > How does the existing code figure this one out? I'm guessing the surviving node broadcasts a gratuitous arp reply.

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Eric Robinson
ike my question was well-timed, as it served as a catalyst for you to write the article. Thanks much, I am working through it now and will doubtless have some questions and comments. Before I say anything more, I want to do some testing in my lab to make sure I have my thoughts collected. -

Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Eric Robinson
Somebody want to look at this log and tell me why the cluster failed over? All we did was add a new resource. We've done it many times before without any problems. -- Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request: Forwarding cib_apply_diff operation for

[ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Eric Robinson
Somebody want to look at this log and tell me why the cluster failed over? All we did was add a new resource. We've done it many times before without any problems. -- Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request: Forwarding cib_apply_diff operation for

Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Eric Robinson
> I've received your emails without any alteration or flagging as "fraud". > So I don't think we're doing anything to your emails. Good to know. -- Eric Robinson

Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Eric Robinson
>> You guys got a thing against Office 365? > doesn't everybody? Fair enough. -- Eric Robinson

Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Eric Robinson
> On a serious note, I too received your e-mails without any red flags attached. Thanks for the confirmation. I guess I'm the only one seeing those warnings. Maybe Office 365 has a problem with ClusterLabs. ;-) -- Eric Robinson

Re: [ClusterLabs] Antw: DRBD and SSD TRIM - Slow!

2017-08-02 Thread Eric Robinson
1) iotop did not show any significant io, just maybe 30k/second of drbd traffic. 2) okay. I've never done that before. I'll give it a shot. 3) I'm not sure what I'm looking at there. -- Eric Robinson > -Original Message- > From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensb

[ClusterLabs] DRBD and SSD TRIM - Slow!

2017-08-01 Thread Eric Robinson
ges! -- Eric Robinson

Re: [ClusterLabs] Antw: Re: Antw: DRBD and SSD TRIM - Slow! -- RESOLVED!

2017-08-03 Thread Eric Robinson
of 128MB. Creating an ext4 filesystem on it and trimming only took 1.5 minutes (across multiple tests). Somebody knowledgeable may be able to explain how DISC-MAX affects the trim speed, and why the DISC-MAX value is different when creating the array with mdadm versus lvm. -- Eric Robinson

[ClusterLabs] verify status starts at 100% and stays there?

2017-08-03 Thread Eric Robinson
bug? -- Eric Robinson

Re: [ClusterLabs] Antw: verify status starts at 100% and stays there?

2017-08-04 Thread Eric Robinson
Yeah, UpToDate was not of concern to me. The part that threw me off was "done:100.00." It did eventually finish, though, and that was shown in the dmesg output. However, 'drbdadm status' said "done:100.00" the whole time, from start to finish, which seems weird.

[ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-22 Thread Eric Robinson
ation at ClusterLabs misleading? -- Eric Robinson

Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-22 Thread Eric Robinson
. If there is anything I can do to assist with getting the documentation cleaned up, I'd be more than glad to help. -- Eric Robinson -Original Message- From: Ken Gaillot [mailto:kgail...@redhat.com] Sent: Tuesday, August 22, 2017 2:08 PM To: Cluster Labs - All topics related to open-source

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
> > Out of curiosity, what did I say that indicates that we're not using > > fencing? > > > > Same place you said you were new to HA and needed to learn corosync and > pacemaker to use OpenBSD. > I must have misspoken. I said I stopped using OpenBSD back around the year 2000 and switched to

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
> > I must have misspoken. > > No, I had invisible tags all over my last two messages. Haha, okay. Thought I was going nuts for a moment. --Eric

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
> > Out of curiosity, do the openSUSE Leap repos and packages work with > SLES? > > I know that there are some base system differences that could cause > problems, things like Leap using systemd/journald for logging while SLES is > still logging via syslog-ng (IIRC)... so it's possible that you

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
> Jokes (?) aside; Red Hat and SUSE both have paid teams that make sure the > HA software works well. So if you're new to HA, I strongly recommend > sticking with one of those two, and SUSE is what you mentioned. If you really > want to go to BSD or something else, I would recommend learning HA on

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
> > Also, use fencing. Seriously, just do it. > > Yeah. Fencing is the only bit that's missing from this picture. > Out of curiosity, what did I say that indicates that we're not using fencing? --Eric

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
> > I can understand how SUSE can charge for support, but not for the > software itself. Corosync, Pacemaker, and DRBD are all open source. > > So why do not you download open source and compile it yourself? > I've done that before and I could if necessary. Rather go with the easiest option

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
itself. Corosync, Pacemaker, and DRBD are all open source. -- Eric Robinson

[ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Eric Robinson
"High Availability Extension," which I must pay $700/year for? No freaking way! This is Linux we're talking about, right? There's got to be an easy way to install the cluster without paying for a subscription... right? Someone talk me off the ledge here. -- Eri

Re: [ClusterLabs] Azure Resource Agent

2017-09-18 Thread Eric Robinson
The license would be GPL, I suppose, whatever enthusiasts and community contributors usually do. And yes, it would be fun to know I contributed something to the repo. -- Eric Robinson > -Original Message- > From: Kristoffer Grönlund [mailto:kgronl...@suse.com] > Sen

Re: [ClusterLabs] Azure Resource Agent

2017-09-16 Thread Eric Robinson
Forgot to mention that it's called AZaddr and is intended to be dependent on IPaddr2 (or vice versa) and live in /usr/lib/ocf/resource.d/heartbeat. -- Eric Robinson From: Eric Robinson [mailto:eric.robin...@psmnv.com] Sent: Friday, September 15, 2017 3:56 PM To: Cluster Labs - All topics

[ClusterLabs] Azure Resource Agent

2017-09-15 Thread Eric Robinson
Greetings, all -- If anyone's interested, I wrote a resource agent that works with Microsoft Azure. I'm no expert at shell scripting, so I'm certain it needs a great deal of improvement, but I've done some testing and it works with a 2-node cluster in my Azure environment. Offhand, I don't

[ClusterLabs] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-09-25 Thread Eric Robinson
ing the DRBD layer and writing directly to the drives, so we must conclude that DRBD has a data corruption bug under high write load. However, we would be more than happy to be proved wrong. -- Eric Robinson

Re: [ClusterLabs] Antw: Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-09-26 Thread Eric Robinson
> I don't know the tool, but isn't the expectation a bit high that the tool > will trim > the correct blocks through drbd->LVM/mdadm->device? Why not use the tool > on the affected devices directly? > I did, and the corruption did not occur. It only happened when writing through the DRBD

[ClusterLabs] Is there a Trick to Making Corosync Work on Microsoft Azure?

2017-08-23 Thread Eric Robinson
timestamp: on logger_subsys { subsys: AMF debug: off } } I used tcpdump and I see a lot of traffic between them on port 2224, but nothing else. Is there an issue because the bindinetaddr is 172.28.0.0 but the members have a /23 mask? -- Eric Robinson

Re: [ClusterLabs] Pacemaker in Azure

2017-08-24 Thread Eric Robinson
ent on it. That might work? -- Eric Robinson

[ClusterLabs] Pacemaker in Azure

2017-08-24 Thread Eric Robinson
around this limitation? -- Eric Robinson

Re: [ClusterLabs] Pacemaker in Azure

2017-08-24 Thread Eric Robinson
/Azure parameters, and if those are configured, it would do the appropriate API requests. On Thu, 2017-08-24 at 23:27 +, Eric Robinson wrote: > Leon -- I will pay you one trillion samolians for that resource agent! > Any way we can get our hands on a copy? > > > > -- > Eric

Re: [ClusterLabs] Pacemaker in Azure

2017-08-25 Thread Eric Robinson
Oh, okay. I thought you meant some different ones. -- Eric Robinson Chief Information Officer Physician Select Management, LLC 775.885.2211 x 112 -Original Message- From: Kristoffer Grönlund [mailto:kgronl...@suse.com] Sent: Friday, August 25, 2017 9:56 AM To: Eric Robinson <eric.ro

Re: [ClusterLabs] Pacemaker in Azure

2017-08-25 Thread Eric Robinson
there. -- Eric Robinson > -Original Message- > From: Oyvind Albrigtsen [mailto:oalbr...@redhat.com] > Sent: Friday, August 25, 2017 12:17 AM > To: Cluster Labs - All topics related to open-source clustering welcomed > <users@clusterlabs.org> > Subject: Re: [Cluster

Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-23 Thread Eric Robinson
o corosync.conf seem to be working. -- Eric Robinson -Original Message- From: Jan Friesse [mailto:jfrie...@redhat.com] Sent: Tuesday, August 22, 2017 11:52 PM To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; kgail...@redhat.com

Re: [ClusterLabs] Is there a Trick to Making Corosync Work on Microsoft Azure?

2017-08-23 Thread Eric Robinson
I figured out the cause. CMAN got installed by yum, and so none of my changes to corosync.conf had any effect, including the udpu directive. Now I just have to figure out how to enable unicast in cman. -- Eric Robinson From: Eric Robinson [mailto:eric.robin...@psmnv.com] Sent: Wednesday

Re: [ClusterLabs] Is there a Trick to Making Corosync Work on Microsoft Azure?

2017-08-23 Thread Eric Robinson
I got it. From: Eric Robinson [mailto:eric.robin...@psmnv.com] Sent: Wednesday, August 23, 2017 6:51 PM To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org> Subject: Re: [ClusterLabs] Is there a Trick to Making Corosync Work on Microsoft Azu

[ClusterLabs] Cannot connect to the drbdmanaged process using DBus

2017-12-14 Thread Eric Robinson
I'm sure someone has seen this before. What does it mean? ha11a:~ # drbdmanage init 198.51.100.65 You are going to initialize a new drbdmanage cluster. CAUTION! Note that: * Any previous drbdmanage cluster information may be removed * Any remaining resources managed by a previous drbdmanage

[ClusterLabs] Where to Find pcs and pcsd for OpenSUSE LEAP 4.23

2017-11-06 Thread Eric Robinson
I installed corosync 2.4.3 and pacemaker 1.1.17 from the openSUSE Leap 4.23 repos, but I can't find pcs or pcsd. Anybody know where to download them from? --Eric

Re: [ClusterLabs] Where to Find pcs and pcsd for OpenSUSE LEAP 4.23

2017-11-06 Thread Eric Robinson
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; Eric Robinson <eric.robin...@psmnv.com> Subject: Re: [ClusterLabs] Where to Find pcs and pcsd for OpenSUSE LEAP 4.23 Hi, On 11/07/2017 05:35 AM, Eric Robinson wrote: I installed co

Re: [ClusterLabs] Where to Find pcs and pcsd for OpenSUSE LEAP 4.23

2017-11-07 Thread Eric Robinson
> Which aspects of its constraints handling do you like, and why? I'm curious, > since I wasn't aware that it was significantly different from crmsh in this > respect. > Well, to be fair, in the past I have always configured my clusters by using 'crm configure edit' and building the config in

Re: [ClusterLabs] One volume is trimmable but the other is not?

2018-01-26 Thread Eric Robinson
> > I sent this to the drbd list too, but it’s possible that someone here > > may know. > > > > > > > > This is a WEIRD one. > > > > > > > > Why would one drbd volume be trimmable and the other one not? > > > > iirc drbd stores some of the config in the meta-data as well - like e.g. some >

[ClusterLabs] One volume is trimmable but the other is not?

2018-01-25 Thread Eric Robinson
I sent this to the drbd list too, but it's possible that someone here may know. This is a WEIRD one. Why would one drbd volume be trimmable and the other one not? Here you can see me issuing the trim command against two different filesystems. It works on one but fails on the other. ha11a:~ #

[ClusterLabs] Does CMAN Still Not Support Multipe CoroSync Rings?

2018-02-12 Thread Eric Robinson
General question. I tried to set up a cman + corosync + pacemaker cluster using two corosync rings. When I start the cluster, everything works fine, except when I do a 'corosync-cfgtool -s' it only shows one ring. I tried manually editing the /etc/cluster/cluster.conf file adding two sections,

Re: [ClusterLabs] Does CMAN Still Not Support Multipe CoroSync Rings?

2018-02-13 Thread Eric Robinson
Thanks for the suggestion everyone. I'll give that a try. > -Original Message- > From: Jan Friesse [mailto:jfrie...@redhat.com] > Sent: Monday, February 12, 2018 8:49 AM > To: Cluster Labs - All topics related to open-source clustering welcomed > <users@clusterlabs.o

Re: [ClusterLabs] Does CMAN Still Not Support Multipe CoroSync Rings?

2018-02-14 Thread Eric Robinson
> > Thanks for the suggestion everyone. I'll give that a try. > > Sorry, I'm late on this, but I wrote a quick start doc describing this (amongst > other things) some time ago. See the following chapter: > > https://clusterlabs.github.io/PAF/Quick_Start-CentOS-6.html#cluster- > creation > I

[ClusterLabs] Why Won't Resources Move?

2018-07-31 Thread Eric Robinson
I have what seems to be a healthy cluster, but I can't get resources to move. Here's what's installed... [root@001db01a cluster]# yum list installed|egrep "pacem|coro" corosync.x86_64 2.4.3-2.el7_5.1 @updates corosynclib.x86_64 2.4.3-2.el7_5.1

Re: [ClusterLabs] Why Won't Resources Move?

2018-08-01 Thread Eric Robinson
Move? > > On Wed, 2018-08-01 at 03:49 +, Eric Robinson wrote: > > I have what seems to be a healthy cluster, but I can’t get resources > > to move. > > > > Here’s what’s installed… > > > > [root@001db01a cluster]# yum list installed|egrep "pacem|co

Re: [ClusterLabs] Why Won't Resources Move?

2018-08-01 Thread Eric Robinson
> > The message likely came from the resource agent calling crm_attribute > > to set a node attribute. That message usually means the cluster isn't > > running on that node, so it's highly suspect. The cib might have > > crashed, which should be in the log as well. I'd look into that first. > >

Re: [ClusterLabs] Antw: Re: Why Won't Resources Move?

2018-08-02 Thread Eric Robinson
> Hi! > > I'm not familiar with Redhat, but is this normal?: > > > > corosync: active/disabled > > > pacemaker: active/disabled > > Regards, > Ulrich That's the default after a new install. I had not enabled them to start automatically yet. >

[ClusterLabs] What am I Doing Wrong with Constraints?

2018-08-06 Thread Eric Robinson
I don't understand why a problem with a resource causes other resources above it in the dependency stack (or on the same level with it) to fail over. My dependency stack is: drbd -> filesystem -> floating_ip -> Azure virtual IP | ->

[ClusterLabs] Different Times in the Corosync Log?

2018-08-20 Thread Eric Robinson
The corosync log shows different times for lrmd messages than for cib or crmd messages. Note the 4 hour difference. What? Aug 20 13:08:27 [107884] 001store01acib: info: cib_perform_op: +

Re: [ClusterLabs] Antw: Different Times in the Corosync Log?

2018-08-21 Thread Eric Robinson
> Hi! > > I could guess that the processes run with different timezone settings (for > whatever reason). > > Regards, > Ulrich That would be my guess, too, but I cannot imagine how they ended up in that condition. > > >>> Eric Robinson wrote on 21.0

Re: [ClusterLabs] Different Times in the Corosync Log?

2018-08-21 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Jan Pokorný > Sent: Tuesday, August 21, 2018 2:45 AM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Different Times in the Corosync Log? > > On 21/08/18 08:43 +, Eric Robinson wrote: > >> I coul

Re: [ClusterLabs] Different Times in the Corosync Log?

2018-08-21 Thread Eric Robinson
configuration). > > If you figure this out, I'd love to hear what it was. Gremlins ... You'll be the second to know after me! > > On Tue, 2018-08-21 at 11:45 +0200, Jan Pokorný wrote: > > On 21/08/18 08:43 +, Eric Robinson wrote: > > > > I could guess that the proc

[ClusterLabs] Increasing Token Timeout Safe By Itself?

2019-01-20 Thread Eric Robinson
I have a few corosync+pacemaker clusters in Azure. Occasionally, cluster nodes fail over, possibly because of intermittent connectivity loss, but more likely because one or more nodes experiences high load and is not able to respond in a timely fashion. I want to make the clusters a little more

Re: [ClusterLabs] Increasing Token Timeout Safe By Itself?

2019-01-22 Thread Eric Robinson
> -Original Message- > From: Jan Friesse > Sent: Sunday, January 20, 2019 11:57 PM > To: Cluster Labs - All topics related to open-source clustering welcomed > ; Eric Robinson > Subject: Re: [ClusterLabs] Increasing Token Timeout Safe By Itself? > > Eric Robins

Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Wednesday, February 20, 2019 8:51 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When > Just One Fails? > > 20.02.2019

Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Ulrich Windl > Sent: Tuesday, February 19, 2019 11:35 PM > To: users@clusterlabs.org > Subject: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When > Just One Fails? > > >>>

Re: [ClusterLabs] Simulate Failure Behavior

2019-02-22 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Ken Gaillot > Sent: Friday, February 22, 2019 5:06 PM > To: Cluster Labs - All topics related to open-source clustering welcomed > > Subject: Re: [ClusterLabs] Simulate Failure Behavior > > On Sat, 2019-02-23 at 00

[ClusterLabs] Simulate Failure Behavior

2019-02-22 Thread Eric Robinson
I want to mess around with different on-fail options and see how the cluster responds. I'm looking through the documentation, but I don't see a way to simulate resource failure and observe behavior without actually failing over the node. Isn't there a way to have the cluster MODEL failure and

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
cluster-name: 001db01ab dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9 have-watchdog: false last-lrm-refresh: 1550347798 maintenance-mode: false no-quorum-policy: ignore stonith-enabled: false --Eric From: Users On Behalf Of Eric Robinson Sent: Saturday, February 16, 2019 12:34 PM To: Cluster Labs - All

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
Here are the relevant corosync logs. It appears that the stop action for resource p_mysql_002 failed, and that caused a cascading series of service changes. However, I don't understand why, since no other resources are dependent on p_mysql_002. [root@001db01a cluster]# cat

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
? > -Original Message- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Saturday, February 16, 2019 1:34 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > 17.02.2019 0:03, Eric Robinson wrote

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> On Sat, Feb 16, 2019 at 09:33:42PM +0000, Eric Robinson wrote: > > I just noticed that. I also noticed that the lsb init script has a > > hard-coded stop timeout of 30 seconds. So if the init script waits > > longer than the cluster resource timeout of 15s, that would ca

[ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
These are the resources on our cluster. [root@001db01a ~]# pcs status Cluster name: 001db01ab Stack: corosync Current DC: 001db01a (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum Last updated: Sat Feb 16 15:24:55 2019 Last change: Sat Feb 16 15:10:21 2019 by root via cibadmin on

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Valentin Vidic > Sent: Saturday, February 16, 2019 1:28 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > On Sat, Feb 16, 2019 at 09:03:43PM +0

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
I'm looking through the docs but I don't see how to set the on-fail value for a resource. > -Original Message- > From: Users On Behalf Of Eric Robinson > Sent: Saturday, February 16, 2019 1:47 PM > To: Cluster Labs - All topics related to open-source clustering welcomed

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Eric Robinson
t; > On Tue, 2019-02-19 at 17:40 +, Eric Robinson wrote: > > > -Original Message- > > > From: Users On Behalf Of Andrei > > > Borzenkov > > > Sent: Sunday, February 17, 2019 11:56 AM > > > To: users@clusterlabs.org > > > Sub

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Sunday, February 17, 2019 11:56 AM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > 17.02.2019 0:44, Eric

Re: [ClusterLabs] Stupid DRBD/LVM Global Filter Question

2019-10-30 Thread Eric Robinson
Roger -- Thank you, sir. That does help. -Original Message- From: Roger Zhou Sent: Wednesday, October 30, 2019 2:56 AM To: Cluster Labs - All topics related to open-source clustering welcomed ; Eric Robinson Subject: Re: [ClusterLabs] Stupid DRBD/LVM Global Filter Question On 10/30

[ClusterLabs] Stupid DRBD/LVM Global Filter Question

2019-10-29 Thread Eric Robinson
If I have an LV as a backing device for a DRBD disk, can someone explain why I need an LVM filter? It seems to me that we would want the LV to be always active under both the primary and secondary DRBD devices, and there should be no need or desire to have the LV activated or deactivated by

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-05 Thread Eric Robinson
ei Borzenkov > wrote: > >05.02.2020 20:55, Eric Robinson wrote: > >> The two servers 001db01a and 001db01b were up and responsive. Neither > >had been rebooted and neither were under heavy load. There's no > >indication in the logs of loss of network connectivity. Any

[ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-05 Thread Eric Robinson
The two servers 001db01a and 001db01b were up and responsive. Neither had been rebooted and neither were under heavy load. There's no indication in the logs of loss of network connectivity. Any ideas on why both nodes seem to think the other one is at fault? (Yes, it's a 2-node cluster without

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-05 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Wednesday, February 5, 2020 12:14 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do Nodes Leave the Cluster? > > 05.02.2020 20:55, Eric Robinson wrote: > > The two

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-05 Thread Eric Robinson
topics related to open-source clustering welcomed ; Andrei Borzenkov Subject: Re: [ClusterLabs] Why Do Nodes Leave the Cluster? Hi Erik, what has led you to think that there was no network loss? Best Regards, Strahil Nikolov On Wednesday, February 5, 2020, 22:59:56 GMT+2, Eric Robinson

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-05 Thread Eric Robinson
On Thursday, February 6, 2020, 01:44:55 GMT+2, Eric Robinson <eric.robin...@psmnv.com> wrote: Hi Strahil – I can’t prove there was no network loss, but: 1. There were no dmesg indications of ethernet link loss. 2. Other than corosync, there are no oth

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-06 Thread Eric Robinson
Hi Nikolov -- > Defaults are 1s token, 1.2s consensus which is too small. > In Suse, token is 10s, while consensus is 1.2 * token -> 12s. > With these settings, cluster will not react for 22s. > > I think it's a good start for your cluster . > Don't forget to put the cluster in

Re: [ClusterLabs] Antw: [EXT] Re: Why Do Nodes Leave the Cluster?

2020-02-06 Thread Eric Robinson
> > > > I've done that with all my other clusters, but these two servers are > > in Azure, so the network is out of our control. > > Is a normal cluster supported to use corosync over Internet? I'm not sure > (because of the delays and possible packet losses). > > As with most things, the main

[ClusterLabs] Verifying DRBD Run-Time Configuration

2020-04-11 Thread Eric Robinson
If I want to know the current DRBD runtime settings such as timeout, ping-int, or connect-int, how do I check that? I'm assuming they may not be the same as what shows in the config file. --Eric

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-04-11 Thread Eric Robinson
e clusters? Should I use a larger consensus anyway? --Eric > -Original Message- > From: Strahil Nikolov > Sent: Thursday, February 6, 2020 1:07 PM > To: Eric Robinson ; Cluster Labs - All topics > related to open-source clustering welcomed ; > Andrei Borzenkov > Su

[ClusterLabs] qdevice up and running -- but questions

2020-04-11 Thread Eric Robinson
1. What command can I execute on the qdevice node which tells me which client nodes are connected and alive? 2. In the output of the pcs qdevice status command, what is the meaning of... Vote: ACK (ACK) 3. In the output of the pcs quorum status command,
