[ClusterLabs] Command to show location constraints?

2019-08-27 Thread Casey & Gina
Hi, I'm looking for a way to show just location constraints, if they exist, for a cluster. I'm looking for the same data shown in the output of `pcs config` under the "Location Constraints:" header, but without all the rest, so that I can write a script that checks if there are any set. The
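A minimal sketch of the kind of check described above, assuming the `pcs config` output lists constraints as indented lines under an unindented "Location Constraints:" header and that the next unindented line starts a new section. The function name and the sample text are hypothetical and only illustrate the parsing idea; the exact layout may differ between pcs versions.

```python
def location_constraints(pcs_config_text: str) -> list[str]:
    """Return the stripped lines found under the 'Location Constraints:' header."""
    found = []
    in_section = False
    for line in pcs_config_text.splitlines():
        if line.strip() == "Location Constraints:":
            in_section = True
            continue
        if in_section:
            # Assumption: entries are indented, so the first unindented,
            # non-empty line marks the start of the next section.
            if line and not line[0].isspace():
                break
            if line.strip():
                found.append(line.strip())
    return found


# Hypothetical sample text, not captured from a real cluster.
SAMPLE = """\
Resources:
 Resource: vip (class=ocf provider=heartbeat type=IPaddr2)

Location Constraints:
  Resource: vip
    Enabled on: node1 (score:INFINITY)
Ordering Constraints:
Colocation Constraints:
"""

if __name__ == "__main__":
    constraints = location_constraints(SAMPLE)
    print("location constraints set" if constraints else "no location constraints")
```

In a real script the text could come from `subprocess.run(["pcs", "config"], capture_output=True, text=True).stdout`, and an empty result would mean no location constraints are set.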

Re: [ClusterLabs] Increasing fence timeout

2019-08-13 Thread Casey & Gina
s timeout by running: > pcs stonith update shell_timeout=10 > > Oyvind > > On 08/08/19 12:13 -0600, Casey & Gina wrote: >> Hi, I'm currently running into periodic premature killing of nodes due to >> the fence monitor timeout being set to 5 seconds. Here

[ClusterLabs] Increasing fence timeout

2019-08-08 Thread Casey & Gina
Hi, I'm currently running into periodic premature killing of nodes due to the fence monitor timeout being set to 5 seconds. Here is an example message from the logs: fence_vmware_rest[22334] stderr: [ Exception: Operation timed out after 5001 milliseconds with 0 bytes received ] How can I

[ClusterLabs] Command to show just Failed Actions?

2018-12-02 Thread Casey & Gina
So I've been using the fence_vmware_rest fence agent for a long while now. It seems to work great, except that after a few days or weeks, a given cluster will end up showing it as failed and stopped. For whatever reason, fencing continues to work when needed, but seeing the fence agent

Re: [ClusterLabs] Fence agent executing thousands of API calls per hour

2018-08-01 Thread Casey & Gina
ed looking at the fence agent to determine which API calls might be being executed but I can't figure that out myself...in any case I don't see how this is offering any real value...happy to learn how I might be wrong, though... > On 2018-08-01, at 2:26 PM, Casey & Gina wrote: > > How i

Re: [ClusterLabs] Fence agent executing thousands of API calls per hour

2018-08-01 Thread Casey & Gina
ote: > > Aha, thank you! I missed the blatantly obvious. I will discuss with my > colleague and likely use a longer interval. > >> On Jul 30, 2018, at 11:25 PM, Klaus Wenninger wrote: >> >>> On 07/31/2018 01:47 AM, Casey & Gina wrote: >>> I've se
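As a rough illustration of why a longer monitor interval reduces API load, the arithmetic can be sketched as follows. The number of API calls per monitor run and the number of clusters are hypothetical inputs; the thread does not state either figure.

```python
def monitor_runs_per_hour(interval_seconds: float) -> float:
    """Number of monitor operations one fence resource runs in an hour."""
    return 3600 / interval_seconds


def api_calls_per_hour(interval_seconds: float,
                       calls_per_monitor: int,
                       clusters: int) -> float:
    """Estimated API calls per hour across all clusters.

    calls_per_monitor and clusters are assumptions supplied by the caller.
    """
    return monitor_runs_per_hour(interval_seconds) * calls_per_monitor * clusters


if __name__ == "__main__":
    # Hypothetical comparison: 10 clusters, 2 API calls per monitor run.
    for interval in (60, 300, 3600):
        print(interval, api_calls_per_hour(interval, calls_per_monitor=2, clusters=10))
```

The call count scales inversely with the interval, so multiplying the interval by five divides the estimated load by five.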

[ClusterLabs] Fence agent executing thousands of API calls per hour

2018-07-30 Thread Casey & Gina
I've set up a number of clusters in a VMware environment, and am using the fence_vmware_rest agent for fencing (from fence-agents 4.2.1), as follows: Stonith Devices: Resource: vmware_fence (class=stonith type=fence_vmware_rest) Attributes: ip= username= password= ssl_insecure=1

Re: [ClusterLabs] Problem with pacemaker init.d script

2018-07-11 Thread Casey & Gina
> After I successfully upgraded Pacemaker from 1.1.14 to 1.1.18 and corosync > from 2.3.35 to 2.4.4 on Ubuntu 14.04 I am trying to repeat the same scenario > on Ubuntu 16.04. Forgive me for interjecting, but how did you upgrade on Ubuntu? I'm frustrated with limitations in 1.1.14

Re: [ClusterLabs] Need help debugging a STONITH resource

2018-07-11 Thread Casey & Gina
/reboot a node in the cluster before enabling fencing in the pacemaker config. So not sure why some times it registered and sometimes it didn't, but it seems that enabling stonith always registers it. > On 2018-07-11, at 12:56 PM, Casey & Gina wrote: > > I have a number of clusters in

[ClusterLabs] Need help debugging a STONITH resource

2018-07-11 Thread Casey & Gina
I have a number of clusters in a VMware ESX environment which have all been set up following the same steps, unless somehow I did something wrong on some without realizing it. The issue I am facing is that on some of the clusters, after adding the STONITH resource, testing with `stonith_admin

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Casey & Gina
? > On 2018-06-20, at 4:04 PM, Casey & Gina wrote: > >> On 2018-06-20, at 3:59 PM, Casey & Gina wrote: >> >>> Get the cluster healthy, tail the system logs from both nodes, trigger a >>> fault and wait for things to settle. Then share the logs please. >&

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Casey & Gina
> On 2018-06-20, at 3:59 PM, Casey & Gina wrote: > >> Get the cluster healthy, tail the system logs from both nodes, trigger a >> fault and wait for things to settle. Then share the logs please. > > What do you mean by "system logs"? Do you mean th

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Casey & Gina
> Note: Please reply to the list, not me directly. I intended to. I don't know why sometimes when I click "Reply" it defaults to the list but sometimes it does not. Anyways... > The stonith delay helps predict who will win in a comms break event > where both try to fence the other at the same

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Casey & Gina
Does this mean that fencing can't actually work in a 2-node cluster?? Or is it just that the delay needs set differently on one of the hosts and it will start working? > On 2018-06-20, at 3:50 PM, Digimer wrote: > > On 2018-06-20 05:46 PM, Jehan-Guillaume de Rorthais wrote: >> On Wed, 20 Jun

Re: [ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Casey & Gina
My corosync.conf (which I don't manually create, I guess pcs does this?) already has: quorum { provider: corosync_votequorum two_node: 1 } No go. > On 2018-06-20, at 3:46 PM, Jehan-Guillaume de Rorthais > wrote: > > On Wed, 20 Jun 2018 17:24:41 -0400 > Digimer wrote: > >> Make sure

[ClusterLabs] Fencing on 2-node cluster

2018-06-20 Thread Casey & Gina
I tried testing out a fencing configuration that I had working with a 3-node cluster, using a 2-node cluster. What I found is that when I power off one of the nodes forcibly, it does not get fenced and rebooted as it does on a 3-node cluster. I have verified that I can fence and reboot one

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-06-04 Thread Casey & Gina
> There are different code paths when RA is called automatically by > resource manager and when RA is called manually by crm_resource. The > latter did not export this environment variable until 1.1.17. So > documentation is correct in that you do not need 1.1.17 to use RA > normally, as part of

Re: [ClusterLabs] Why would a standby node be fenced? (was: How to set up fencing/stonith)

2018-05-31 Thread Casey & Gina
> Well, that does not sound very polite to user :) The thing that really threw me off was pacemaker rebooting the node as soon as I'd try to start the cluster on it without the database running. Is there a way to prevent this from happening? Some way to indicate to Pacemaker, "Hey, I'm not

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-31 Thread Casey & Gina
> Quick look at PAF manual gives > > you need to rebuild the PostgreSQL instance on the failed node > > did you do it? I am not intimately familiar with Postgres, but in this > case I expect that you need to make database on node B secondary (slave, > whatever it is called) to new master on node

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-31 Thread Casey & Gina
> There is no "master node" in pacemaker. There is master/slave resource > so at the best it is "node on which specific resource has master role". > And we have no way to know on which node your resource had master > role when you did it. Please be more specific, otherwise it is hard to >

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-30 Thread Casey & Gina
> In this case, the agent is returning "master (failed)", which does not > mean that it previously failed when it was master -- it means it is > currently running as master, in a failed condition. Well, it surely is NOT running. So the likely problem is the way it's doing this check? I see a

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Casey & Gina
> On May 27, 2018, at 2:28 PM, Ken Gaillot wrote: > >> May 22 23:57:24 [2196] d-gp2-dbpg0-2pengine: info: >> determine_op_status: Operation monitor found resource postgresql-10- >> main:2 active on d-gp2-dbpg0-2 > >> May 22 23:57:24 [2196] d-gp2-dbpg0-2pengine: notice: >>

Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Casey & Gina
> On May 27, 2018, at 2:28 PM, Ken Gaillot wrote: > > Pacemaker isn't fencing because the start failed, at least not > directly: > >> May 22 23:57:24 [2196] d-gp2-dbpg0-2pengine: info: >> determine_op_status: Operation monitor found resource postgresql-10- >> main:2 active on

[ClusterLabs] Why would a standby node be fenced? (was: How to set up fencing/stonith)

2018-05-25 Thread Casey & Gina
> On May 25, 2018, at 7:01 AM, Casey Allen Shobe > wrote: > >> Actually, why is Pacemaker fencing the standby node just because a resource >> fails to start there? I thought only the master should be fenced if it were >> assumed to be broken. This is probably the

Re: [ClusterLabs] pcsd processes using 100% CPU

2018-05-24 Thread Casey & Gina
> gcore is part of gdb: > https://packages.ubuntu.com/xenial/amd64/gdb/filelist > > Note that using the utility should have no observable influence > on the running process in question. When I ran gcore on the pid, it produced a whole bunch of memory read errors like this: warning: Memory read

Re: [ClusterLabs] Antw: Re: pcsd processes using 100% CPU

2018-05-24 Thread Casey & Gina
ulrich.wi...@rz.uni-regensburg.de> wrote: > >>>> Casey & Gina <caseyandg...@icloud.com> schrieb am 23.05.2018 um 20:43 in > Nachricht <3b8567a0-ef36-44af-bbad-0d494b08f...@icloud.com>: > [...] >> I ran `strace -p `, and the screen filled with the followi

Re: [ClusterLabs] pcsd processes using 100% CPU

2018-05-23 Thread Casey & Gina
Okay, I have this happening again on a couple servers right now, and am happy to let it spin and dig more into it. I'm not at all experienced with stuff like this though, so will need some explicit instruction on what to do beyond what I've documented here... I don't see anything of note in

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-22 Thread Casey & Gina
> It does exactly what you told it to do. If you want to power-on VM on > reset instead, remove RESETPOWERON parameter. Sorry, that was a part of the command that I found in /usr/share/doc/cluster-glue/stonith/README.vcenter, as well as on

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-22 Thread Casey & Gina
In the meantime, I thought I'd try running the fence_vmware_soap command, but it doesn't seem to be working, despite me using the same credentials that worked with the external/vcenter plugin. Is there a way to get more debugging information about why it says unable to connect/login? The

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-22 Thread Casey & Gina
> On May 18, 2018, at 1:29 PM, Ken Gaillot wrote: >> Perhaps there is a bug in the packaging? > > It sounds like it, or perhaps a portability issue in the agent itself. There were missing dependencies. I've resolved that, so now am coming back to trying this...

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-22 Thread Casey & Gina
> There are missing dependencies in Ubuntu 16.04, see > https://github.com/ClusterLabs/pcs/issues/168 > for details. Thank you! > It may be worth filing a bug against Ubuntu. I did that already when I sent this E-mail, to which they suggested the same fix. I have shared the above link in that

Re: [ClusterLabs] pcsd processes using 100% CPU

2018-05-22 Thread Casey & Gina
> Can you share some HW specs with us, at least the architecture > to start with -- x86_64=amd64, arm (gen/mode?), something else? It's x86_64, running Ubuntu 16.04; the latest package versions available from Ubuntu repositories. They are VMware ESX nodes with 16 CPU cores and 64GB of memory

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
> So, then instead of powering off the VM in vSphere, I instead tried a > `killall -9 corosync` on the primary. This resulted in the VIP coming up on > node 3, and node 1 being rebooted. Great! Unfortunately, things don't work at all when it comes to the PostgreSQL resource agent... When I

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
> May 18 20:36:27 [4282] d-gp2-dbpg0-2 stonith-ng: warning: log_operation: > vfencing:16264 [ Performing: stonith -t external/vcenter -T reset > d-gp2-dbpg0-1 ] > May 18 20:36:27 [4282] d-gp2-dbpg0-2 stonith-ng: warning: log_operation: > vfencing:16264 [ failed: d-gp2-dbpg0-1 5 ]

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
> Having it started on one node is normal. Fence devices default to > requires=quorum, meaning they can start on a new node even before the > original node is fenced. It looks like that's what happened here, but > something went wrong with the fencing, so the cluster assumes it's > still active on

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
arted d-gp2-dbpg0-1 postgresql-master-vip (ocf::heartbeat:IPaddr2): Started d-gp2-dbpg0-1 Master/Slave Set: postgresql-ha [postgresql-10-main] Masters: [ d-gp2-dbpg0-1 ] Slaves: [ d-gp2-dbpg0-2 d-gp2-dbpg0-3 ] -- As always, thank you all for any help that you can provide, --

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
>> pcmk_host_list="" - not sure about this one - I'm guessing >> this would actually be the same input as the list I was inputting to >> the HOSTLIST parameter with the external/vcenter approach? >> >> port="" - not sure about this one - with this approach would >> I need to issue the above

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
> Here is a command to adapt that work to fence a VM connecting to an esxi > server: > > pcs stonith create fence_vmware_soap \ >pcmk_host_check="static-list" pcmk_host_list="" \ >port="" ipaddr="" login="" \ >passwd="<>password" ssl="1"

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-18 Thread Casey & Gina
> Thank you - I hadn't seen the "releases" link on github before and somehow > missed that. Sorry for that. I thought there would be download links > somewhere on the clusterlabs website. I will try compiling this today to try. I finally managed to get pcs-0.9.164 compiled and installed.

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-17 Thread Casey & Gina
Hi Ken, Thanks for your explanations - they are really helpful in coming to understand this set of software. > Whether to use one fence resource for the whole cluster, or one for > each node, is partly a question of what the device requires and partly > a personal preference. I think that one

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-17 Thread Casey & Gina
>> Barring that, where can I download source code packages? I have only >> been able to find the github, which has 0.9 and 0.10 branches, but I >> can't find any .tar.gz's to download > > Really? > > https://github.com/ClusterLabs/pcs/releases Thank you - I hadn't seen the "releases" link on

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Casey & Gina
> Is there an apt repository which provides more recent versions? I'm guessing no, based on trying fruitlessly to search for one. > Is there a way to use the version that Ubuntu provides (0.9.149) to > accomplish the desired result? Barring that, where can I download source code packages? I

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Casey & Gina
> On May 16, 2018, at 1:28 PM, Andrei Borzenkov wrote: > > It seems that your pcs is too old > > https://github.com/ClusterLabs/pcs/issues/81 I'm using Ubuntu 16.04 and the latest versions of the packages provided by them. Is there an apt repository which provides more

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Casey & Gina
> What "pcs stonith list" says? fence_alom - Fence agent for Sun ALOM fence_amt - Fence agent for AMT fence_apc - Fence agent for APC over telnet/ssh fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP fence_bladecenter - Fence agent for IBM BladeCenter fence_brocade - Fence agent for

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Casey & Gina
/cib: @num_updates=30 May 16 18:13:09 [7502] d-gp2-dbpg0-2cib: info: cib_perform_op: ++ /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='postgresql-10-main']: On May 16, 2018, at 11:01 AM, Casey & Gina <caseyandg...@icloud.com> wrote:

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Casey & Gina
> On May 16, 2018, at 10:43 AM, Casey & Gina <caseyandg...@icloud.com> wrote: > > Thank you and Andrei for the advice... > >> the pcs alternative commands are: >> >> pcs stonith create vfencing external/vcenter \ >> VI_SERVER=10.1.1.1 VI_CREDSTORE=/

Re: [ClusterLabs] How to set up fencing/stonith

2018-05-16 Thread Casey & Gina
Thank you and Andrei for the advice... > the pcs alternative commands are: > > pcs stonith create vfencing external/vcenter \ > VI_SERVER=10.1.1.1 VI_CREDSTORE=/etc/vicredentials.xml \ > HOSTLIST="hostname1=vmname1;hostname2=vmname2" RESETPOWERON=0 \ > op monitor interval=60s When I attempt the

[ClusterLabs] How to set up fencing/stonith

2018-05-15 Thread Casey & Gina
Hi, I'm trying to figure out how to get fencing/stonith going with pacemaker. As far as I understand it, they are both part of the same thing - setting up stonith means setting up fencing. If I'm mistaken on that, please let me know. Specifically, I'm wanting to use the external/vcenter

Re: [ClusterLabs] Two-node cluster fencing

2018-05-12 Thread Casey & Gina Shobe
Without fencing, if the primary is powered off abruptly (e.g. if one of your ESX servers crashes), the standby will not become primary, and you will need to promote it manually. We had exactly this scenario happen last week with a 2-node cluster. Without fencing, you don't have high

Re: [ClusterLabs] “pcs cluster stop -all” hangs and

2018-05-11 Thread Casey & Gina
I don't know why this happens, but I encounter this often. My workaround is this: killall -9 pacemakerd; killall pengine; killall lrmd; killall cib; killall corosync > On May 10, 2018, at 11:26 PM, 范国腾 wrote: > > Hi, > > When I run the "pcs cluster stop --all", it

Re: [ClusterLabs] pcs cluster setup fails after pcs cluster auth suceeds

2018-04-23 Thread Casey & Gina
As I said in the E-mail you're replying to, I did try removing /etc/corosync.conf and retrying (I also included a link to the bug report). I pasted the output of pcs cluster auth after that in my last E-mail as well. Is there some other fix I'm missing? Because that's the only step I saw to

Re: [ClusterLabs] pcs cluster setup fails after pcs cluster auth suceeds

2018-04-19 Thread Casey & Gina
> On Apr 19, 2018, at 12:37 AM, Tomas Jelinek wrote: > > Also it would be nice to know what pcs version you have. My apologies for omitting this from my previous E-mail... It's PCS version 0.9.149, and I'm running it on an updated Ubuntu 16.04 installation.

Re: [ClusterLabs] pcs cluster setup fails after pcs cluster auth suceeds

2018-04-19 Thread Casey & Gina
> On Apr 19, 2018, at 12:37 AM, Tomas Jelinek wrote: > > Can you run those two commands with the --debug flag and post the output? > Also it would be nice to know what pcs version you have. Sure thing. It looks like it's erroring because /var/lib/pcsd/tokens doesn't

[ClusterLabs] pcs cluster setup fails after pcs cluster auth suceeds

2018-04-18 Thread Casey & Gina
What would cause this situation? I'm following the same process as I have previously, and don't understand why I'm unable to create new clusters now... root@d-gp2-dbpg1-1:~# pcs cluster auth d-gp2-dbpg1-1 d-gp2-dbpg1-2 d-gp2-dbpg1-3 -u hacluster -p 'mypassword' d-gp2-dbpg1-1: Authorized

Re: [ClusterLabs] Trouble starting up PAF cluster for first time

2018-04-09 Thread Casey & Gina
> The PAF resource agent need to connect to your local PostgreSQL instance to > check its status in various situations. Parameters "pgport" and "pghost" are > by > default "5432" and "/tmp" (same defaults than PostgreSQL policy). The "/tmp" > value is the directory where PostgreSQL creates its

Re: [ClusterLabs] Trouble starting up PAF cluster for first time

2018-04-06 Thread Casey & Gina
It looks like the main problem was that I needed to add pghost="/var/run/postgresql" to the postgresql-10-main resource. I'm not sure why I have to do that, but it makes things work. For both this and my last E-mail to the list that was also a problem with the command being run to start the

[ClusterLabs] Trouble starting up PAF cluster for first time

2018-04-06 Thread Casey & Gina
Please forgive me if this message is a duplicate - I sent it yesterday but it is not showing up on the mailing list or in the archives, so I'm trying a second time... I'm using this resource agent: http://clusterlabs.github.io/PAF I'm trying to set up a 3-node cluster. I install

Re: [ClusterLabs] Cluster fails to start on rebooted nodes without manual fiddling...

2018-04-03 Thread Casey & Gina
Hi, thank you very much for your response! > Something comes in mind: did you setup "systemd-tmpfiles" as explained in the > end of the following chapter ? > > https://clusterlabs.github.io/PAF/Quick_Start-Debian-9-pcs.html#postgresql-and-cluster-stack-installation That was it - somehow I

[ClusterLabs] Cluster fails to start on rebooted nodes without manual fiddling...

2018-04-02 Thread Casey & Gina
Hi! I've set up a couple test Pacemaker/Corosync/PCS clusters on virtual machines to manage a PostgreSQL service using the PostgreSQL Automatic Failover (PAF) resource agent available here: https://clusterlabs.github.io/PAF/ The cluster will be up and running with 3 nodes and PostgreSQL