Hi, I'm looking for a way to show just location constraints, if they exist, for
a cluster. I'm looking for the same data shown in the output of `pcs config`
under the "Location Constraints:" header, but without all the rest, so that I
can write a script that checks if there are any set.
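In case it helps, here is a rough sketch of the kind of filter I mean: it pulls just the "Location Constraints:" section out of `pcs config` output. The heredoc stands in for the real command output, and the resource/node names are illustrative; in a real script you would pipe `pcs config` into the awk instead. (Newer pcs versions may also offer `pcs constraint location` to print only that section directly.)

```shell
# Illustrative sketch: keep only the lines between "Location Constraints:"
# and the next unindented header in pcs config output.
sample=$(cat <<'EOF'
Ordering Constraints:
Location Constraints:
  Resource: postgresql-master-vip
    Enabled on: d-gp2-dbpg0-1 (score:50)
Colocation Constraints:
EOF
)
constraints=$(printf '%s\n' "$sample" \
  | awk '/^Location Constraints:/ {f=1; next} /^[^ ]/ {f=0} f')
# Non-empty output means at least one location constraint is set.
if [ -n "$constraints" ]; then
    echo "location constraints are set"
fi
```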
The…
> …you can update the fence agent's timeout by running:
> pcs stonith update shell_timeout=10
>
> Oyvind
>
> On 08/08/19 12:13 -0600, Casey & Gina wrote:
>> Hi, I'm currently running into periodic premature killing of nodes due to
>> the fence monitor timeout being set to 5 seconds. Here
Hi, I'm currently running into periodic premature killing of nodes due to the
fence monitor timeout being set to 5 seconds. Here is an example message from
the logs:
fence_vmware_rest[22334] stderr: [ Exception: Operation timed out after 5001
milliseconds with 0 bytes received ]
How can I
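A sketch of the kind of change the reply above suggests, assuming the device is named vmware_fence as elsewhere in this thread (the device name and the values are illustrative, not recommendations): raise the agent's own timeouts and give the monitor operation more headroom.

pcs stonith update vmware_fence shell_timeout=20 power_timeout=30
pcs stonith update vmware_fence op monitor interval=60s timeout=30s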
So I've been using the fence_vmware_rest fence agent for a long while now. It
seems to work great, except that after a few days or weeks, a given cluster
will end up showing it as failed and stopped.
For whatever reason, fencing continues to work when needed, but seeing the
fence agent
…started looking at the fence agent to determine which API calls might be
executed, but I can't figure that out myself... In any case I don't see how
this is offering any real value... Happy to learn how I might be wrong,
though...
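If the underlying device still fences fine, the failed/stopped state is usually just accumulated monitor failures. Assuming that is the case here, clearing the failure history, and optionally letting failures expire on their own, might be enough (device name from this thread; the failure-timeout value is illustrative):

pcs resource cleanup vmware_fence
pcs resource meta vmware_fence failure-timeout=300s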
> On 2018-08-01, at 2:26 PM, Casey & Gina wrote:
>
> How i
ote:
>
> Aha, thank you! I missed the blatantly obvious. I will discuss with my
> colleague and likely use a longer interval.
>
>> On Jul 30, 2018, at 11:25 PM, Klaus Wenninger wrote:
>>
>>> On 07/31/2018 01:47 AM, Casey & Gina wrote:
>>> I've se
I've set up a number of clusters in a VMware environment, and am using the
fence_vmware_rest agent for fencing (from fence-agents 4.2.1), as follows:
Stonith Devices:
Resource: vmware_fence (class=stonith type=fence_vmware_rest)
Attributes: ip= username= password= ssl_insecure=1
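For anyone setting this up from scratch, the create command behind that configuration looks roughly like the following. All values are placeholders (the real ones are redacted above), and pcmk_host_map is only needed when the VM names in vCenter differ from the cluster node names:

pcs stonith create vmware_fence fence_vmware_rest \
    ip=vcenter.example.com username=fence-user password=secret \
    ssl_insecure=1 \
    pcmk_host_map="d-gp2-dbpg0-1:vm-name-1;d-gp2-dbpg0-2:vm-name-2" \
    op monitor interval=60s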
> After I successfully upgraded Pacemaker from 1.1.14 to 1.1.18 and corosync
> from 2.3.35 to 2.4.4 on Ubuntu 14.04 I am trying to repeat the same scenario
> on Ubuntu 16.04.
Forgive me for interjecting, but how did you upgrade on Ubuntu? I'm frustrated
with limitations in 1.1.14
/reboot a node
in the cluster before enabling fencing in the pacemaker config. So I'm not
sure why it sometimes registered and sometimes didn't, but it seems that
enabling stonith always registers it.
> On 2018-07-11, at 12:56 PM, Casey & Gina wrote:
>
> I have a number of clusters in
I have a number of clusters in a VMware ESX environment which have all been set
up following the same steps, unless somehow I did something wrong on some of
them without realizing it.
The issue I am facing is that on some of the clusters, after adding the STONITH
resource, testing with `stonith_admin
?
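For the record, the manual fencing test referred to above is along these lines (the target node name is illustrative):

stonith_admin --reboot d-gp2-dbpg0-2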
> On 2018-06-20, at 4:04 PM, Casey & Gina wrote:
>
>> On 2018-06-20, at 3:59 PM, Casey & Gina wrote:
>>
>>> Get the cluster healthy, tail the system logs from both nodes, trigger a
>>> fault and wait for things to settle. Then share the logs please.
>&
> On 2018-06-20, at 3:59 PM, Casey & Gina wrote:
>
>> Get the cluster healthy, tail the system logs from both nodes, trigger a
>> fault and wait for things to settle. Then share the logs please.
>
> What do you mean by "system logs"? Do you mean th
> Note: Please reply to the list, not me directly.
I intended to. I don't know why sometimes when I click "Reply" it defaults to
the list but sometimes it does not. Anyways...
> The stonith delay helps predict who will win in a comms break event
> where both try to fence the other at the same
Does this mean that fencing can't actually work in a 2-node cluster?? Or is it
just that the delay needs to be set differently on one of the hosts and it will
start working?
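Fencing can work in a 2-node cluster; the usual precaution is a static delay on one fence device so that one node reliably wins a mutual-fencing race. A sketch, with hypothetical per-node device names (fence_node1 being the device that fences node1):

# Delay the device that fences node1, so node1 survives if both
# nodes try to fence each other at the same time.
pcs stonith update fence_node1 delay=15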
> On 2018-06-20, at 3:50 PM, Digimer wrote:
>
> On 2018-06-20 05:46 PM, Jehan-Guillaume de Rorthais wrote:
>> On Wed, 20 Jun
My corosync.conf (which I don't manually create, I guess pcs does this?)
already has:
quorum {
    provider: corosync_votequorum
    two_node: 1
}
No go.
> On 2018-06-20, at 3:46 PM, Jehan-Guillaume de Rorthais
> wrote:
>
> On Wed, 20 Jun 2018 17:24:41 -0400
> Digimer wrote:
>
>> Make sure
I tried testing out a fencing configuration that I had working with a 3-node
cluster, using a 2-node cluster. What I found is that when I power off one of
the nodes forcibly, it does not get fenced and rebooted as it does on a 3-node
cluster. I have verified that I can fence and reboot one
> There are different code paths when RA is called automatically by
> resource manager and when RA is called manually by crm_resource. The
> latter did not export this environment variable until 1.1.17. So
> documentation is correct in that you do not need 1.1.17 to use RA
> normally, as part of
> Well, that does not sound very polite to user :)
The thing that really threw me off was pacemaker rebooting the node as soon as
I'd try to start the cluster on it without the database running.
Is there a way to prevent this from happening? Some way to indicate to
Pacemaker, "Hey, I'm not
> Quick look at PAF manual gives
>
> you need to rebuild the PostgreSQL instance on the failed node
>
> did you do it? I am not intimately familiar with Postgres, but in this
> case I expect that you need to make database on node B secondary (slave,
> whatever it is called) to new master on node
> There is no "master node" in pacemaker. There is master/slave resource
> so at the best it is "node on which specific resource has master role".
> And we have no way to know on which node your resource had the master
> role when you did it. Please be more specific, otherwise it is hard to
>
> In this case, the agent is returning "master (failed)", which does not
> mean that it previously failed when it was master -- it means it is
> currently running as master, in a failed condition.
Well, it surely is NOT running. So the likely problem is the way it's doing
this check? I see a
> On May 27, 2018, at 2:28 PM, Ken Gaillot wrote:
>
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info:
>> determine_op_status: Operation monitor found resource postgresql-10-
>> main:2 active on d-gp2-dbpg0-2
>
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: notice:
>>
> On May 27, 2018, at 2:28 PM, Ken Gaillot wrote:
>
> Pacemaker isn't fencing because the start failed, at least not
> directly:
>
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info:
>> determine_op_status: Operation monitor found resource postgresql-10-
>> main:2 active on
> On May 25, 2018, at 7:01 AM, Casey Allen Shobe
> wrote:
>
>> Actually, why is Pacemaker fencing the standby node just because a resource
>> fails to start there? I thought only the master should be fenced if it were
>> assumed to be broken.
This is probably the
> gcore is part of gdb:
> https://packages.ubuntu.com/xenial/amd64/gdb/filelist
>
> Note that using the utility should have no observable influence
> on the running process in question.
When I ran gcore on the pid, it produced a whole bunch of memory read errors
like this:
warning: Memory read
ulrich.wi...@rz.uni-regensburg.de> wrote:
>
>>>> Casey & Gina <caseyandg...@icloud.com> schrieb am 23.05.2018 um 20:43 in
> Nachricht <3b8567a0-ef36-44af-bbad-0d494b08f...@icloud.com>:
> [...]
>> I ran `strace -p `, and the screen filled with the followi
Okay, I have this happening again on a couple servers right now, and am happy
to let it spin and dig more into it. I'm not at all experienced with stuff
like this though, so will need some explicit instruction on what to do beyond
what I've documented here...
I don't see anything of note in
> It does exactly what you told it to do. If you want to power-on VM on
> reset instead, remove RESETPOWERON parameter.
Sorry, that was a part of the command that I found in
/usr/share/doc/cluster-glue/stonith/README.vcenter, as well as on
In the meantime, I thought I'd try running the fence_vmware_soap command, but
it doesn't seem to be working, despite me using the same credentials that
worked with the external/vcenter plugin. Is there a way to get more debugging
information about why it says unable to connect/login? The
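To get more detail, the agent can be run by hand with verbose output. Something like the following (all credentials are placeholders), where `-o list` just asks the server to enumerate VMs and `-v` prints the SOAP traffic details:

fence_vmware_soap -a vcenter.example.com -l username -p password \
    -z --ssl-insecure -o list -v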
> On May 18, 2018, at 1:29 PM, Ken Gaillot wrote:
>> Perhaps there is a bug in the packaging?
>
> It sounds like it, or perhaps a portability issue in the agent itself.
There were missing dependencies. I've resolved that, so now am coming back to
trying this...
> There are missing dependencies in Ubuntu 16.04, see
> https://github.com/ClusterLabs/pcs/issues/168
> for details.
Thank you!
> It may be worth filing a bug against Ubuntu.
I did that already when I sent this E-mail, to which they suggested the same
fix. I have shared the above link in that
> Can you share some HW specs with us, at least the architecture
> to start with -- x86_64=amd64, arm (gen/mode?), something else?
It's x86_64, running Ubuntu 16.04; the latest package versions available from
Ubuntu repositories. They are VMware ESX nodes with 16 CPU cores and 64GB of
memory
> So, then instead of powering off the VM in vSphere, I instead tried a
> `killall -9 corosync` on the primary. This resulted in the VIP coming up on
> node 3, and node 1 being rebooted. Great!
Unfortunately, things don't work at all when it comes to the PostgreSQL
resource agent... When I
> May 18 20:36:27 [4282] d-gp2-dbpg0-2 stonith-ng: warning: log_operation:
> vfencing:16264 [ Performing: stonith -t external/vcenter -T reset
> d-gp2-dbpg0-1 ]
> May 18 20:36:27 [4282] d-gp2-dbpg0-2 stonith-ng: warning: log_operation:
> vfencing:16264 [ failed: d-gp2-dbpg0-1 5 ]
> Having it started on one node is normal. Fence devices default to
> requires=quorum, meaning they can start on a new node even before the
> original node is fenced. It looks like that's what happened here, but
> something went wrong with the fencing, so the cluster assumes it's
> still active on
…Started d-gp2-dbpg0-1
postgresql-master-vip (ocf::heartbeat:IPaddr2): Started d-gp2-dbpg0-1
Master/Slave Set: postgresql-ha [postgresql-10-main]
    Masters: [ d-gp2-dbpg0-1 ]
    Slaves: [ d-gp2-dbpg0-2 d-gp2-dbpg0-3 ]
--
As always, thank you all for any help that you can provide,
--
>> pcmk_host_list="" - not sure about this one - I'm guessing
>> this would actually be the same input as the list I was inputting to
>> the HOSTLIST parameter with the external/vcenter approach?
>>
>> port="" - not sure about this one - with this approach would
>> I need to issue the above
> Here is a command to adapt that work to fence a VM connecting to an esxi
> server:
>
> pcs stonith create fence_vmware_soap \
>    pcmk_host_check="static-list" pcmk_host_list="" \
>    port="" ipaddr="" login="" \
>    passwd="<password>" ssl="1"
> Thank you - I hadn't seen the "releases" link on github before and somehow
> missed that. Sorry for that. I thought there would be download links
> somewhere on the clusterlabs website. I will try compiling this today to try.
I finally managed to get pcs-0.9.164 compiled and installed.
Hi Ken,
Thanks for your explanations - they are really helpful in coming to understand
this set of software.
> Whether to use one fence resource for the whole cluster, or one for
> each node, is partly a question of what the device requires and partly
> a personal preference.
I think that one
>> Barring that, where can I download source code packages? I have only
>> been able to find the github, which has 0.9 and 0.10 branches, but I
>> can't find any .tar.gz's to download
>
> Really?
>
> https://github.com/ClusterLabs/pcs/releases
Thank you - I hadn't seen the "releases" link on
> Is there an apt repository which provides more recent versions?
I'm guessing no, based on trying fruitlessly to search for one.
> Is there a way to use the version that Ubuntu provides (0.9.149) to
> accomplish the desired result?
Barring that, where can I download source code packages? I
> On May 16, 2018, at 1:28 PM, Andrei Borzenkov wrote:
>
> It seems that your pcs is too old
>
> https://github.com/ClusterLabs/pcs/issues/81
I'm using Ubuntu 16.04 and the latest versions of the packages provided by them.
Is there an apt repository which provides more
> What "pcs stonith list" says?
fence_alom - Fence agent for Sun ALOM
fence_amt - Fence agent for AMT
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
fence_bladecenter - Fence agent for IBM BladeCenter
fence_brocade - Fence agent for
/cib: @num_updates=30
May 16 18:13:09 [7502] d-gp2-dbpg0-2 cib: info: cib_perform_op:
++
/cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='postgresql-10-main']:
On May 16, 2018, at 11:01 AM, Casey & Gina <caseyandg...@icloud.com> wrote:
> On May 16, 2018, at 10:43 AM, Casey & Gina <caseyandg...@icloud.com> wrote:
>
> Thank you and Andrei for the advice...
>
>> the pcs alternative commands are:
>>
>> pcs stonith create vfencing external/vcenter \
>> VI_SERVER=10.1.1.1 VI_CREDSTORE=/
Thank you and Andrei for the advice...
> the pcs alternative commands are:
>
> pcs stonith create vfencing external/vcenter \
> VI_SERVER=10.1.1.1 VI_CREDSTORE=/etc/vicredentials.xml \
> HOSTLIST="hostname1=vmname1;hostname2=vmname2" RESETPOWERON=0 \
> op monitor interval=60s
When I attempt the
Hi, I'm trying to figure out how to get fencing/stonith going with pacemaker.
As far as I understand it, they are both part of the same thing - setting up
stonith means setting up fencing. If I'm mistaken on that, please let me know.
Specifically, I'm wanting to use the external/vcenter
Without fencing, if the primary is powered off abruptly (e.g. if one of your
ESX servers crashes), the standby will not become primary, and you will need to
promote it manually. We had exactly this scenario happen last week with a
2-node cluster. Without fencing, you don't have high
I don't know why this happens, but I encounter this often. My workaround is
this:
killall -9 pacemakerd; killall pengine; killall lrmd; killall cib; killall
corosync
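For what it's worth, pcs has a built-in command for the same blunt approach; it kills the cluster daemons on the local node without the graceful shutdown that can hang:

pcs cluster kill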
> On May 10, 2018, at 11:26 PM, 范国腾 wrote:
>
> Hi,
>
> When I run the "pcs cluster stop --all", it
As I said in the E-mail you're replying to, I did try removing
/etc/corosync.conf and retrying (I also included a link to the bug report). I
pasted the output of pcs cluster auth after that in my last E-mail as well. Is
there some other fix I'm missing? Because that's the only step I saw to
> On Apr 19, 2018, at 12:37 AM, Tomas Jelinek wrote:
>
> Also it would be nice to know what pcs version you have.
My apologies for omitting this from my previous E-mail... It's PCS version
0.9.149, and I'm running it on an updated Ubuntu 16.04 installation.
> On Apr 19, 2018, at 12:37 AM, Tomas Jelinek wrote:
>
> Can you run those two commands with the --debug flag and post the output?
> Also it would be nice to know what pcs version you have.
Sure thing. It looks like it's erroring because /var/lib/pcsd/tokens doesn't
What would cause this situation? I'm following the same process as I have
previously, and don't understand why I'm unable to create new clusters now...
root@d-gp2-dbpg1-1:~# pcs cluster auth d-gp2-dbpg1-1 d-gp2-dbpg1-2
d-gp2-dbpg1-3 -u hacluster -p 'mypassword'
d-gp2-dbpg1-1: Authorized
> The PAF resource agent need to connect to your local PostgreSQL instance to
> check its status in various situations. Parameters "pgport" and "pghost" are
> by
> default "5432" and "/tmp" (same defaults than PostgreSQL policy). The "/tmp"
> value is the directory where PostgreSQL creates its
It looks like the main problem was that I needed to add
pghost="/var/run/postgresql" to the postgresql-10-main resource. I'm not sure
why I have to do that, but it makes things work.
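For anyone hitting the same thing, the change described above is a one-liner (resource name as used in this thread):

pcs resource update postgresql-10-main pghost=/var/run/postgresql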
For both this and my last E-mail to the list that was also a problem with the
command being run to start the
Please forgive me if this message is a duplicate - I sent it yesterday but it
is not showing up on the mailing list or in the archives, so I'm trying a
second time...
I'm using this resource agent: http://clusterlabs.github.io/PAF
I'm trying to set up a 3-node cluster.
I install
Hi, thank you very much for your response!
> Something comes in mind: did you setup "systemd-tmpfiles" as explained in the
> end of the following chapter ?
>
> https://clusterlabs.github.io/PAF/Quick_Start-Debian-9-pcs.html#postgresql-and-cluster-stack-installation
That was it - somehow I
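For reference, the step in question from the PAF quick start boils down to a tmpfiles.d entry so the PostgreSQL socket directory is recreated at boot; roughly the following (file name, path, and modes as I recall them from the linked page, so double-check against it):

# /etc/tmpfiles.d/postgresql-part.conf
d /var/run/postgresql 2775 postgres postgres - -

Then apply it without rebooting via `systemd-tmpfiles --create`.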
Hi!
I've set up a couple test Pacemaker/Corosync/PCS clusters on virtual machines
to manage a PostgreSQL service using the PostgreSQL Automatic Failover (PAF)
resource agent available here: https://clusterlabs.github.io/PAF/
The cluster will be up and running with 3 nodes and PostgreSQL