OCF 1.1 is now formally adopted!
https://github.com/ClusterLabs/OCF-spec/blob/master/ra/1.1/resource-agent-api.md
Thanks to everyone who gave feedback.
Now to add support for it ...
On Tue, 2021-03-09 at 17:07 -0600, Ken Gaillot wrote:
> Hi all,
>
> After many false starts over the
sh - $OCF_RESKEY_user -c "test -w $dir"; then
> ocf_log warn "Directory $dir is not writable by
> $OCF_RESKEY_user, attempting chown"
> ocf_run chown $OCF_RESKEY_user:$OCF_RESKEY_group $dir \
>
56(84) bytes of data.
> > > 64 bytes from 192.168.56.9: icmp_seq=1 ttl=64 time=0.504 ms
> > > 64 bytes from 192.168.56.9: icmp_seq=2 ttl=64 time=0.750 ms
> > > ...
> > >
> > > [root@node2 ~]# ping 192.168.56.9
> > > PING 192.168.56.9 (192.168.56.9) 56(8
of the sets in the command, i.e.
whether it's the primary set first or the dependent set first, but I
think that's right.)
--
Ken Gaillot
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
ceed with the "Add Apache HTTP" section. Once apache is
set up as a cluster resource, you should be able to contact the web
server at the floating IP (or more realistically whatever name you've
associated with that IP), and have the cluster fail over both the IP
address and web server as
. Make sure
you can ping the floating IP address from some other machine. Then test
fail-over and ensure you can still ping the floating IP. From there it
should be straightforward.
>
>
>
>
>
> On Wednesday, March 24, 2021, 12:33:53 AM GMT+4:30, Ken Gaillot <
> kgail..
node2?
> > > # pcs resource create ClusterIP
> > > ocf:heartbeat:IPaddr2 ip=192.168.122.120 cidr_netmask=32 op
> > > monitor interval=30s
> > >
> > > If yes, then I must update it with below command?
> > >
> > > # pcs resource update f
"Add Apache HTTP Server as a Cluster
Service".
> On Monday, March 22, 2021, 07:06:47 PM GMT+4:30, Ken Gaillot <
> kgail...@redhat.com> wrote:
>
>
>
>
>
> On Mon, 2021-03-22 at 08:15 +, Jason Long wrote:
> > Thank you.
> >
> > My test
erIP ocf:heartbeat:IPaddr2
> > ip="IP_That_Never_Used_In_The_Network" cidr_netmask=32 op monitor
> > interval=30s
> >
> > # pcs resource create WebSite ocf:heartbeat:apache
> > configfile=/etc/httpd/conf/httpd.conf statusurl="
> > http://loca
would be treated as 0.
--
Ken Gaillot
can you write it here?
>
>
>
>
>
>
> On Thursday, March 18, 2021, 01:40:22 AM GMT+3:30, Ken Gaillot <
> kgail...@redhat.com> wrote:
>
>
>
>
>
> On Wed, 2021-03-17 at 20:37 +, Jason Long wrote:
> > The 192.168.1.4 is my secondary VM.
>
> * WebServer
>
>
>
> Logs are:
> https://paste.ubuntu.com/p/nHfTRFh4RD/
>
>
>
> Why?
>
>
>
>
>
> On Wednesday, March 17, 2021, 11:42:11 PM GMT+3:30, Jason Long <
> hack3r...@yahoo.com> wrote:
>
>
>
>
>
located with the virtual-IP - right?
> > > > >
> > > > > Klaus
> > > > >
> > > > > > > $ sudo pcs resource create http_server
> > > > > > > ocf:heartbeat:apache
> > > > > > > configfile="/etc/httpd/conf.d/VirtualHost.conf" op
> > > > > > > monitor
> > > > > > > timeout="5s" interval="5s"
> > > > > > >
> > > > > > > > On both servers (Main and Secondary), the pcsd service is
> > > > > > > > enabled, but
> > > > > > > > when I try to view my Apache web server, it shows me
> > > > > > > > the error below:
> > > > > > >
> > > > > > > Proxy Error
> > > > > > > The proxy server received an invalid response from an
> > > > > > > upstream
> > > > > > > server.
> > > > > > > The proxy server could not handle the request
> > > > > > > Reason: Error reading from remote server
> > > > > > >
> > > > > > > > Why? Which part of my configuration is wrong?
> > > > > > > The output of "sudo pcs status" command is:
> > > > > > > https://paste.ubuntu.com/p/V9KvHKwKtC/
> > > > > > >
> > > > > > > Thank you.
> > > > > >
> > > > > > The thing to investigate is:
> > > > > >
> > > > > > Failed Resource Actions:
> > > > > > * http_server_start_0 on node2 'error' (1): call=12,
> > > > > > status='Timed Out', exitreason='', last-rc-change='2021-03-
> > > > > > 16 12:28:14 +03:30', queued=0ms, exec=40004ms
> > > > > > * http_server_start_0 on node1 'error' (1): call=14,
> > > > > > status='Timed Out', exitreason='', last-rc-change='2021-03-
> > > > > > 16 12:28:52 +03:30', queued=0ms, exec=40002ms
> > > > > >
> > > > > > The web server start timed out. Check the system, pacemaker
> > > > > > and apache
> > > > > > logs around those times for any hints.
> > > > > >
> > > > > > Did you enable and test the status URL? The
> > > > > > ocf:heartbeat:apache agent
> > > > > > checks the status as part of its monitor (which is also
> > > > > > done for
> > > > > > start). It would be something like:
> > > > > >
> > > > > > cat <<-END >/etc/httpd/conf.d/status.conf
> > > > > > <Location /server-status>
> > > > > > SetHandler server-status
> > > > > > Require local
> > > > > > </Location>
> > > > > > END
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Reid Wahl, RHCA
> > > > Senior Software Maintenance Engineer, Red Hat
> > > > CEE - Platform Support Delivery - ClusterHA
> > >
> > > >
> > > >
> > > > ___
> > > > Manage your subscription:
> > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > >
> > > > ClusterLabs home: https://www.clusterlabs.org/
> > > > ___
> > > > Manage your subscription:
> > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > >
> > > > ClusterLabs home: https://www.clusterlabs.org/
> > > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Reid Wahl, RHCA
> > > Senior Software Maintenance Engineer, Red Hat
> > > CEE - Platform Support Delivery - ClusterHA
> > >
> > > ___
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > ClusterLabs home: https://www.clusterlabs.org/
> > > ___
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > ClusterLabs home: https://www.clusterlabs.org/
> > >
> >
> >
> > --
> > Regards,
> >
> > Reid Wahl, RHCA
> > Senior Software Maintenance Engineer, Red Hat
> > CEE - Platform Support Delivery - ClusterHA
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
>
--
Ken Gaillot
notifyd
> > # ssh rhel83-2 systemctl start corosync-notifyd
> > # ssh rhel83-3 systemctl start corosync-notifyd
> >
> > Is there any plan for pcs to support corosync-notifyd?
> >
> > Regards,
> > Kazunori INOUE
--
Ken Gaillot
004ms
* http_server_start_0 on node1 'error' (1): call=14, status='Timed Out',
exitreason='', last-rc-change='2021-03-16 12:28:52 +03:30', queued=0ms,
exec=40002ms
The web server start timed out. Check the system, pacemaker and apache
logs around those times for any hints.
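To line up the logs with those failures, it can help to pull the node, status, timestamp and execution time out of each entry. A minimal sketch in Python, assuming the entry looks exactly like the one quoted above (the regular expression and the function name `parse_failed_action` are made up for illustration and are not part of pcs or Pacemaker):

```python
import re
from datetime import datetime

# Pattern for one "Failed Resource Actions" entry in the form quoted above.
# This layout is an assumption; other Pacemaker versions may print it differently.
FAILED_ACTION = re.compile(
    r"\*\s+(?P<action>\S+) on (?P<node>\S+) '(?P<rc_text>[^']*)' \((?P<rc>\d+)\): "
    r"call=(?P<call>\d+), status='(?P<status>[^']*)', "
    r"exitreason='(?P<exitreason>[^']*)', "
    r"last-rc-change='(?P<when>[^']*)', queued=(?P<queued>\d+)ms, "
    r"exec=(?P<exec>\d+)ms"
)


def parse_failed_action(text):
    """Return the interesting fields of one failed-action entry, or None."""
    # Mail clients wrap the entry over several lines, so collapse whitespace first.
    flat = " ".join(text.split())
    match = FAILED_ACTION.search(flat)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "node": match.group("node"),
        "status": match.group("status"),
        # %z accepts offsets written as +03:30 on Python 3.7 and later.
        "when": datetime.strptime(match.group("when"), "%Y-%m-%d %H:%M:%S %z"),
        "exec_ms": int(match.group("exec")),
    }


entry = """* http_server_start_0 on node1 'error' (1): call=14, status='Timed Out',
exitreason='', last-rc-change='2021-03-16 12:28:52 +03:30', queued=0ms,
exec=40002ms"""

info = parse_failed_action(entry)
print(info["node"], info["status"], info["when"].isoformat(), info["exec_ms"])
```

The timestamp gives the point in the system, pacemaker and apache logs to start reading from, and the exec time shows how long the agent ran before the action was declared timed out.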
Did you ena
e some time when both
Pacemaker 2 and 3 releases are being made and supported.
--
Ken Gaillot
r/ra/1.0/resource-agent-api.md
https://github.com/kgaillot/OCF-spec/blob/ocf1.1/ra/1.1/resource-agent-api.md
My goal is to merge the pull request formally adopting 1.1 by the end
of this month.
--
Ken Gaillot
me dependencies for ruby 2.2.0+ Which is not
> available in RHEL 7.x stream and getting compilation error , Please
> check and advise us whether pcs-0.10 is supported on RHEL 7.
>
>
>
> Thanks and Regards,
> S Sathish S
--
Ken Gaillot
rspective, you can certainly build and run Pacemaker
2.0 from source on RHEL 7, but Red Hat won't support it. Also, the
version of pcs supplied with RHEL 7 is only compatible with Pacemaker
1.1, so you'd need to build pcs from source as well.
--
Ken Gaillot
ng on h16 even though the node was stopped.
>
> So the problem was on stopping, not starting, but still I doubt the
> probe at that time is quite reliable.
>
> >
> > A bug is certainly possible, though we can't say without more
> > detail :)
>
> I see what you
/monitor returncode like
> OCF_NOT_READY would make sense if the operation detects that it
> cannot return a current status (so both "running" and "stopped" would
> be as inadequate as "starting" and "stopping" would be (
est. While Pacemaker still supports it, as a practical
matter we'll accept pull requests from anyone using it who finds
problems, but we aren't going to spend development or testing time on
it ourselves. Support will likely be dropped in a few years.
On Mon, 2021-01-11 at 12:27 -0600, Ken Gai
On Thu, 2021-02-25 at 11:15 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 24.02.2021 um
> > > > 23:45 in
>
> Nachricht
> <6373352fd18e819bada715a7d610499a658eda29.ca...@redhat.com>:
> > On Wed, 2021-02-24 at 11:16 +0100, Ulrich Windl wrot
* Is corosync running, and how many nodes can be seen?
> > > * Is Pacemaker running, how many nodes does it see, and does it
> > > have a
> >
> > quorum?
> > > * Is the current node DC?
> > > * How many resources matching some regular expression are
> > > running?
> > >
> > > Regards,
> > > Ulrich
--
Ken Gaillot
On Thu, 2021-02-25 at 06:34 +, shivraj dongawe wrote:
>
> @Ken Gaillot, Thanks for sharing your inputs on the possible behavior
> of the cluster.
> We have reconfirmed that dlm on a healthy node was waiting for
> fencing of faulty node and shared storage access on t
--
Ken Gaillot
ose based on free capacity.
For example, if a location constraint gives a particular node a high
enough preference, that will be considered more important than free
capacity. ("High enough" being relative to the rest of the
configuration -- other constraint scores, etc.)
--
Ken Gaillot
, 2021-02-23 at 08:41 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 19.02.2021 um
> > > > 17:48 in
>
> Nachricht
> :
> > On Fri, 2021-02-19 at 17:54 +0300, Andrei Borzenkov wrote:
> > > In the latest PDF versions I downloaded recently
> İsmet BALAT
eration of
> > > postgres, we
> > > >> have
> > > >> > found many messages related to failure of fencing and other
> > > resources
> > > >> such
> > > >> > as dlm and vg waiting for fencing to complete.
It does s
ions can
Same situation -- it's the node that executes the action that knows how
long it took, so that's where the log is.
> time-out, wouldn't that be a useful info to have as well?
>
> Regards,
> Ulrich
--
Ken Gaillot
@devel
> could look into perhaps and "fix" in near future ?
>
> many thanks, L
What was the command you used that gave that error message?
FYI a similar issue was fixed a while back for "crm_resource --show-
metadata".
--
Ken Gaillot
the next release (toward the middle of this year)
that's relevant as well.
You'll be able to set critical=false on secret-dropbox, and if secret-
dropbox fails enough times to need to be moved to a different node,
then secret-dropbox will stop
show there. The preview is at the same link above.
--
Ken Gaillot
h-pkg-name, --
with-brand, --enable-ansi, and --enable-no-stack, most of which were
ignored or broken.
A big list of all changes for 2.1.0 can be found at:
https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes
--
Ken Gaillot
ou can set a node attribute (e.g. "site" = "1" or "2") for
each node, then colocate the IPs with the master role using the site
node attribute. See:
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#s-coloc-attribute
--
ference.
This would involve considerable work for developers, so I'm curious how
many users would find this useful and would use it. Especially if most
of the time you used journalctl -x, there was no extended information,
but occasionally there was.
--
Ken Gaillot
--
Ken Gaillot
ika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin
> Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
--
Ken Gaillot
: I could ping h16 from h18 using
> > > > > the host name
> > > > > without any problem.
> > > > >
> > > > > Two points:
> > > > > Why would h18 think h16 should be fenced?
> > > > > The failed assertio
> > >
> > > Resources Defaults:
> > > resource-stickiness=1000
> > > Operations Defaults:
> > > No defaults set
> > >
> > > Cluster Properties:
> > > cluster-infrastructure: corosync
> > > cluster-name: EMS
> > > dc-version: 2.0.2-3.el8-744a30d655
> > > have-watchdog: false
> > > last-lrm-refresh: 1612951127
> > > symmetric-cluster: true
> > >
> > > Quorum:
> > > Options:
> > > --
> > >
> > > Regards,
> > > Ben
> >
> >
> >
> >
>
--
Ken Gaillot
lts:
> No defaults set
>
> Cluster Properties:
> cluster-infrastructure: corosync
> cluster-name: EMS
> dc-version: 2.0.2-3.el8-744a30d655
> have-watchdog: false
> last-lrm-refresh: 1612951127
> symmetric-cluster: true
>
> Quorum:
st its status, believes it cannot host resources
> > and stops them all
> > for whatever reason, perhaps somehow due to the completely missing
> > transient_attributes, node02 never schedules a probe for itself
> > we have to "refresh" manually
> >
> > On Mon
.
> Feb 08 08:12:30 h18 sbd[33953]: notice: inquisitor_child: Servant
> cluster is healthy (age: 0)
>
> Broadcast message from systemd-journald@h18 (Mon 2021-02-08 08:12:32
> CET):
>
> sbd[33953]:emerg: do_exit: Rebooting system: reboot
>
> Feb 08 08:12:32 h18 sbd[33958]: /dev/disk/by-id/dm-name-SBD_1-
> 3P1:
f we
> > can
> > build the package, it won't run.
> >
> > Fabio
> >
> > > * Is the crm_attribute the right choice for setting cluster
> > > properties
> > > and will cover all usecases?
> > > *
> > >
> > > Thank You!
> > >
> > > B
On Mon, 2021-02-08 at 09:30 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 05.02.2021 um
> > > > 16:47 in Nachricht
>
> <7247097610e6ab4f3a44a7648e0acf32fbdb9937.ca...@redhat.com>:
>
> Hi!
>
> ...
> > > Doesn't systemctl re
uting -
> > > rsc:prm_libvirtd action:start call_id:97
> > >
> > > So one could guess that virtlockd and libvirtd are starting
> > > concurrently,
> > > but they did not because of this sequence:
> > > Feb 04 15:41:27 h19 pacemaker-controld[7796]: notice: Result of
> > > start
> > > operation for prm_virtlockd on h19: ok
> > > Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing -
> > > rsc:prm_libvirtd action:start call_id:97
> > >
> > > Regards,
> > > Ulrich
> > >
> > >
> > >
> > >
> >
> > --
> > Regards,
> >
> > Reid Wahl, RHCA
> > Senior Software Maintenance Engineer, Red Hat
> > CEE - Platform Support Delivery - ClusterHA
>
>
>
>
--
Ken Gaillot
:
> Hello,
>
> I would like to clarify support period for Pacemaker 1.1.
> Until which year it is planned to backport bugfixes into 1.1 branch
> and
> create releases?
>
--
Ken Gaillot
15 years. In pacemaker 2.x the master/slave
> resource changed to clone so this is new to me.
master/slave -> promotable clone was purely a terminology change; the
behavior is identical.
> Any input is most helpful!
> Brent
--
Ken Gaillot
--
Ken Gaillot
not, I would like to deprecate it and eventually
> drop support, to help with code maintenance.
>
> My gut feeling is that remote administration is moving more towards
> GUIs and orchestrators, and the Pacemaker feature is not particularly
> useful anymore.
--
Ken Gaillot
keeping it, but if not, I would like to deprecate it and eventually
drop support, to help with code maintenance.
My gut feeling is that remote administration is moving more towards
GUIs and orchestrators, and the Pacemaker feature is not particularly
useful anymore.
--
Ken Gaillot
> > >
> > >
> > >
> > >
> > > These attributes are necessary for "node02" to be Master/Primary,
> > > correct?
> > >
> > > Why might this be happenin
tion to answer. At the very least
> > you
> > need to show full logs from both nodes around time it happens
> > (starting
> > with both nodes losing connectivity).
> >
> > But as a wild guess - you do not use stonith, node01 becomes DC and
> > clears other node state.
>
--
Ken Gaillot
9 gen 2021 alle ore 11:22 Ulrich Windl <
> ulrich.wi...@rz.uni-regensburg.de> ha scritto:
> > >>> Andrei Borzenkov schrieb am 28.01.2021 um
> > 18:30 in
> > Nachricht :
> > > 27.01.2021 22:03, Ken Gaillot пишет:
> > >>
>
On Mon, 2021-02-01 at 09:58 -0600, Ken Gaillot wrote:
> On Fri, 2021-01-29 at 12:37 -0500, Stuart Massey wrote:
> > Can someone help me with this?
> > Background:
> > > "node01" is failing, and has been placed in "maintenance" mode.
> > > It
we prevent it?
Transient attributes are always cleared when a node leaves the cluster
(that's what makes them transient ...). It's probably coincidence it
went through as the node rejoined.
When the node rejoins, it will trigger another run of the scheduler,
which will schedule a probe of all resources on the node. Those probes
should reset the promotion score.
--
Ken Gaillot
> make[1]: *** [core] Error 1
> make[1]: Leaving directory `/root/sathish/pacemaker-Pacemaker-2.0.5'
> make: *** [build] Error 2
>
> Thanks and Regards,
> S Sathish S
> > disable all
> > >> > the resources of the group (or the group itself) if it can't
> > run all
> > >> the
> > >> > resources somewhere.
> > >> >
> > >>
> > >> That's what pacemaker group does. I am not sure what you mean
> > with
> > >> "disable all resources". If resource fail count on a node
> > exceeds
> > >> threshold, this node is banned from running resource. If
> > resource failed
> > >> on every node, no node can run it until you clear fail count.
> > >>
> > >> "Disable resource" in pacemaker would mean setting its target-
> > role to
> > >> stopped. That does not happen automatically (at least I am not
> > aware of
> > >> it).
> > >>
> >
> >
> >
>
--
Ken Gaillot
On Thu, 2021-01-28 at 11:23 +0100, Ulrich Windl wrote:
> Ken,
>
> thanks for analyzing the logs! See comments inline...
>
> > > > Ken Gaillot schrieb am 27.01.2021 um
> > > > 19:55 in
>
> Nachricht
> <644fc719a2e8870c332db859bcdef275d986249a.ca
On Thu, 2021-01-28 at 11:12 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 27.01.2021 um
> > > > 18:46 in
>
> Nachricht
> <02cd90fcc10f1021d9f51649e2991da3209a6935.ca...@redhat.com>:
> > On Wed, 2021-01-27 at 08:35 +0100, Ulrich Windl
ources". If resource fail count on a node exceeds
> > threshold, this node is banned from running resource. If resource
> > failed
> > on every node, no node can run it until you clear fail count.
> >
> > "Disable resource" in pacemaker would mean setting its t
On Wed, 2021-01-27 at 08:35 +0100, Ulrich Windl wrote:
> > > > Tomas Jelinek schrieb am 26.01.2021 um
> > > > 16:15 in
>
> Nachricht
> <48f935a5-184f-d2d7-7f1a-db596aa6c...@redhat.com>:
> > Dne 25. 01. 21 v 17:01 Ken Gaillot napsal(a):
> > >
On Wed, 2021-01-27 at 08:29 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 26.01.2021 um
> > > > 16:08 in
>
> Nachricht
> :
> > On Tue, 2021-01-26 at 02:12 -0500, Digimer wrote:
> > > Hi all,
> > >
> > > I created a
On Tue, 2021-01-26 at 11:03 -0500, Digimer wrote:
> On 2021-01-26 10:15 a.m., Tomas Jelinek wrote:
> > Dne 25. 01. 21 v 17:01 Ken Gaillot napsal(a):
> > > On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais
> > > wrote:
> > > > Hi Digimer,
> >
g: srv01-test_stop_0[2647133] timed out after 2ms
> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]:
> error: Result of stop operation for srv01-test on el8-a01n01: Timed
> Out
> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]:
>
On Mon, 2021-01-25 at 13:18 -0500, Digimer wrote:
> On 2021-01-25 11:01 a.m., Ken Gaillot wrote:
> > On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > Hi Digimer,
> > >
> > > On Sun, 24 Jan 2021 15:31:22 -0500
> > >
"I want R to run on N2", then that is
in fact a location preference, which is expressed via a constraint.
> But as things are now: Could the cluster recheck take care of those?
>
> >
> >
> > Best Regards,Strahil Nikolov
> > Sent from Yahoo Mail on Android
--
Ken Gail
attribute, set to the epoch timestamp of the request. It would be
possible to set that attribute for all nodes in a copy of the CIB, then
load that into the live cluster.
stop-all-resources as suggested would be another way around it (and
would have to be cleared after start-up, which could be a plus or a
minus depending on how much control vs convenience you want).
--
Ken Gaillot
sources.
We do need to add that to the documentation ...
In order of highest precedence first:
is-managed=false (Pacemaker won't stop or start)
stop-all-resources=true
target-role on a specific resource
target-role in resource defaults
stop-all-resources will keep resources stopped for
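As a rough illustration of the precedence order listed above, here is a toy model in Python. It is not Pacemaker's actual code; the function name `effective_state`, the dictionary layout, the returned strings, and the "Started" fallback are all assumptions made for the example:

```python
def effective_state(resource_meta, resource_defaults, cluster_props):
    """Toy model of the precedence list above (highest precedence first).

    All three arguments are plain dicts of option name -> string value.
    The return values are labels invented for this sketch.
    """
    # 1. is-managed=false: Pacemaker won't stop or start the resource.
    if resource_meta.get("is-managed") == "false":
        return "unmanaged"
    # 2. stop-all-resources=true in the cluster properties.
    if cluster_props.get("stop-all-resources") == "true":
        return "Stopped"
    # 3. target-role set on the specific resource.
    if "target-role" in resource_meta:
        return resource_meta["target-role"]
    # 4. target-role set in the resource defaults.
    if "target-role" in resource_defaults:
        return resource_defaults["target-role"]
    # Nothing set: assume the resource is allowed to run (an assumption here).
    return "Started"


# stop-all-resources wins over a per-resource target-role of Started.
print(effective_state({"target-role": "Started"}, {}, {"stop-all-resources": "true"}))
```

Checking the options in this fixed order is what "highest precedence first" means in practice: the first option that is set decides the outcome, and anything lower in the list is ignored.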
On Fri, 2021-01-22 at 08:58 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 22.01.2021 um
> > > > 00:51 in
>
> Nachricht
> :
> > Hi all,
> >
> > A recurring request we've seen from Pacemaker users is a feature
> > cal
On Fri, 2021-01-22 at 08:38 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 21.01.2021 um
> > > > 17:24 in
>
> Nachricht
> <28f8b077a30233efa41d04688eb21e82c8432ddd.ca...@redhat.com>:
> > On Thu, 2021-01-21 at 08:19 +0100, Ulrich Windl wr
d be marked
with influence=false, or the reporting tool resource could be given the
meta-attribute critical=false, to achieve the desired effect.
A big list of all changes for 2.1.0 can be found at:
https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes
--
Ken Gaillot
> Are there any tools to help understanding?
Mainly crm_simulate
> Note: For placement-strategy=utilization it's easier: As long as
> there is sufficient capacity, distribute the resources on the node
> that has least number of resources.
>
> Regards,
> Ulrich
--
Ken Gaillot
nd:Mandatory) (id:order-drbd_database-clone-fs_database-
> > > mandatory)
> > > start drbd_logsfiles-clone then start fs_logfiles
> > > (kind:Mandatory) (id:order-drbd_logsfiles-clone-fs_logfiles-
> > > mandatory)
> > > Colocation Constraints
> >
> > > Another odd data point: On the slave if I do a "pcs node standby"
> > > & then unstandby, DRBD is loaded again; HOWEVER, when I do this
> > > on the master (which should then be slave), DRBD doesn't get
> > > loaded.
> > >
> > > Stonith/Fencing doesn't seem to make a difference. Not sure if
> > > auto-promote is required.
> > >
> >
> > Quote from official documentation (
> > https://www.linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-pacemaker-crm-drbd-backed-service
> > ):
> > If you are employing the DRBD OCF resource agent, it is recommended
> > that you defer DRBD startup, shutdown, promotion, and
> > demotion exclusively to the OCF resource agent. That means that you
> > should disable the DRBD init script:
> > So remove the autopromote and disable the drbd service at all.
> >
> > Best Regards, Strahil Nikolov
> >
> >
>
--
Ken Gaillot
On Mon, 2021-01-18 at 14:10 -0500, Digimer wrote:
> On 2021-01-18 1:52 p.m., Ken Gaillot wrote:
> > On Sun, 2021-01-17 at 21:11 -0500, Digimer wrote:
> > > Hi all,
> > >
> > > Mind the slew of questions, well into testing now and finding
> > >
:) If you've found
some other way where it doesn't work as expected, let me know. (Of
course, there's also the separate possibility of node failure, manual
or DLM-initiated fencing, etc. but I'm sure you're familiar with all
that.)
>
> Thanks for any insight/guidance!
>
--
Ken Gaillot
ation srv01-test_delete_0 locally on
> el8-a01n02
> Jan 18 02:03:59 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
> notice: Transition 63 aborted by deletion of
> lrm_resource[@id='srv01-test']: Resource state removal
> Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
> notice: Result of monitor operation for virsh_node2_pulsar on el8-
> a01n02: ok
> Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
> notice: Transition 63 (Complete=2, Pending=0, Fired=0, Skipped=0,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-103.bz2):
> Complete
> Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]:
> notice: Calculated transition 64, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-104.bz2
> Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
> notice: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-104.bz2):
> Complete
> Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
> notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>
>
--
Ken Gaillot
lure), when a transition action
returns an unexpected result (like a start failing instead of
succeeding), and periodically (according to cluster-recheck-interval).
In any case, it's possible there's nothing to do, so the transition has
no actions. It's still a record that the cluster checke
ok like a bug in pacemaker?
>
> Regards,
> Ulrich
From the above it's not apparent why fencing was needed.
It makes sense that things would move once the time-based rule kicked
in. Some event likely happened during the day that made the move
preferable, which might be difficult to fi
, pureScale Domain
> E-mail: ge...@ca.ibm.com
>
>
> > - Original message -
> > From: Ken Gaillot
> > Sent by: "Users"
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > Cc:
> > Subject: [EXTERNAL] Re:
n corosync.conf. I was hoping
> Pacemaker has something similar but I don't see anything in
> '/etc/sysconfig/pacemaker' or the Pacemaker documentation regarding
> hi-res timestamps.
>
> Gerry Sommerville
> Db2 Development, pureScale Domain
> E
s.org/wiki/Pacemaker_2.1_Changes
--
Ken Gaillot
On Mon, 2021-01-11 at 16:31 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 11.01.2021 um
> > > > 15:46 in
>
> Nachricht
> <3df79a20eb4440357759cca4fe5b0e0729e47085.ca...@redhat.com>:
> > On Mon, 2021-01-11 at 08:25 +0100, Ulrich Win
On Mon, 2021-01-11 at 08:25 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot schrieb am 08.01.2021 um
> > > > 17:38 in
>
> Nachricht
> <662b69bff331fae41771cf8833e819c2d5b18044.ca...@redhat.com>:
> > On Fri, 2021-01-08 at 11:46 +0100, Ulrich Windl wrot
ln_testVG_activate )
> colocation col_lvm_activate__lvmlockd inf: ( cln_testVG_activate )
> cln_lvmlockd
> ### lvmlockd similarly depends on DLM (order, colocation), so I don't
> see a problem
>
> Finally:
> h16:~ # vgs
> VG #PV #LV #SN Attr VSize VFree
> sys 1 3 0 wz--n-
om Scratch to use Pacemaker 2. Comments
and suggestions are welcome. Pull requests are even more welcome. :)
A big list of all changes being considered for 2.1.0 can be found at:
https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes
Happy clustering,
--
K
ny second is rebooted by stonith ?
> > > Which is the parameter to set the number of seconds?
> > > Sorry for my bad English
> > > Thanks
> > > Ignazio
pted behavior or our resource operation setting is invalid.(refer
> above config settings).?
> 2)Any other parameter that can help to avoid this issue..?
>
> Thanks and Regards,
> S Sathish S
--
Ken Gaillot
o either way. :)
Of course, you can configure sshd to listen on the cluster interface.
If you give the cluster interface on each node a unique name in DNS (or
hosts or whatever), you can ssh to that name.
--
Ken Gaillot
defaults to 2 seconds.
The best thing would be to do some manual testing using ipmitool or
whatnot to turn off the power, and observe how long it takes between
when the command returns and the server actually is powered down. Then
set power_wait to a comfortable margin above that. Or just keep raising
power_wait until the problem goes away :)
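If you do time a few manual power-offs, the arithmetic for picking a value could look like the sketch below. The function name `suggest_power_wait` and the default 1.5x margin are assumptions for illustration only; what counts as a "comfortable margin" depends on your hardware:

```python
import math


def suggest_power_wait(observed_delays_s, margin=1.5):
    """Suggest a power_wait value (whole seconds) from measured delays.

    observed_delays_s: seconds between the power-off command returning and
    the server actually being powered down, one entry per manual test.
    margin: multiplier applied to the slowest observation (hypothetical default).
    """
    if not observed_delays_s:
        raise ValueError("need at least one measured delay")
    # Base the suggestion on the worst case seen, then round up to whole seconds.
    return math.ceil(max(observed_delays_s) * margin)


# Three manual tests took 3.2, 4.1 and 3.8 seconds to actually power down.
print(suggest_power_wait([3.2, 4.1, 3.8]))
```

Using the slowest observation rather than the average errs on the side of waiting too long, which is the safer mistake for fencing.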
--
Ken Gaillot
> use_mgmtd: yes
> }
>
> and then service-directive parameters are mandatory sections for
> configurations?
>
> best regards.
--
Ken Gaillot
urce
> > > refresh" (reprobe) the cluster tried to fix the problem.
> > > Well at some point the VM wouldn't start any more, because the
> > > BtrFS used
> > > for all (SLES default) was corrupted in a way that seems
> > > unrecoverable,
> > > independently of how many subvolumes and snapshots of those may
> > > exist.
> > >
> > > Initially I would guess the libvirt stack and VirtualDomain is
> > > less
> >
> > reliable
> > > than the old Xen method and RA.
> > >
> > > Regards,
> > > Ulrich
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>
--
Ken Gaillot
is
> > > done? Or how can I just delay the resource start so I can make it
> > larger than
> > > its pcmk_delay_base?
> >
> > We probably need to see logs and configs to understand.
> >
> > >
> > >
the second node?
> >
On Wed, 2020-12-16 at 04:46 -0500, Tony Stocker wrote:
> On Tue, Dec 15, 2020 at 12:29 PM Ken Gaillot
> wrote:
> >
> > On Tue, 2020-12-15 at 17:02 +0300, Andrei Borzenkov wrote:
> > > On Tue, Dec 15, 2020 at 4:58 PM Tony Stocker <
> > > akostoc...@gmail.c
cate it with the workload
resources.
Or you could write a systemd timer unit to call your script when
desired, and colocate that with the workload as a systemd resource in
the cluster.
Or similar to the crm_resource method, you could colocate an
oc
re is a proposal to support that. Basically it's
just a matter of developer time being unavailable.
> 2) Is there any other better option available to
> avoid timed out issues in first occurrence itself.?
Only increasing the timeout.
> 3) we thought of
opped
> > > > I tried restarting zpool_data or other resources:
> > > > # crm resource start zpool_data
> > > > but nothing happens!
> > > > How can I recover from this state? Node2 needs to stay down,
> > > but I want