Re: [ClusterLabs] Antw: [EXT] True time periods/CLOCK_MONOTONIC node vs. cluster wide (Was: Coming in Pacemaker 2.0.4: dependency on monotonic clock for systemd resources)

2020-03-12 Thread Jan Pokorný
On 12/03/20 08:22 +0100, Ulrich Windl wrote: > Sorry for top-posting, but if you have NTP-synced your nodes, > CLOCK_MONOTONIC will not have much advantage over CLOCK_REALTIME > as the clocks will be rather the same, I guess you mean they will be rather the same amongst the nodes relative to each

[ClusterLabs] True time periods/CLOCK_MONOTONIC node vs. cluster wide (Was: Coming in Pacemaker 2.0.4: dependency on monotonic clock for systemd resources)

2020-03-11 Thread Jan Pokorný
On 11/03/20 09:04 -0500, Ken Gaillot wrote: > On Wed, 2020-03-11 at 08:20 +0100, Ulrich Windl wrote: >> You only have to take care not to compare CLOCK_MONOTONIC >> timestamps between nodes or node restarts. > > Definitely :) > > They are used only to calculate action queue and run durations
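Ken's point above is that CLOCK_MONOTONIC readings only feed per-node action queue and run durations. A minimal Python sketch can illustrate that pattern. On Linux, `time.monotonic()` reads CLOCK_MONOTONIC. The `ActionTiming` class and its method names are hypothetical and not taken from Pacemaker, which is written in C. The sketch only shows the idea: stamps stay local to one node and one boot, and only their differences are ever used.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActionTiming:
    """Hypothetical per-node record of one action's lifecycle.

    Every stamp is read from this node's monotonic clock via
    time.monotonic(), so only differences between stamps taken on the
    same node during the same boot are meaningful.
    """
    queued: float = field(default_factory=time.monotonic)
    started: Optional[float] = None
    finished: Optional[float] = None

    def mark_started(self) -> None:
        self.started = time.monotonic()

    def mark_finished(self) -> None:
        self.finished = time.monotonic()

    def queue_duration(self) -> float:
        # How long the action waited before it began executing.
        if self.started is None:
            raise RuntimeError("action has not started yet")
        return self.started - self.queued

    def run_duration(self) -> float:
        # How long the action spent executing.
        if self.started is None or self.finished is None:
            raise RuntimeError("action has not finished yet")
        return self.finished - self.started
```

Sending the raw `queued` or `started` values to another node would make any subtraction against that node's clock meaningless. The same applies to persisting them across a reboot. The computed durations, by contrast, are plain elapsed seconds and are safe to report anywhere.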

Re: [ClusterLabs] Finding attributes of a past resource agent invocation

2020-03-04 Thread Jan Pokorný
Hi Feri, just to this one... On 03/03/20 15:22 +0100, wf...@niif.hu wrote: > Is there a way to find out what attributes were passed to the OCF > agent in that fateful invocation? AFAIK, not possible after-the-fact, unless you add TRACE_RA=1 as another (real) parameter to the agent and it

Re: [ClusterLabs] [RFC] LoadSharingIP agent idea, xt_cluster/IPv6/nftables

2020-01-15 Thread Jan Pokorný
On 02/01/20 21:45 +0100, Jan Pokorný wrote: > So, any takers? :-) > > Of course, as discussed before, this overhaul would deserve a strict > separation of the function in question under a dedicated agent like > LoadSharingIP, letting the equivalent one in IPaddr2 agent and based

Re: [ClusterLabs] Making xt_cluster IP load-sharing work with IPv6

2020-01-14 Thread Jan Pokorný
On 14/01/20 21:16 +0300, Andrei Borzenkov wrote: > 14.01.2020 17:47, Jan Pokorný wrote: >> What confused me is that 00:zz:yy:xx:5a:27 appears as if the same >> address shall be used -- but in your explanation, it would definitely >> be that case, correct? > > I expect

Re: [ClusterLabs] Making xt_cluster IP load-sharing work with IPv6

2020-01-14 Thread Jan Pokorný
On 11/01/20 19:47 +0300, Andrei Borzenkov wrote: > 04.01.2020 01:42, Valentin Vidić wrote: >> On Thu, Jan 02, 2020 at 09:52:09PM +0100, Jan Pokorný wrote: >>> What you've used appears to be akin to what this chunk of manpage >>> suggests (amongst others): >>>

Re: [ClusterLabs] Prevent Corosync Qdevice Failback in split brain scenario.

2020-01-07 Thread Jan Pokorný
On 02/01/20 14:30 +0100, Jan Friesse wrote: >> I am planning to use Corosync Qdevice version 3.0.0 with corosync >> version 2.4.4 and pacemaker 1.1.16 in a two node cluster. >> >> I want to know if failback can be avoided in the below situation. >> >> >> 1. The pcs cluster is in split brain

Re: [ClusterLabs] pacemaker-controld getting respawned

2020-01-07 Thread Jan Pokorný
On 06/01/20 11:53 -0600, Ken Gaillot wrote: > On Fri, 2020-01-03 at 13:23 +, S Sathish S wrote: >> Pacemaker-controld process is getting restarted frequently; the reason for >> failure is disconnect from CIB/Internal Error (or) high CPU on the >> system, and the same has been recorded in our system logs,

[ClusterLabs] Making xt_cluster IP load-sharing work with IPv6 (Was: Concept of a Shared ipaddress/resource for generic applicatons)

2020-01-02 Thread Jan Pokorný
On 27/12/19 15:04 +0100, Valentin Vidić wrote: > On Wed, Dec 04, 2019 at 02:44:49PM +0100, Jan Pokorný wrote: >> For the record, based on my feedback, iptables-extensions man page is >> headed to (finally) align with the actual in-kernel deprecation >> message: >> https:

[ClusterLabs] [RFC] LoadSharingIP agent idea, xt_cluster/IPv6/nftables (Was: Support for xt_cluster)

2020-01-02 Thread Jan Pokorný
On 19/12/19 10:18 -0600, Ken Gaillot wrote: > On Thu, 2019-12-19 at 15:01 +, Marcus Vinicius wrote: >> Is there any intention to abandon CLUSTERIP > > yes > >> in favor of xt_cluster.ko? > > no > > :) > > A recent thread about this: >

Re: [ClusterLabs] Fuzzy/misleading references to "restart" of a resource

2019-12-05 Thread Jan Pokorný
On 05/12/19 10:41 +0300, Andrei Borzenkov wrote: > On Thu, Dec 5, 2019 at 1:04 AM Jan Pokorný wrote: >> >> On 04/12/19 21:19 +0100, Jan Pokorný wrote: >>> OTOH, this enforced split of state transitions is perhaps what makes >>> the transaction (comprising perha

Re: [ClusterLabs] Fuzzy/misleading references to "restart" of a resource

2019-12-04 Thread Jan Pokorný
On 04/12/19 21:19 +0100, Jan Pokorný wrote: > OTOH, this enforced split of state transitions is perhaps what makes > the transaction (comprising perhaps countless other interdependent > resources) serializable and thus feasible at all (think: you cannot > nest any further ha

[ClusterLabs] Fuzzy/misleading references to "restart" of a resource (Was: When does pacemaker call 'restart'/'force-reload' operations on LSB resource?)

2019-12-04 Thread Jan Pokorný
On 04/12/19 14:53 +0900, Ondrej wrote: > When adding 'LSB' script to pacemaker cluster I can see that > pacemaker advertises 'restart' and 'force-reload' operations to be > present - regardless if the LSB script supports it or not. This > seems to be coming from following piece of code. > >

Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons

2019-12-04 Thread Jan Pokorný
On 03/12/19 23:38 +0100, Valentin Vidić wrote: > On Tue, Dec 03, 2019 at 11:14:41PM +0100, Jan Pokorný wrote: >> The conclusion is hence that even with bleeding edge software >> collection, there's no real problem in using ipt_CLUSTERIP >> (when compiled in or alongside

Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons

2019-12-03 Thread Jan Pokorný
On 03/12/19 23:19 +0100, Valentin Vidić wrote: > Interestingly enough, ipt_CLUSTERIP still seems to work when using > iptables-legacy :) 5 minutes ago, in another part of this thread :) -- Jan (Poki)

Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons

2019-12-03 Thread Jan Pokorný
On 03/12/19 20:38 +0100, Valentin Vidić wrote: > On Tue, Dec 03, 2019 at 03:06:14PM +0100, Jan Pokorný wrote: >> You likely refer to >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43270b1bc5f1e33522dacf3d3b9175c29404c36c >>

Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons

2019-12-03 Thread Jan Pokorný
On 02/12/19 09:50 -0600, Ken Gaillot wrote: > On Sat, 2019-11-30 at 18:58 +0300, Andrei Borzenkov wrote: >> 29.11.2019 17:46, Jan Pokorný wrote: >>> "Clone" feature for IPaddr2 is actually sort of an overloading of that >>> agent with an alternative fu

Re: [ClusterLabs] Concept of a Shared ipaddress/resource for generic applicatons

2019-11-29 Thread Jan Pokorný
On 27/11/19 20:13 +, matt_murd...@amat.com wrote: > I finally understand that there is an Apache Resource for Pacemaker > that assigns a single virtual ipaddress that "floats" between two > nodes as in webservers. >

Re: [ClusterLabs] ftime-imposed problems and compatibility going forward (Was: Final Pacemaker 2.0.3 release now available)

2019-11-26 Thread Jan Pokorný
On 26/11/19 20:30 +0100, Jan Pokorný wrote: > And yet another, this time late, stay-compatible type of problem with > pacemaker vs. concurrent changes in build dependencies has popped up, > this time with Inkscape. Whenever its 1.0 version is released, > when with my ask https:

Re: [ClusterLabs] ftime-imposed problems and compatibility going forward (Was: Final Pacemaker 2.0.3 release now available)

2019-11-26 Thread Jan Pokorný
On 26/11/19 16:08 +0100, Jan Pokorný wrote: > On 25/11/19 20:32 -0600, Ken Gaillot wrote: >> The final release of Pacemaker version 2.0.3 is now available at: >> >> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.3 > > For downstream and individu

[ClusterLabs] ftime-imposed problems and compatibility going forward (Was: Final Pacemaker 2.0.3 release now available)

2019-11-26 Thread Jan Pokorný
On 25/11/19 20:32 -0600, Ken Gaillot wrote: > The final release of Pacemaker version 2.0.3 is now available at: > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.3 For downstream and individual builders of the codebase ==

Re: [ClusterLabs] Pacemaker 2.0.3-rc3 now available

2019-11-15 Thread Jan Pokorný
On 14/11/19 14:54 +0100, Jan Pokorný wrote: > On 13/11/19 17:30 -0600, Ken Gaillot wrote: >> A longstanding pain point in the logs has been improved. Whenever the >> scheduler processes resource history, it logs a warning for any >> failures it finds, regardless of whethe

[ClusterLabs] Documentation driven predictability of pacemaker commands (Was: Pacemaker 2.0.3-rc3 now available)

2019-11-15 Thread Jan Pokorný
On 14/11/19 10:16 -0600, Ken Gaillot wrote: > On Thu, 2019-11-14 at 14:54 +0100, Jan Pokorný wrote: >> On 13/11/19 17:30 -0600, Ken Gaillot wrote: >>> This fixes some more minor regressions in crm_mon introduced in >>> rc1. Additionally, after feedback from this list

Re: [ClusterLabs] Pacemaker 2.0.3-rc3 now available

2019-11-14 Thread Jan Pokorný
On 13/11/19 17:30 -0600, Ken Gaillot wrote: > This fixes some more minor regressions in crm_mon introduced in rc1. > Additionally, after feedback from this list, the new output format > options were shortened. The help is now: > > Output Options: > --output-as=FORMAT Specify output format

Re: [ClusterLabs] -INFINITY location constraint not honored?

2019-10-18 Thread Jan Pokorný
On 18/10/19 17:59 +0200, Jan Pokorný wrote: > On 18/10/19 11:21 +0300, Andrei Borzenkov wrote: >> According to it, you have symmetric cluster (and apparently made >> typo trying to change it) >> >> >> name="symmetric-cluster" value="

Re: [ClusterLabs] -INFINITY location constraint not honored?

2019-10-18 Thread Jan Pokorný
On 18/10/19 11:21 +0300, Andrei Borzenkov wrote: > According to it, you have symmetric cluster (and apparently made > typo trying to change it) > > > name="symmetric-cluster" value="true"/> > > name="symmectric-cluster" value="true"/> Great spot, demonstrating how

Re: [ClusterLabs] -INFINITY location constraint not honored?

2019-10-17 Thread Jan Pokorný
On 17/10/19 08:22 +0200, Raffaele Pantaleoni wrote: > I'm rather new to Pacemaker, I'm performing early tests on a set of > three virtual machines. > > I am configuring the cluster in the following way: > > 3 nodes configured > 4 resources configured > > Online: [ SRVDRSW01 SRVDRSW02 SRVDRSW03

Re: [ClusterLabs] Howto stonith in the case of any interface failure?

2019-10-09 Thread Jan Pokorný
On 09/10/19 09:58 +0200, Kadlecsik József wrote: > The nodes in our cluster have got backend and frontend interfaces: the > former ones are for the storage and cluster (corosync) traffic and the > latter ones are for the public services of KVM guests only. > > One of the nodes has got a failure

Re: [ClusterLabs] Ocassionally IPaddr2 resource fails to start

2019-10-07 Thread Jan Pokorný
Donat, On 07/10/19 09:24 -0500, Ken Gaillot wrote: > If this always happens when the VM is being snapshotted, you can put > the cluster in maintenance mode (or even unmanage just the IP > resource) while the snapshotting is happening. I don't know of any > reason why snapshotting would affect

Re: [ClusterLabs] Apache doesn't start under corosync with systemd

2019-10-07 Thread Jan Pokorný
On 07/10/19 10:27 -0500, Ken Gaillot wrote: > Additionally, your pacemaker configuration is using the apache OCF > script, so the cluster won't use /etc/init.d/apache2 at all (it invokes > the httpd binary directly). See my parallel response. > Keep in mind that the httpd monitor action requires

Re: [ClusterLabs] Apache doesn't start under corosync with systemd

2019-10-07 Thread Jan Pokorný
On 04/10/19 14:10 +, Reynolds, John F - San Mateo, CA - Contractor wrote: > I've just upgraded a two-node active-passive cluster from SLES11 to > SLES12. This means that I've gone from /etc/init.d scripts to > systemd services. > > On the SLES11 server, this worked: > >

Re: [ClusterLabs] Fence_sbd script in Fedora30?

2019-09-24 Thread Jan Pokorný
On 24/09/19 08:08 +0200, Klaus Wenninger wrote: > On 9/23/19 10:23 PM, Vitaly Zolotusky wrote: >> I am trying to upgrade to Fedora 30. The platform is two node >> cluster with pacemaker. >> In Fedora 28 we were using old fence_sbd script from 2013: >> >> # This STONITH script drives the

Re: [ClusterLabs] kronosnet v1.12 released

2019-09-20 Thread Jan Pokorný
On 20/09/19 05:22 +0200, Fabio M. Di Nitto wrote: > We are pleased to announce the general availability of kronosnet v1.12 > (bug fix release) > > [...] > > * Add support for musl libc Congrats, and the above is great news, since I've been toying with an idea of putting together a truly

Re: [ClusterLabs] nfs-daemon will not start

2019-09-19 Thread Jan Pokorný
On 19/09/19 16:43 +0200, Oyvind Albrigtsen wrote: > Try upgrading resource-agents and maybe nfs-utils (if there's newer > version for CentOS 7). > > I recall some issue with how the nfs config was generated, which might > be causing this issue. This is what I'd start with, otherwise, see below.

Re: [ClusterLabs] Pacemaker 1.1.12 does not compile with CMAN Stack.

2019-09-16 Thread Jan Pokorný
ovided for you (below) still stand, don't expect any ClusterLabs rebranding-by-force of what practically amounts to a dead project now. Thanks for understanding. And keep in mind, if I were you, I'd skip CMAN and RHEL 6 today. > -Original Message- > From: Jan Pokorný > Sent: F

Re: [ClusterLabs] Why is last-lrm-refresh part of the CIB config?

2019-09-12 Thread Jan Pokorný
On 12/09/19 09:27 +0200, Ulrich Windl wrote: >>>> Jan Pokorný wrote on 10.09.2019 at 17:38 >>>> in message <20190910153832.gj29...@redhat.com>: >> >> Just looking at how to provisionally satisfy the needs here, I can >> suggest this "

Re: [ClusterLabs] Why is last-lrm-refresh part of the CIB config?

2019-09-10 Thread Jan Pokorný
On 10/09/19 08:06 +0200, Ulrich Windl wrote: Ken Gaillot wrote on 09.09.2019 at 17:14 in message <9e51c562c74e52c3b9e5f85576210bf83144fae7.ca...@redhat.com>: >> On Mon, 2019-09-09 at 11:06 +0200, Ulrich Windl wrote: >>> In recent pacemaker I see that last-lrm-refresh is

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-04 Thread Jan Pokorný
On 03/09/19 20:15 +0300, Andrei Borzenkov wrote: > 03.09.2019 11:09, Marco Marino wrote: >> Hi, I have a problem with fencing on a two node cluster. It seems that >> randomly the cluster cannot complete monitor operation for fence devices. >> In log I see: >> crmd[8206]: error: Result of monitor

Re: [ClusterLabs] Q: Recommened directory for RA auxillary files?

2019-09-02 Thread Jan Pokorný
On 02/09/19 15:23 +0200, Ulrich Windl wrote: > Are there any recommendations where to place (fixed content) files > an RA uses? > Usually my RAs use a separate XML file for the metadata, just to > allow editing it in XML mode automatically. Traditionally I put the > file in the same directory as

Re: [ClusterLabs] Pacemaker 1.1.12 does not compile with CMAN Stack.

2019-08-30 Thread Jan Pokorný
On 30/08/19 13:03 +, Somanath Jeeva wrote: > In Pacemaker 1.1.12 version try to compile with CMAN Stack, mildly put, it's like trying to run with dinosaurs; that version of pacemaker together with that effectively superseded bundle of other components will hardly receive any attention in 2019.

Re: [ClusterLabs] Command to show location constraints?

2019-08-29 Thread Jan Pokorný
On 28/08/19 10:27 +0200, Ulrich Windl wrote: >>>> Jan Pokorný wrote on 28.08.2019 at 10:03 >>>> in message <20190828080347.ga9...@redhat.com>: >> On 27/08/19 09:24 -0600, Casey & Gina wrote: >>> Hi, I'm looking for a way to show just locat

Re: [ClusterLabs] Command to show location constraints?

2019-08-28 Thread Jan Pokorný
On 27/08/19 09:24 -0600, Casey & Gina wrote: > Hi, I'm looking for a way to show just location constraints, if they > exist, for a cluster. I'm looking for the same data shown in the > output of `pcs config` under the "Location Constraints:" header, but > without all the rest, so that I can write

Re: [ClusterLabs] pacemaker resources under systemd

2019-08-27 Thread Jan Pokorný
On 27/08/19 15:27 +0200, Ulrich Windl wrote: > Systemd thinks he's the boss, doing what he wants: Today I noticed that all > resources are run inside control group "pacemaker.service" like this: > ├─pacemaker.service > │ ├─ 26582 isredir-ML1: listening on 172.20.17.238/12503 (2/1) > │ ├─

Re: [ClusterLabs] SLES12 SP4: Same error logged in two different formats

2019-08-26 Thread Jan Pokorný
On 26/08/19 08:16 +0200, Ulrich Windl wrote: > While inspecting the logs to improve my own RA, I noticed that one > error is logged with two different formats: With literal "\n" and > with a space: > lrmd[7278]: notice: prm_isr_ds3_monitor_0:108007:stderr [ mkdir: cannot > create directory

Re: [ClusterLabs] Q: "pengine[7280]: error: Characters left over after parsing '10#012': '#012'"

2019-08-22 Thread Jan Pokorný
On 22/08/19 08:07 +0200, Ulrich Windl wrote: > When a second node joined a two-node cluster, I noticed the > following error message that leaves me kind of clueless: > pengine[7280]:error: Characters left over after parsing '10#012': '#012' > > Where should I look for these characters?
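The `#012` residue fits a common syslog behaviour. By default, rsyslog escapes control characters on receipt as `#` followed by the character's three-digit octal code, and a newline is octal 012. If that assumption holds, the value the scheduler actually saw was most likely `10` followed by a newline, and the leftover it reported was that newline alone. A minimal Python sketch reproduces the logged line. `syslog_escape` and `parse_leading_int` are hypothetical stand-ins, not rsyslog's or Pacemaker's code, and the escaping is simplified to characters below 0x20.

```python
import re
from typing import Tuple


def syslog_escape(text: str) -> str:
    """Simplified model (assumption) of rsyslog's control-character
    escaping: each character below 0x20 becomes '#' plus its
    three-digit octal code, e.g. '\\n' -> '#012'."""
    return "".join(
        f"#{ord(ch):03o}" if ord(ch) < 0x20 else ch
        for ch in text
    )


def parse_leading_int(raw: str) -> Tuple[int, str]:
    """Hypothetical strict integer parser: returns the parsed value and
    whatever characters follow the digits."""
    match = re.match(r"[+-]?\d+", raw)
    if match is None:
        raise ValueError(f"no integer in {raw!r}")
    return int(match.group()), raw[match.end():]


raw_value = "10\n"  # a value carrying a trailing newline
value, leftover = parse_leading_int(raw_value)
message = f"Characters left over after parsing '{raw_value}': '{leftover}'"
print(syslog_escape(message))
# prints: Characters left over after parsing '10#012': '#012'
```

Under this reading, the stray character is a trailing newline in whatever value was being parsed, not literal `#012` text in the configuration. This is an inference from the shape of the message; the excerpt above does not confirm it.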

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-21 Thread Jan Pokorný
On 21/08/19 14:48 +0200, Jan Pokorný wrote: > On 20/08/19 20:55 +0200, Jan Pokorný wrote: >> On 15/08/19 17:03 +, Michael Powell wrote: >>> First, thanks to all for their responses. With your help, I'm >>> steadily gaining competence WRT HA, albeit slowly. >>

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-21 Thread Jan Pokorný
On 20/08/19 20:55 +0200, Jan Pokorný wrote: > On 15/08/19 17:03 +, Michael Powell wrote: >> First, thanks to all for their responses. With your help, I'm >> steadily gaining competence WRT HA, albeit slowly. >> >> I've basically followed Harvey's workaround sugges

Re: [ClusterLabs] booth (manual tickets) site/peer notifications

2019-08-21 Thread Jan Pokorný
On 21/08/19 14:49 +0530, Rohit Saini wrote: > I am using booth (two clusters) using manual ticket. Is there any > way for cluster-1 to know if peer cluster-2 booth-ip is not > reachable. Here I am looking to know if both the sites are > connected and reachable via each other's booth-ip.

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-20 Thread Jan Pokorný
On 15/08/19 17:03 +, Michael Powell wrote: > First, thanks to all for their responses. With your help, I'm > steadily gaining competence WRT HA, albeit slowly. > > I've basically followed Harvey's workaround suggestion, and the > failover I hoped for takes effect quite quickly. I

Re: [ClusterLabs] [Announce] clufter v0.77.2 released

2019-08-15 Thread Jan Pokorný
On 14/08/19 16:26 +0200, Jan Pokorný wrote: > I am happy to announce that clufter, a tool/library for transforming > and analyzing cluster configuration formats, got its version 0.77.2 > tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key): > <https://pagure.io/r

Re: [ClusterLabs] cloned ethmonitor - upon failure of all nodes

2019-08-15 Thread Jan Pokorný
On 15/08/19 10:59 +0100, solarmon wrote: > I have a two node cluster setup where each node is multi-homed over two > separate external interfaces - net4 and net5 - that can have traffic load > balanced between them. > > I have created multiple virtual ip resources (grouped together) that should >

[ClusterLabs] [Announce] clufter v0.77.2 released

2019-08-14 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming and analyzing cluster configuration formats, got its version 0.77.2 tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-13 Thread Jan Pokorný
On 13/08/19 09:44 +0200, Ulrich Windl wrote: Harvey Shepherd wrote on 12.08.2019 at 23:38 > in message : >> I've been experiencing exactly the same issue. Pacemaker prioritises >> restarting the failed resource over maintaining a master instance. In my >> case >> I used

Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-12 Thread Jan Pokorný
On 07/08/19 16:06 +0200, Momcilo Medic wrote: > On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger wrote: > >> On 8/7/19 12:26 PM, Momcilo Medic wrote: >> >>> We have three node cluster that is setup to stop resources on lost >>> quorum. Failure (network going down) handling is done properly, >>>

Re: [ClusterLabs] how to connect to the cluster from a docker container

2019-08-06 Thread Jan Pokorný
On 06/08/19 13:36 +0200, Jan Pokorný wrote: > On 06/08/19 10:37 +0200, Dejan Muhamedagic wrote: >> Hawk runs in a docker container on one of the cluster nodes (the >> nodes run Debian and apparently it's rather difficult to install >> hawk on a non-SUSE distribution, he

Re: [ClusterLabs] how to connect to the cluster from a docker container

2019-08-06 Thread Jan Pokorný
Hello Dejan, nice to see you around, On 06/08/19 10:37 +0200, Dejan Muhamedagic wrote: > Hawk runs in a docker container on one of the cluster nodes (the > nodes run Debian and apparently it's rather difficult to install > hawk on a non-SUSE distribution, hence docker). Now, how to > connect to

Re: [ClusterLabs] Feedback wanted: Node reaction to fabric fencing

2019-07-25 Thread Jan Pokorný
On 24/07/19 12:33 -0500, Ken Gaillot wrote: > A recent bugfix (clbz#5386) brings up a question. > > A node may receive notification of its own fencing when fencing is > misconfigured (for example, an APC switch with the wrong plug number) > or when fabric fencing is used that doesn't cut the

Re: [ClusterLabs] Which shell definitions to include?

2019-07-23 Thread Jan Pokorný
On 23/07/19 09:07 -0500, Ken Gaillot wrote: > On Tue, 2019-07-23 at 08:48 +0200, Ulrich Windl wrote: >> It's not that I like all the changes systemd requires, but systemd >> complains about not being able to unmount /var while /var/run or >> /var/lock is being used... > > Agreed, it should be a

Re: [ClusterLabs] [Question] About clufter's Corosync 3 support

2019-07-18 Thread Jan Pokorný
On 18/07/19 18:08 +0900, 井上和徳 wrote: > 'pcs config export' fails in RHEL 8.0 environment because clufter > does not support Corosync 3 (Camelback). > How is the state of progress? https://pagure.io/clufter/issue/4 As much as I'd want to, time budget hasn't been allocated for this, pacemaker and

Re: [ClusterLabs] pacemaker alerts list

2019-07-18 Thread Jan Pokorný
On 17/07/19 19:07 +, Gershman, Vladimir wrote: > This would be for the Pacemaker. > > Seems like the alerts in the link you sent, refer to numeric codes, > so where would I see all the codes and their meanings ? This would > allow a way to select what I need to monitor. Unfortunately, we

[ClusterLabs] Question about associating with ClusterLabs wrt. a local community (Was: I have a question.)

2019-07-09 Thread Jan Pokorný
Hello Kim, On 08/07/19 13:11 +0900, 김동현 wrote: > I'm Donghyun Kim. > > I work as a system engineer in Korea. > > In the meantime, I was very interested in the cluster and want > to promote it in Korea. > There are many high-availability cases in Linux systems. > > The reason why I am sending

Re: [ClusterLabs] Strange monitor return code log for LSB resource

2019-07-08 Thread Jan Pokorný
On 05/07/19 03:50 +, Harvey Shepherd wrote: > I was able to resolve an issue which caused these logs to disappear. > The problem was that the LSB script was named "logging" and the > daemon that it controlled was also called "logging". The init script > uses "start-stop-daemon --name" to start

Re: [ClusterLabs] Strange monitor return code log for LSB resource

2019-07-04 Thread Jan Pokorný
Harvey, On 25/06/19 21:26 +, Harvey Shepherd wrote: > Thanks for your reply Andrei. There is no external monitoring > software running. The logs that I posted are from the pacemaker log > with debug enabled. that's exactly what's questionable here -- how can crm_resource invocation be at all

Re: [ClusterLabs] Antw: Re: Two node cluster goes into split brain scenario during CPU intensive tasks

2019-07-01 Thread Jan Pokorný
On 01/07/19 13:26 +0200, Ulrich Windl wrote: >>>> Jan Pokorný wrote on 27.06.2019 at 12:02 >>>> in message <20190627100209.gf31...@redhat.com>: >> On 25/06/19 12:20 -0500, Ken Gaillot wrote: >>> On Tue, 2019-06-25 at 11:06 +, Somanath Jeeva wr

Re: [ClusterLabs] Two node cluster goes into split brain scenario during CPU intensive tasks

2019-06-27 Thread Jan Pokorný
On 25/06/19 12:20 -0500, Ken Gaillot wrote: > On Tue, 2019-06-25 at 11:06 +, Somanath Jeeva wrote: > Addressing the root cause, I'd first make sure corosync is running at > real-time priority (I forget the ps option, hopefully someone else can > chime in). In a standard Linux environment, I
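For the real-time check mentioned above, `chrt -p <pid>` from util-linux prints a process's scheduling policy and priority. The same information can be read programmatically. Below is a minimal, Linux-only Python sketch using `os.sched_getscheduler()` and `os.sched_getparam()`. The helper name `scheduling_info` is hypothetical, and the corosync PID has to be obtained separately (for instance with `pidof corosync`).

```python
import os
from typing import Tuple

# Map Linux scheduling-policy constants from the os module to names.
_POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}
for _name in ("SCHED_BATCH", "SCHED_IDLE"):
    if hasattr(os, _name):
        _POLICY_NAMES[getattr(os, _name)] = _name

# SCHED_FIFO and SCHED_RR are the real-time policies.
_REALTIME = {os.SCHED_FIFO, os.SCHED_RR}


def scheduling_info(pid: int) -> Tuple[str, int, bool]:
    """Return (policy name, static priority, is_realtime) for a PID.

    pid 0 means the calling process.
    """
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    name = _POLICY_NAMES.get(policy, f"unknown({policy})")
    return name, priority, policy in _REALTIME
```

For corosync, a third element of True (policy SCHED_FIFO or SCHED_RR) means it runs with real-time scheduling. SCHED_OTHER with priority 0 is the ordinary time-sharing class.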

Re: [ClusterLabs] Fence agent definition under Centos7.6

2019-06-20 Thread Jan Pokorný
On 18/06/19 13:08 +, Michael Powell wrote: > * The document does not specify where the agent should be installed. > (On /usr/sbin.) This could give a precise answer: https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/lib/fencing/st_rhcs.c#L33 > * The document does not mention

[ClusterLabs] CVE-2019-12779 assignment for libqb (Was: [Announce] libqb 1.0.4/1.0.5 release)

2019-06-10 Thread Jan Pokorný
On 15/04/19 14:56 +0100, Christine Caulfield wrote: > We are pleased to announce the release of libqb 1.0.4 > > Source code is available at: > https://github.com/ClusterLabs/libqb/releases/download/v1.0.4/libqb-1.0.4.tar.xz > > Please use the signed .tar.gz or .tar.xz files with the version

Re: [ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-06-04 Thread Jan Pokorný
On 03/06/19 13:39 +0200, Jan Pokorný wrote: > Yes, there are at least two issues in ocf:heartbeat:VirtualDomain: > > 1/ dealing with user input derived value, in an unchecked manner, while >such value can be an empty string or may contain spaces >(for the latter, see also I

Re: [ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-06-03 Thread Jan Pokorný
On 29/05/19 09:29 -0500, Ken Gaillot wrote: > On Wed, 2019-05-29 at 11:42 +0100, lejeczek wrote: >> I doing something which I believe is fairly simple, namely: >> >> $ pcs resource create HA-work9-win10-kvm VirtualDomain \ >> hypervisor="qemu:///system" \ >>

[ClusterLabs] Inconclusive recap for bonding (balance-rr) vs. HA (Was: why is node fenced ?)

2019-05-30 Thread Jan Pokorný
On 20/05/19 14:35 +0200, Jan Pokorný wrote: > On 20/05/19 08:28 +0200, Ulrich Windl wrote: >>> One network interface is gone for a short period. But it's in a >>> bonding device (round-robin), so the connection shouldn't be lost. >>> Both nodes are connected directly,

[ClusterLabs] Pacemaker detecting existing processes question (Was: indirectly related - pacemaker service)

2019-05-30 Thread Jan Pokorný
[forwarding to the respective upstream list, this has little to do with systemd, I suggest following up only there, detaching from systemd ML] On 29/05/19 17:23 +0100, lejeczek wrote: > something I was hoping one expert could shed a bit more light onto - I > have a pacemaker cluster composed of three

Re: [ClusterLabs] info: mcp_cpg_deliver: Ignoring process list sent by peer for local node

2019-05-30 Thread Jan Pokorný
On 30/05/19 11:01 +0100, lejeczek wrote: > On 29/05/2019 21:04, Ken Gaillot wrote: >> On Wed, 2019-05-29 at 17:28 +0100, lejeczek wrote: >>> and: >>> $ systemctl status -l pacemaker.service >>> ● pacemaker.service - Pacemaker High Availability Cluster Manager >>>Loaded: loaded

Re: [ClusterLabs] Antw: why is node fenced ?

2019-05-20 Thread Jan Pokorný
On 20/05/19 08:28 +0200, Ulrich Windl wrote: >> One network interface is gone for a short period. But it's in a >> bonding device (round-robin), so the connection shouldn't be lost. >> Both nodes are connected directly, there is no switch in between. > > I think you misunderstood: a round-robin

Re: [ClusterLabs] why is node fenced ?

2019-05-17 Thread Jan Pokorný
On 16/05/19 17:10 +0200, Lentes, Bernd wrote: > my HA-Cluster with two nodes fenced one on 14th of May. > ha-idg-1 has been the DC, ha-idg-2 was fenced. > It happened around 11:30 am. > The log from the fenced one isn't really informative: > > [...] > > Node restarts at 11:44 am. > The DC is

[ClusterLabs] Recurring troubles with the weak grip on processes (Was: Timeout stopping corosync-qdevice service)

2019-05-02 Thread Jan Pokorný
On 30/04/19 20:39 +0300, Andrei Borzenkov wrote: > 30.04.2019 9:51, Jan Friesse wrote: >> >>> Now, corosync-qdevice gets SIGTERM as "signal to terminate", but it >>> installs SIGTERM handler that does not exit and only closes some socket. >>> Maybe this should trigger termination of main loop,

[ClusterLabs] Multiple processes appending to the same log file questions (Was: Pacemaker detail log directory permissions)

2019-04-30 Thread Jan Pokorný
[let's move this to developers@cl.o, please drop users on response unless you are only subscribed there, I tend to only respond to the lists] On 30/04/19 13:55 +0200, Jan Pokorný wrote: > On 30/04/19 07:55 +0200, Ulrich Windl wrote: >>>>> Jan Pokorný wrote on 2

Re: [ClusterLabs] Pacemaker detail log directory permissions

2019-04-30 Thread Jan Pokorný
On 30/04/19 07:55 +0200, Ulrich Windl wrote: >>>> Jan Pokorný wrote on 29.04.2019 at 17:22 >>>> in message <20190429152200.ga19...@redhat.com>: >> On 29/04/19 14:58 +0200, Jan Pokorný wrote: >>> On 29/04/19 08:20 +0200, Ulrich Windl wrote: >>

Re: [ClusterLabs] Pacemaker detail log directory permissions

2019-04-29 Thread Jan Pokorný
On 29/04/19 14:58 +0200, Jan Pokorný wrote: > On 29/04/19 08:20 +0200, Ulrich Windl wrote: >>>>> Jan Pokorný wrote on 25.04.2019 at 18:49 >>>>> in message <20190425164946.gf23...@redhat.com>: >>> I think the prime and foremost use case is t

Re: [ClusterLabs] Pacemaker detail log directory permissions

2019-04-29 Thread Jan Pokorný
On 29/04/19 08:20 +0200, Ulrich Windl wrote: >>>> Jan Pokorný wrote on 25.04.2019 at 18:49 >>>> in message <20190425164946.gf23...@redhat.com>: >> On 24/04/19 09:32 -0500, Ken Gaillot wrote: >>> On Wed, 2019-04-24 at 16:08 +0200, wf...@niif

Re: [ClusterLabs] [Announce] libqb 1.0.5 release

2019-04-26 Thread Jan Pokorný
On 26/04/19 07:58 +0100, Christine Caulfield wrote: > We are pleased to announce the release of libqb 1.0.5 > > Source code is available at: > https://github.com/ClusterLabs/libqb/releases/download/v1.0.5/libqb-1.0.5.tar.xz > > Please use the signed .tar.gz or .tar.xz files with the version

Re: [ClusterLabs] Pacemaker detail log directory permissions

2019-04-25 Thread Jan Pokorný
On 24/04/19 09:32 -0500, Ken Gaillot wrote: > On Wed, 2019-04-24 at 16:08 +0200, wf...@niif.hu wrote: >> Make install creates /var/log/pacemaker with mode 0770, owned by >> hacluster:haclient. However, if I create the directory as root:root >> instead, pacemaker.log appears as hacluster:haclient

Re: [ClusterLabs] Coming in 2.0.2: check whether a date-based rule is expired

2019-04-24 Thread Jan Pokorný
that pacemaker already started the tradition of pioneering some innovative time expressions, so don't say it too loudly: $ iso8601 -d epoch > Date: 1970-01-01 00:00:00Z Mostly as a follow-up for the above joke, please don't _ever_ refer to "epoch" as a date/time expression, it may go

Re: [ClusterLabs] Coming in 2.0.2: check whether a date-based rule is expired

2019-04-23 Thread Jan Pokorný
On 16/04/19 12:38 -0500, Ken Gaillot wrote: > We are adding a "crm_rule" command Wouldn't `pcmk-rule` be a more sensible command name -- I mean, why not benefit from not suffering the historical burden in this case, given that `crm` in the broadest "almost anything that can be associated with

Re: [ClusterLabs] Pacemaker security issues discovered and patched

2019-04-17 Thread Jan Pokorný
On 17/04/19 12:09 -0500, Ken Gaillot wrote: > Without the patches, a mitigation is to prevent local user access to > cluster nodes except for cluster administrators (which is the > recommended and most common deployment model). Not trying to artificially amplify the risk in response to the above,

Re: [ClusterLabs] Resource not starting correctly IV

2019-04-16 Thread Jan Pokorný
[letter-casing wise: it's either "Pacemaker" or down-to-the-terminal "pacemaker"] On 16/04/19 10:21 -0600, JCA wrote: > 2. It would seem that what Pacemaker is doing is the following: >a. Check out whether the app is running. >b. If it is not, launch it. >c. Check out again >d.

Re: [ClusterLabs] Resource not starting correctly III

2019-04-16 Thread Jan Pokorný
On 15/04/19 16:01 -0600, JCA wrote: > This is weird. Further experiments, consisting of creating and deleting the > resource, reveal that, on creating the resource, myapp-script may be > invoked multiple times - sometimes four, sometimes twenty or so, sometimes > returning OCF_SUCCESS, some other

[ClusterLabs] RRP is broken ... in what ways? (Was: corosync caused network breakdown)

2019-04-10 Thread Jan Pokorný
On 08/04/19 19:08 +0200, Jan Friesse wrote: > Anyway. RRP is broken, Since this is repeatedly rehashed on this list without summarizing *what* is that much broken[1] while serving (rather well, I think) last 8+ years, perhaps it would deserve some wider dissemination incl. *why* is that. Closest

[ClusterLabs] Community's communication sustainability (Was: recommendations for corosync totem timeout for CentOS 7 + VMware?)

2019-03-22 Thread Jan Pokorný
On 22/03/19 15:02 +0300, Andrei Borzenkov wrote: > On Fri, Mar 22, 2019 at 1:08 PM Jan Pokorný wrote: >> Also a Friday's idea: >> Perhaps we should crank up "how to ask" manual for this list > > Yest another one? > > http://www.catb.org/~esr/faqs/sma

Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Jan Pokorný
On 21/03/19 12:21 -0400, Brian Reichert wrote: > I've followed several tutorials about setting up a simple three-node > cluster, with no resources (yet), under CentOS 7. > > I've discovered the cluster won't restart upon rebooting a node. > > The other two nodes, however, do claim the cluster is

Re: [ClusterLabs] FYI: clusterlabs.org server maintenance

2019-03-11 Thread Jan Pokorný
On 11/03/19 14:08 -0500, Ken Gaillot wrote: > Both issues turned out to be SELinux-related. I believe everything is > working properly again. Thanks for looking into that (beside the long term maintenance, important nonetheless), will keep an eye on recidivating of those. -- Jan (Poki)

Re: [ClusterLabs] FYI: clusterlabs.org server maintenance

2019-03-11 Thread Jan Pokorný
On 08/03/19 15:24 -0600, Ken Gaillot wrote: > I plan to move the bugzilla and wiki sites over the weekend, and > everything else sometime in the week after that. To provide feedback: * wiki: seems really slow now * bugzilla: at standard responsive speed, but when adding a comment: > An

Re: [ClusterLabs] can't start pacemaker resources

2019-02-22 Thread Jan Pokorný
On 21/02/19 18:15 -0800, solarflow99 wrote: > I have tried to create NFS shares using the ceph backend, but I can't seem > to get the resources to start. It doesn't show me much as to why, does > anyone have an idea? > > [...] > > Feb 21 15:11:42 cephmgr101.corp.mydomain.com

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 20/02/19 21:16 +0100, Klaus Wenninger wrote: > On 02/20/2019 08:51 PM, Jan Pokorný wrote: >> On 20/02/19 17:37 +, Edwin Török wrote: >>> strace for the situation described below (corosync 95%, 1 vCPU): >>> https://clbin.com/hZL5z >> I might have missed that

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 20/02/19 21:25 +0100, Klaus Wenninger wrote: > Hmm maybe the thing that should be scheduled is running at > SCHED_RR as well but with just a lower prio. So it wouldn't > profit from the sched_yield and it wouldn't get anything of > the 5% either. Actually, it would possibly make the situation

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 20/02/19 17:37 +, Edwin Török wrote: > strace for the situation described below (corosync 95%, 1 vCPU): > https://clbin.com/hZL5z I might have missed that earlier or this may be just some sort of insignificant/misleading clue: > strace: Process 4923 attached with 2 threads > strace: [

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 19/02/19 16:41 +, Edwin Török wrote: > Also noticed this: > [ 5390.361861] crmd[12620]: segfault at 0 ip 7f221c5e03b1 sp > 7ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000] > [ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00 > c3 0f 1f 80 00 00 00 00 48

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-18 Thread Jan Pokorný
On 15/02/19 08:48 +0100, Jan Friesse wrote: > Ulrich Windl napsal(a): >> IMHO any process running at real-time priorities must make sure >> that it consumes the CPU only for short moment that are really >> critical to be performed in time. Pardon me, Ulrich, but something is off about this,

Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-02-11 Thread Jan Pokorný
On 11/02/19 15:03 -0600, Ken Gaillot wrote: > On Fri, 2019-02-01 at 08:10 +0100, Jan Pokorný wrote: >> On 28/01/19 09:47 -0600, Ken Gaillot wrote: >>> On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote: >>> Pacemaker can handle the clock jumping forward, but not b

Re: [ClusterLabs] [pacemaker] Discretion with glib v2.59.0+ recommended

2019-02-11 Thread Jan Pokorný
On 20/01/19 12:44 +0100, Jan Pokorný wrote: > On 18/01/19 20:32 +0100, Jan Pokorný wrote: >> It was discovered that this release of glib project changed slightly >> some parameters of how distribution of values within hash tables >> structures work, undermining pacemaker's

Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-01-31 Thread Jan Pokorný
On 28/01/19 09:47 -0600, Ken Gaillot wrote: > On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote: > Pacemaker can handle the clock jumping forward, but not backward. I am rather surprised, are we not using monotonic time only, then? If so, why? We shall not need any explicit time
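To illustrate the point about relying on monotonic time for durations, here is a minimal C sketch, assuming a POSIX system with `clock_gettime()`. Elapsed time is measured on `CLOCK_MONOTONIC`, which is not affected by discontinuous jumps of the wall clock (e.g. an NTP step or `date -s`), so a backward jump cannot produce a negative interval. The helper names `timespec_diff_ms` and `elapsed_ms_since` are hypothetical and are not Pacemaker's actual code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: (end - start) in whole milliseconds.
 * Both timestamps must come from the same clock on the same boot. */
static int64_t timespec_diff_ms(const struct timespec *start,
                                const struct timespec *end)
{
    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
                 + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
    return ns / 1000000LL;
}

/* Hypothetical helper: milliseconds elapsed since *start, where *start
 * was obtained via clock_gettime(CLOCK_MONOTONIC, ...). Returns -1 if
 * the clock cannot be read. */
static int64_t elapsed_ms_since(const struct timespec *start)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0) {
        return -1;
    }
    return timespec_diff_ms(start, &now);
}
```

As noted elsewhere in these threads, such timestamps are only meaningful within a single node and a single boot; they must not be compared between nodes or across node restarts.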
