Re: [ClusterLabs] Single quotes in values for 'crm resource rsc param set'

2015-08-17 Thread Vladislav Bogdanov
17.08.2015 10:39, Kristoffer Grönlund wrote: Vladislav Bogdanov bub...@hoster-ok.com writes: Hi Kristoffer, all. Could you please look why I get error when trying to update valid resource value (which already has single quotes inside) with the slightly different one by running the command

Re: [ClusterLabs] Single quotes in values for 'crm resource rsc param set'

2015-08-17 Thread Vladislav Bogdanov
14.08.2015 19:51, Jan Pokorný wrote: On 14/08/15 18:22 +0300, Vladislav Bogdanov wrote: I need to pass a command-line with semicolons in one of parameters which is run with eval in the resource agent. Backslashed double-quoting does not work in this case, but single-quotes work fine. Hmm
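
The point above about eval and semicolons can be illustrated with a minimal shell sketch. It is not taken from any real resource agent: `demo_run` and `param` are hypothetical names, and the only assumption is that the agent passes the configured string to `eval` unchanged. Because the value carries its own single quotes, the semicolon stays inside one argument when `eval` re-parses the string.

```shell
#!/bin/sh
# Hypothetical stand-in for a resource agent that runs a configured
# command line through eval (not taken from any real agent).
demo_run() {
    eval "$1"
}

# The parameter value carries its own single quotes, so the semicolon
# is not treated as a command separator when eval re-parses the string.
param="printf '%s\n' 'first;second'"

demo_run "$param"    # prints: first;second
```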

Re: [ClusterLabs] Antw: Re: Single quotes in values for 'crm resource rsc param set'

2015-08-17 Thread Vladislav Bogdanov
a good point. I will think about it, thanks. Regards, Ulrich Vladislav Bogdanov bub...@hoster-ok.com wrote on 17.08.2015 at 11:22 in message 55d1a7d9.20...@hoster-ok.com: 17.08.2015 10:39, Kristoffer Grönlund wrote: Vladislav Bogdanov bub...@hoster-ok.com writes: Hi Kristoffer, all

[ClusterLabs] Single quotes in values for 'crm resource rsc param set'

2015-08-14 Thread Vladislav Bogdanov
Hi Kristoffer, all. Could you please look why I get error when trying to update valid resource value (which already has single quotes inside) with the slightly different one by running the command in the subject? It looks like is_value_sane() doesn't accept single quotes just because crmsh

[ClusterLabs] node attributes go to different instance_attributes sections in crmsh

2015-08-06 Thread Vladislav Bogdanov
Hi, following illustrates what happens with 'crm configure show' output after playing with 'crm node standby|online' having some node attributes already set from the loaded config. xml node id=1 uname=dell71 \ instance_attributes id=dell71-instance_attributes \ nvpair

Re: [ClusterLabs] Failover to spare node

2015-10-22 Thread Vladislav Bogdanov
22.10.2015 19:49, Andrei Borzenkov wrote: Let's say I have a pool of nodes and multiple services, somehow distributed across them. I would like to keep one node as "spare", without services by default, and if any of "worker" nodes fail, services that were running there should be relocated to

[ClusterLabs] attrd: Fix sigsegv on exit if initialization failed

2015-10-12 Thread Vladislav Bogdanov
Hi, This was caught with 0.17.1 libqb, which didn't play well with long pids. commit 180a943846b6d94c27b9b984b039ac0465df64da Author: Vladislav Bogdanov <bub...@hoster-ok.com> Date: Mon Oct 12 11:05:29 2015 + attrd: Fix sigsegv on exit if initialization failed diff --git a

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: pacemaker doesn't correctly handle a resource after time/date change

2015-08-28 Thread Vladislav Bogdanov
28.08.2015 12:25, Kostiantyn Ponomarenko wrote: In my case the final solution will be shipped to different countries which means different time zones. Why not keep all HW clocks in UTC? And the replacement of one of the nodes in the working solution could happen. So the possibilities of

[ClusterLabs] crm_report consumes all available RAM

2015-09-08 Thread Vladislav Bogdanov
Hi, just discovered very interesting issue. If there is a system user with very big UID (8002 in my case), then crm_report (actually 'grep' it runs) consumes too much RAM. Relevant part of the process tree at that moment looks like (word-wrap off): USER PID %CPU %MEM VSZ RSS TTY

Re: [ClusterLabs] Antw: crm_report consumes all available RAM

2015-09-08 Thread Vladislav Bogdanov
08.09.2015 15:18, Ulrich Windl wrote: Vladislav Bogdanov <bub...@hoster-ok.com> wrote on 08.09.2015 at 14:05 in message <55eecefb.8050...@hoster-ok.com>: Hi, just discovered very interesting issue. If there is a system user with very big UID (8002 in my case), the

Re: [ClusterLabs] Clustered LVM with iptables issue

2015-09-11 Thread Vladislav Bogdanov
Hi Digimer, Be aware that SCTP support in both kernel and DLM _may_ have issues (as far as I remember, it was not recommended for use, at least in cman's version of DLM, because of the lack of testing). I believe you can force use of TCP via dlm_controld parameters (or config

Re: [ClusterLabs] Antw: Need bash instead of /bin/sh

2015-09-23 Thread Vladislav Bogdanov
23.09.2015 15:42, dan wrote: On Wed, 2015-09-23 at 14:08 +0200, Ulrich Windl wrote: dan wrote on 23.09.2015 at 13:39 in message <1443008370.2386.8.ca...@intraphone.com>: Hi As I had problem with corosync 2.3.3 and pacemaker 1.1.10 which was default in my

Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact

2015-12-31 Thread Vladislav Bogdanov
31.12.2015 12:57:45 CET, Bogdan Dobrelya wrote: >Hello. >I've been hopelessly fighting a bug [0] in the custom OCF agent of Fuel >for OpenStack project. It is related to the destructive test case when >one node of 3 or 5 total goes down and then back. The bug itself is

Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact

2016-01-01 Thread Vladislav Bogdanov
31.12.2015 15:33:45 CET, Bogdan Dobrelya <bdobre...@mirantis.com> wrote: >On 31.12.2015 14:48, Vladislav Bogdanov wrote: >> blackbox tracing inside pacemaker, USR1, USR2 and TRAP signals iirc, >quick google search should point you to Andrew's blog with all >information about

Re: [ClusterLabs] mail server (postfix)

2016-06-04 Thread Vladislav Bogdanov
3.6.2016 20:33:01 GMT+03:00, Dimitri Maziuk wrote: Sorry for top-post. I'd modify RA to support master/slave concept. I use the same approach to manage cyrus-imapd replicas, passing them different pre-installed config files, depending on operation, start, promote, or

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov
07.06.2016 02:20, Ken Gaillot wrote: On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote: 06.06.2016 22:43, Ken Gaillot wrote: On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote: 06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: On Sat, Jun 4, 2016 at 12:16 AM

Re: [ClusterLabs] mail server (postfix)

2016-06-06 Thread Vladislav Bogdanov
05.06.2016 22:22, Dimitri Maziuk wrote: On 06/04/2016 01:02 PM, Vladislav Bogdanov wrote: I'd modify RA to support master/slave concept. I'm assuming you use a shared mail store on your imapd cluster? I want No, I use cyrus internal replication. to host the storage on the same cluster

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov
06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot wrote: On 06/02/2016 08:01 PM, Andrew Beekhof wrote: On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot wrote: A recent thread

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov
06.06.2016 22:43, Ken Gaillot wrote: On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote: 06.06.2016 19:39, Ken Gaillot wrote: On 06/05/2016 07:27 PM, Andrew Beekhof wrote: On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com> wrote: On 06/02/2016 08:01 PM, Andrew Beekhof

Re: [ClusterLabs] Different pacemaker versions split cluster

2016-06-06 Thread Vladislav Bogdanov
06.06.2016 23:28, Ken Gaillot wrote: On 05/30/2016 01:14 PM, DacioMF wrote: Hi, I had 4 nodes with Ubuntu 14.04LTS in my cluster and all of them worked well. I need to upgrade all my cluster nodes to Ubuntu 16.04LTS without stopping my resources. Two nodes have been updated to 16.04 and the two

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Vladislav Bogdanov
16.06.2016 15:28, Christine Caulfield wrote: On 16/06/16 13:22, Vladislav Bogdanov wrote: Hi, 16.06.2016 14:09, Jan Friesse wrote: I am pleased to announce the latest maintenance release of Corosync 2.3.6 available immediately from our website at http://build.clusterlabs.org/corosync/releases

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Vladislav Bogdanov
Hi, 16.06.2016 14:09, Jan Friesse wrote: I am pleased to announce the latest maintenance release of Corosync 2.3.6 available immediately from our website at http://build.clusterlabs.org/corosync/releases/. [...] Christine Caulfield (9): [...] Add some more RO keys Is there a strong

Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Vladislav Bogdanov
16.06.2016 16:04, Christine Caulfield wrote: On 16/06/16 13:54, Vladislav Bogdanov wrote: 16.06.2016 15:28, Christine Caulfield wrote: On 16/06/16 13:22, Vladislav Bogdanov wrote: Hi, 16.06.2016 14:09, Jan Friesse wrote: I am pleased to announce the latest maintenance release of Corosync

Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-06-17 Thread Vladislav Bogdanov
17.06.2016 15:05, Vladislav Bogdanov wrote: 03.05.2016 01:14, Ken Gaillot wrote: On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote: Hi, Just found an issue with node is silently unfenced. That is quite large setup (2 cluster nodes and 8 remote ones) with a plenty of slowly starting resources

Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-06-17 Thread Vladislav Bogdanov
03.05.2016 01:14, Ken Gaillot wrote: On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote: Hi, Just found an issue with node is silently unfenced. That is quite large setup (2 cluster nodes and 8 remote ones) with a plenty of slowly starting resources (lustre filesystem). Fencing was initiated

[ClusterLabs] crmsh configure delete for constraints

2016-02-08 Thread Vladislav Bogdanov
Hi, when performing a delete operation, crmsh (2.2.0) having -F tries to stop passed op arguments and then waits for DC to become idle. That is not needed if only constraints are passed to delete. Could that be changed? Or, could it wait only if there is something to stop? Something like this:

Re: [ClusterLabs] Antw: Re: DLM fencing

2016-02-11 Thread Vladislav Bogdanov
10.02.2016 19:32, Digimer wrote: [snip] To be clear; DLM does NOT have its own fencing. It relies on the cluster's fencing. Actually, dlm4 can use fence-agents directly (device keyword in dlm.conf). Default is to use dlm_stonith though.

Re: [ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Vladislav Bogdanov
10.02.2016 11:38, Ulrich Windl wrote: Vladislav Bogdanov <bub...@hoster-ok.com> wrote on 10.02.2016 at 05:39 in message <6e479808-6362-4932-b2c6-348c7efc4...@hoster-ok.com>: [...] Well, I'd reword. Generally, RA should not exit with error if validation fails on stop. Is

Re: [ClusterLabs] crmsh configure delete for constraints

2016-02-09 Thread Vladislav Bogdanov
Dejan Muhamedagic <deja...@fastmail.fm> wrote: >Hi, > >On Tue, Feb 09, 2016 at 05:15:15PM +0300, Vladislav Bogdanov wrote: >> 09.02.2016 16:31, Kristoffer Grönlund wrote: >> >Vladislav Bogdanov <bub...@hoster-ok.com> writes: >> > >> >>Hi, >

Re: [ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Vladislav Bogdanov
10.02.2016 13:56, Ferenc Wágner wrote: Vladislav Bogdanov <bub...@hoster-ok.com> writes: If pacemaker has got an error on start, it will run stop with the same set of parameters anyways. And will get error again if that one was from validation and RA does not differentiate validation for

Re: [ClusterLabs] GFS2 with Pacemaker, Corosync on Ubuntu 14.04

2016-01-19 Thread Vladislav Bogdanov
19.01.2016 18:14, Momcilo Medic wrote: Dear all, I am trying to setup GFS2 on two Ubuntu 14.04 servers. Every guide I can find online is for 12.04 by using cman package which was abandoned in 13.10 So, I tried using Pacemaker with Corosync as instructed on your guide [1]. In this guide pcs is

[ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-01-22 Thread Vladislav Bogdanov
Hi David, list, recently I tried to upgrade dlm from 4.0.2 to 4.0.4 and found that it no longer handles fencing of a remote node initiated by other cluster components. First I noticed that during valid fencing due to resource stop failure, but it is easily reproduced with 'crm node fence XXX'.

Re: [ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-01-22 Thread Vladislav Bogdanov
22.01.2016 19:28, David Teigland wrote: On Fri, Jan 22, 2016 at 06:59:25PM +0300, Vladislav Bogdanov wrote: Hi David, list, recently I tried to upgrade dlm from 4.0.2 to 4.0.4 and found that it no longer handles fencing of a remote node initiated by other cluster components. First I noticed

Re: [ClusterLabs] attrd does not clean per-node cache after node removal

2016-03-23 Thread Vladislav Bogdanov
23.03.2016 19:39, Ken Gaillot wrote: On 03/23/2016 07:35 AM, Vladislav Bogdanov wrote: Hi! It seems like atomic attrd in post-1.1.14 (eb89393) does not fully clean node cache after node is removed. Is this a regression? Or have you only tried it with this version? Only with this one

Re: [ClusterLabs] Approach to validate on stop op (Was Re: crmsh configure delete for constraints)

2016-03-29 Thread Vladislav Bogdanov
29.03.2016 15:28, Vladislav Bogdanov wrote: [...] *) # monitor | notify | reload | etc validate ret=$? if [ ${ret} -ne $OCF_SUCCESS ] ; then if ocf_is_probe ; then exit $OCF_NOT_RUNNING fi exit $? Of course
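
The validate-on-stop approach discussed in this thread can be sketched as a small self-contained shell example. This is only an illustration under stated assumptions, not the agent code from the thread: the `demo_*` functions and the `DEMO_CONFIG_OK` / `DEMO_IS_PROBE` variables are hypothetical stand-ins, and a real agent would source ocf-shellfuncs and call `ocf_is_probe` instead. The numeric values are the standard OCF return codes. The function prints the decision rather than exiting, so the logic can be checked in isolation.

```shell
#!/bin/sh
# Standard OCF return codes, defined here so the sketch runs without ocf-shellfuncs.
OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6
OCF_NOT_RUNNING=7

# Hypothetical validation: passes only when DEMO_CONFIG_OK=1.
demo_validate() {
    if [ "${DEMO_CONFIG_OK:-0}" = "1" ]; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_CONFIGURED"
}

# Hypothetical probe check; a real agent would call ocf_is_probe.
demo_is_probe() {
    [ "${DEMO_IS_PROBE:-0}" = "1" ]
}

# Print what the agent would do for the given action:
# "proceed" when validation passes, otherwise "exit <code>".
demo_precheck() {
    ret=0
    demo_validate || ret=$?
    if [ "$ret" -eq "$OCF_SUCCESS" ]; then
        echo "proceed"
        return 0
    fi
    case "$1" in
        stop)
            # Validation failed on stop: do not report an error.
            echo "exit $OCF_SUCCESS"
            ;;
        *)
            # monitor, notify, reload, etc.
            if demo_is_probe; then
                echo "exit $OCF_NOT_RUNNING"
            else
                echo "exit $ret"
            fi
            ;;
    esac
}

demo_precheck stop
```

Saving the result of the validation in `ret` before testing it keeps the original error code available for the final exit, which a later `$?` would no longer hold.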

[ClusterLabs] Approach to validate on stop op (Was Re: crmsh configure delete for constraints)

2016-03-29 Thread Vladislav Bogdanov
10.02.2016 12:31, Vladislav Bogdanov wrote: 10.02.2016 11:38, Ulrich Windl wrote: Vladislav Bogdanov <bub...@hoster-ok.com> wrote on 10.02.2016 at 05:39 in message <6e479808-6362-4932-b2c6-348c7efc4...@hoster-ok.com>: [...] Well, I'd reword. Generally, RA should not exi

[ClusterLabs] Node is silently unfenced if transition is very long

2016-04-19 Thread Vladislav Bogdanov
Hi, Just found an issue with node is silently unfenced. That is quite large setup (2 cluster nodes and 8 remote ones) with a plenty of slowly starting resources (lustre filesystem). Fencing was initiated due to resource stop failure. lustre often starts very slowly due to internal recovery, and

Re: [ClusterLabs] Antw: Doing reload right

2016-07-04 Thread Vladislav Bogdanov
01.07.2016 18:26, Ken Gaillot wrote: [...] You're right, "parameters" or "params" would be more consistent with existing usage. "Instance attributes" is probably the most technically correct term. I'll vote for "reload-params" May be "reconfigure" fits better? This would at least introduce

Re: [ClusterLabs] Antw: Corosync ring marked as FAULTY

2017-02-22 Thread Vladislav Bogdanov
22.02.2017 11:40, Denis Gribkov wrote: Hi, On 22/02/17 10:35, bliu wrote: Did you specify interface with "-i " when you are using tcpdump. If you did, corosync is not talking with the multicast address, you need to check if your private network support multicast. Yes, I have used command:

Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-29 Thread Vladislav Bogdanov
On August 29, 2016 11:07:39 PM GMT+03:00, Lars Ellenberg wrote: >On Mon, Aug 29, 2016 at 04:37:00PM +0200, Dejan Muhamedagic wrote: >> Hi, >> >> On Mon, Aug 29, 2016 at 02:58:11PM +0200, Gabriele Bulfon wrote: >> > I think the main issue is the usage of the "local"

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Pacemaker 1.1.16 - Release Candidate 1

2016-11-09 Thread Vladislav Bogdanov
09.11.2016 10:59, Ulrich Windl wrote: Ken Gaillot wrote on 08.11.2016 at 18:16 in message <92c4a0de-33ce-cdc2-a778-17fddfe63...@redhat.com>: On 11/08/2016 03:02 AM, Ulrich Windl wrote: [...] The user is responsible for choosing meaningful values. For example, if

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-14 Thread Vladislav Bogdanov
On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl wrote: Nikhil Utane wrote on 13.10.2016 at 16:43: >> Ulrich, >> >> I have 4

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-21 Thread Vladislav Bogdanov
21.10.2016 19:34, Andrei Borzenkov wrote: 14.10.2016 10:39, Vladislav Bogdanov wrote: use of utilization (balanced strategy) has one caveat: resources are not moved just because of utilization of one node is less, when nodes have the same allocation score for the resource. So, after

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-24 Thread Vladislav Bogdanov
) * capacity usage by a resource (per-resource utilization attribute) -Nikhil On Mon, Oct 24, 2016 at 4:43 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote: 24.10.2016 14:04, Nikhil Utane wrote: That is what happened here :(. When 2 nod

Re: [ClusterLabs] Antw: Re: Establishing Timeouts

2016-10-11 Thread Vladislav Bogdanov
11.10.2016 09:31, Ulrich Windl wrote: Klaus Wenninger wrote on 10.10.2016 at 20:04 in message <936e4d4b-df5c-246d-4552-5678653b3...@redhat.com>: On 10/10/2016 06:58 PM, Eric Robinson wrote: Thanks for the clarification. So what's the easiest way to ensure that the

[ClusterLabs] Issue with attrd_updater hang

2017-01-09 Thread Vladislav Bogdanov
Hi! our customers were hit by a quite strange issue with resources populating attributes in attrd. The most obscure fact is that they see that issue only on a selected subset of nodes (two nodes in an 8-node cluster). Symptoms are sporadic timeouts of resources whose RAs call attrd_updater to

Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-21 Thread Vladislav Bogdanov
20.04.2017 23:16, Jan Wrona wrote: On 20.4.2017 19:33, Ken Gaillot wrote: On 04/20/2017 10:52 AM, Jan Wrona wrote: Hello, my problem is closely related to the thread [1], but I didn't find a solution there. I have a resource that is set up as a clone C restricted to two copies (using the

Re: [ClusterLabs] IPaddr2 RA and bonding

2017-08-07 Thread Vladislav Bogdanov
07.08.2017 20:39, Tomer Azran wrote: I don't want to use this approach since I don't want to be depend on pinging to other host or couple of hosts. Is there any other solution? I'm thinking of writing a simple script that will take a bond down using ifdown command when there are no slaves

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Vladislav Bogdanov
08.05.2017 22:20, Lentes, Bernd wrote: Hi, i remember that digimer often campaigns for a fence delay in a 2-node cluster. E.g. here: http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html In my eyes it makes sense, so i try to establish that. I have two HP servers, each with an

Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2017-05-09 Thread Vladislav Bogdanov
09.05.2017 00:56, Ken Gaillot wrote: [...] Those messages indicate there is a real issue with the CPU load. When the cluster notices high load, it reduces the number of actions it will execute at the same time. This is generally a good idea, to avoid making the load worse. [...] message,

[ClusterLabs] pcmk_remote evaluation (continued)

2017-09-20 Thread Vladislav Bogdanov
Hi, as 1.1.17 received a lot of care in pcmk_remote, I decided to try it again in a rather big setup (smaller than the previous one, so I'm not hit by IPC disconnects here). From the first runs there are still some severe issues when cluster nodes are fenced. The following results are obtained by

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-26 Thread Vladislav Bogdanov
26.08.2017 19:36, Octavian Ciobanu wrote: Thank you for your reply. There is no reason to set location for the resources, I think, because all the resources are set with clone options so they are started on all nodes at the same time. You still need to colocate "upper" resources with their

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-28 Thread Vladislav Bogdanov
an Ciobanu On Sat, Aug 26, 2017 at 8:17 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote: 26.08.2017 19:36, Octavian Ciobanu wrote: Thank you for your reply. There is no reason to set locatio

Re: [ClusterLabs] Fwd: Stopped DRBD

2017-10-18 Thread Vladislav Bogdanov
Hi, ensure you have two monitor operations configured for your drbd resource: for 'Master' and 'Slave' roles ('Slave' == 'Started' == '' for ms resources). http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_monitoring_multi_state_resources.html 18.10.2017 11:18, Антон

Re: [ClusterLabs] How much cluster-glue support is still needed in Pacemaker?

2017-11-17 Thread Vladislav Bogdanov
17.11.2017 02:26, Ken Gaillot wrote: We're starting work on Pacemaker 2.0, which will remove support for the heartbeat stack. cluster-glue was traditionally associated with heartbeat. Do current distributions still ship it? Currently, Pacemaker uses cluster-glue's stonith/stonith.h to support

Re: [ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-11-01 Thread Vladislav Bogdanov
01.11.2017 17:20, Ken Gaillot wrote: On Sat, 2017-10-28 at 01:11 +0800, lkxjtu wrote: Thank you for your response! This means that there shouldn't be a long "sleep" in the ocf script. If my service takes 10 minutes from service start until the healthcheck passes, then what should I do? That is a tough

Re: [ClusterLabs] pcmk_remote evaluation (continued)

2017-12-11 Thread Vladislav Bogdanov
11.12.2017 23:06, Ken Gaillot wrote: [...] = * The first issue I found (and I expect that to be a reason for some other issues) is that pacemaker_remote does not drop an old crmds' connection after new crmd connects. As IPC proxy connections are in the hash table, there is a 50% chance that

Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-07 Thread Vladislav Bogdanov
Hi, On 31.05.2018 15:48, Jan Pokorný wrote: Hello, I am soliciting feedback on these CIB features related questions, please reply (preferably on-list so we have the shared collective knowledge) if at least one of the questions is answered positively in your case (just tick the respective "[ ]"

Re: [ClusterLabs] Resources not monitored in SLES11 SP4 (1.1.12-f47ea56)

2018-06-26 Thread Vladislav Bogdanov
26.06.2018 09:14, Ulrich Windl wrote: Hi! We just observed some strange effect we cannot explain in SLES 11 SP4 (pacemaker 1.1.12-f47ea56): We run about a dozen of Xen PVMs on a three-node cluster (plus some infrastructure and monitoring stuff). It worked all well so far, and there was no

Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-26 Thread Vladislav Bogdanov
25.01.2018 21:28, Ken Gaillot wrote: [...] If I can throw another suggestion in (without offering preference for it myself), 'dual-state clones'? The reasoning is that, though three words instead of two, spell-check likes it, it sounds OK on day one (from a language perspective) and it

Re: [ClusterLabs] Antw: Re: Antw: Re: Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-15 Thread Vladislav Bogdanov
15.01.2018 11:23, Ulrich Windl wrote: Vladislav Bogdanov <bub...@hoster-ok.com> wrote on 12.01.2018 at 10:06 in message <3c5d9060-4714-cc20-3039-aa53b4a95...@hoster-ok.com>: 11.01.2018 18:39, Ken Gaillot wrote: [...] I thought one option aired at the summit to address t

Re: [ClusterLabs] Antw: Re: Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-12 Thread Vladislav Bogdanov
11.01.2018 18:39, Ken Gaillot wrote: [...] I thought one option aired at the summit to address this was /var/log/clusterlabs, but it's entirely possible my memory's playing tricks on me again. I don't remember that, but it sounds like a good choice. However we'd still have the same issue of

Re: [ClusterLabs] Q: HA_RSCTMP in SLES11 SP4 at first start after reboot

2018-08-13 Thread Vladislav Bogdanov
10.08.2018 19:52, Ulrich Windl wrote: Hi! A simple question: One of my RAs uses $HA_RSCTMP in SLES11 SP4, and it reports the following problem: WARNING: Unwritable HA_RSCTMP directory /var/run/resource-agents - using /tmp Just make sure you avoid using that code in 'meta-data' action

Re: [ClusterLabs] 2 node cluster dlm/clvm trouble

2018-09-11 Thread Vladislav Bogdanov
On 11.09.2018 16:31, Patrick Whitney wrote: But, when I invoke the "human" stonith power device (i.e. I turn the node off), the other node collapses... In the logs I supplied, I basically do this: 1. stonith fence (With fence scsi) At this point DLM on a healthy node is notified that node

Re: [ClusterLabs] 2 node cluster dlm/clvm trouble

2018-09-11 Thread Vladislav Bogdanov
On 11.09.2018 16:10, Valentin Vidic wrote: On Tue, Sep 11, 2018 at 09:02:06AM -0400, Patrick Whitney wrote: What I'm having trouble understanding is why dlm flattens the remaining "running" node when the already fenced node is shutdown... I'm having trouble understanding how power fencing

Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Vladislav Bogdanov
On October 1, 2018 4:55:07 PM UTC, Patrick Whitney wrote: >> >> Fencing in clustering is always required, but unlike pacemaker that >lets >> you turn it off and take your chances, DLM doesn't. > > >As a matter of fact, DLM has a setting "enable_fencing=0|1" for what >that's >worth. > > >> You

Re: [ClusterLabs] Antw: Any CLVM/DLM users around?

2018-10-01 Thread Vladislav Bogdanov
On October 1, 2018 8:01:36 PM UTC, Patrick Whitney wrote: [...] >so we were lucky enough our test environment is a KVM/libvirt >environment, >so I used fence_virsh. Again, I had the same problem... when the "bad" >node was fenced, dlm_controld would issue (what appears to be) a >fence_all,

Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Vladislav Bogdanov
gfs1, and only then for lvm. Things changed, but original design remains I believe. > >Best, >-Pat > >On Mon, Oct 1, 2018 at 1:38 PM Vladislav Bogdanov > >wrote: > >> On October 1, 2018 4:55:07 PM UTC, Patrick Whitney > >> wrote: >> >> >

Re: [ClusterLabs] Reusing resource set in multiple constraints

2019-07-27 Thread Vladislav Bogdanov
Hi. For location you can use regexps. That is supported in crmsh as well. For order and colocation the similar feature should be implemented. Andrei Borzenkov 27 Jul 2019 11:04:43 AM wrote: Is it possible to have single definition of resource set that is later referenced in order and

Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-23 Thread Vladislav Bogdanov
Good to know it is now not needed. You are correct about logic, yes, I just forgot details. I just recall that sbd added too much load during cluster start and recoveries. Thank you! On August 23, 2020 1:23:37 PM Klaus Wenninger wrote: On 8/21/20 8:55 PM, Vladislav Bogdanov wrote: Hi, btw

Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-21 Thread Vladislav Bogdanov
Hi, btw, is sbd now able to handle cib diffs internally? Last time I tried to use it with frequently changing CIB, it became a CPU hog - it requested full CIB copy on every change. On Fri, 21/08/2020 at 13:16 -0500, Ken Gaillot wrote: > Hi all, > > Looking ahead to the Pacemaker 2.0.5 release

Re: [ClusterLabs] VirtualDomain stop operation traced - but nothing appears in /var/lib/heartbeat/trace_ra/

2020-09-30 Thread Vladislav Bogdanov
Hi Try to enable trace_ra for start op. On September 28, 2020 10:50:19 PM "Lentes, Bernd" wrote: Hi, currently i have a VirtualDomains resource which sometimes fails to stop. To investigate further i'm tracing the stop operation of this resource. But although i stopped it already now

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-09 Thread Vladislav Bogdanov
Hi. This thread is getting too long. First, you need to ensure that your switch (or all switches in the path) have igmp snooping enabled on host ports (and probably interconnects along the path between your hosts). Second, you need an igmp querier to be enabled somewhere near (better to have it

Re: [ClusterLabs] How to set up "active-active" cluster by balancing multiple exports across servers?

2021-01-13 Thread Vladislav Bogdanov
Hi. I would run nfsserver and nfsnotify as a separate cloned group and make both other groups colocated/ordered with it. So nfs server will be just a per-host service, and then you attach exports (with LVs, filesystems, ip addresses) to it. NFS server in linux is an in-kernel creature, not an

Re: [ClusterLabs] Is reverse order for "promote" supposed to be "demote"?

2021-05-11 Thread Vladislav Bogdanov
Hi. Try order o_fs_drbd0_after_ms_drbd0 Mandatory: ms_drbd0:promote fs_drbd0:start On May 11, 2021 6:35:58 PM Andrei Borzenkov wrote: While testing drbd cluster I found errors (drbd device busy) when stopping drbd master with mounted filesystem. I do have order o_fs_drbd0_after_ms_drbd0

Re: [ClusterLabs] 32 nodes pacemaker cluster setup issue

2021-05-19 Thread Vladislav Bogdanov
Hi. Have you considered using pacemaker-remote instead? On May 18, 2021 5:55:57 PM S Sathish S wrote: Hi Team, We are setup 32 nodes pacemaker cluster setup each node has 10 resource so total [around 300+ components] are up and running. While performing installation/update with below task

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Vladislav Bogdanov
Hi You probably want to look at booth and tickets for a geo-clustering solution. On August 3, 2021 11:40:54 AM Antony Stone wrote: On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote: Here is the example I had promised: pcs node attribute server1 city=LA pcs node attribute server2

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Vladislav Bogdanov
Hi. I'd suggest setting your clone's meta attribute 'interleave' to 'true'. Best, Vladislav On August 9, 2021 1:43:16 PM Andreas Janning wrote: Hi all, we recently experienced an outage in our pacemaker cluster and I would like to understand how we can configure the cluster to avoid this

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Vladislav Bogdanov
Hi You may want to look at blackbox functionality, controlled by signals, if you won't find a way to get traces by env vars. It provides traces. Best regards On October 31, 2021 11:20:16 AM Andrei Borzenkov wrote: I think it worked in the past by passing a lot of -VVV when starting

Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Vladislav Bogdanov
Hi, Probably utilization attributes may help with that. Try to add e.g. an 'ip' utilization attribute with value '1' to both nodes, and then add the same to VIP resources. Adam Cecile wrote on 27 September 2023, 14:21:05: Hello, I'm struggling to understand if it's possible to create some

Re: [ClusterLabs] FYI: clusterlabs.org server maintenance window this weekend

2022-11-01 Thread Vladislav Bogdanov
It is not under pacemaker control??? Ken Gaillot wrote on 1 November 2022, 19:03:45: Hi everybody, Just FYI, the clusterlabs.org server (including the websites and mailing lists) will be taken down for planned maintenance this weekend. Most likely it will just be a few hours on Saturday, but

Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Vladislav Bogdanov
I suspect that validate action is run as a non-root user. Madison Kelly wrote on 11 January 2023, 07:06:55: On 2023-01-11 00:21, Madison Kelly wrote: On 2023-01-11 00:14, Madison Kelly wrote: Hi all, Edit: Last message was in HTML format, sorry about that. I've got a hell of a weird

Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-11 Thread Vladislav Bogdanov
And one more thing can affect that: SELinux. I doubt it, but it's worth checking. On January 11, 2023 22:21:03 Vladislav Bogdanov wrote: Then I would suggest logging all env vars and comparing them; probably something is missing in validate for virsh to be happy. Madison Kelly, January 11, 2023

Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-11 Thread Vladislav Bogdanov
Then I would suggest logging all env vars and comparing them; probably something is missing in validate for virsh to be happy. On January 11, 2023 22:06:45 Madison Kelly wrote: On 2023-01-11 01:13, Vladislav Bogdanov wrote: I suspect that the validate action is run as a non-root user. I modified

Re: [ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (con't)

2023-01-11 Thread Vladislav Bogdanov
What would be the reason for running that command without redirecting its output somewhere? On January 12, 2023 07:21:44 Madison Kelly wrote: On 2023-01-12 01:12, Reid Wahl wrote: On Wed, Jan 11, 2023 at 8:11 PM Madison Kelly wrote: Hi all, There was a lot of sub-threads, so I

Re: [ClusterLabs] Antw: [EXT] DRBD Dual Primary Write Speed Extremely Slow

2022-11-14 Thread Vladislav Bogdanov
Hi On Mon, 2022-11-14 at 15:00 +0100, Tyler Phillippe via Users wrote: > Good idea! I setup a RAM disk on both of those systems, let them > sync, added it to the cluster. > > One thing I left out (which didn't hit me until yesterday as a > possibility) is that I have the iSCSI LUN attached to

Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Vladislav Bogdanov
On Thu, 2023-03-02 at 08:41 +0100, Gerald Vogt wrote: > Hi, > > I am setting up a mail relay cluster which main purpose is to > maintain > the service ips via IPaddr2 and move them between cluster nodes when > necessary. > > The service ips should only be active on nodes which are running all

Re: [ClusterLabs] Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Vladislav Bogdanov
On Thu, 2023-03-02 at 14:30 +0100, Ulrich Windl wrote: > > > > Gerald Vogt schrieb am 02.03.2023 um 08:41 > > > > in Nachricht > <624d0b70-5983-4d21-6777-55be91688...@spamcop.net>: > > Hi, > > > > I am setting up a mail relay cluster which main purpose is to > > maintain > > the service ips via

[ClusterLabs] Offtopic - role migration

2023-04-18 Thread Vladislav Bogdanov
Btw, an interesting question. How much effort would it take to support migration of a Master role between nodes? A use case is drbd, configured for multi-master mode internally, but with master-max=1 in the resource definition. Assuming that the resource agent supports that flow - 1. Do

Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

2023-04-04 Thread Vladislav Bogdanov
I know that iSCSI initiators are very sensitive to connection drops. That's why in all my setups with iSCSI I use a special m/s resource agent which in slave mode drops all packets to/from portals. That prevents initiators from receiving FIN packets from the target when it migrates, and they

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Vladislav Bogdanov
On Wed, 2023-04-12 at 14:04 +0300, Andrei Borzenkov wrote: > On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov wrote: > > > > Hi, > > > > Just add a Master role for drbd resource in the colocation. Default > > is Started (or Slave). > >

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Vladislav Bogdanov
Hi, Just add a Master role for the drbd resource in the colocation. The default is Started (or Slave). On April 12, 2023 11:28:57 Philip Schiller wrote: Hi All, I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in primary/primary mode (necessary for live migration). My

Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

2023-04-05 Thread Vladislav Bogdanov
Ah, and yes, it is for iptables, not for nft or firewalld. That could be easily fixed though. And the RA expects target chains to be pre-created. On April 5, 2023 14:53:35 Vladislav Bogdanov wrote: Please find attached. I use it the following way: primitive vip-10-5-4-235 ocf:my-org:IPaddr2

Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

2023-04-05 Thread Vladislav Bogdanov
" \ target-role="Master" colocation c01-pool-0-iscsi-vips-fw-with-vips inf: \ ms-c01-pool-0-iscsi-vips-fw:Master \ c01-pool-0-iscsi-vips:Started order c01-pool-0-iscsi-vips-fw-after-target inf: iscsi-export:start \ ms-c01-pool-0-iscsi-vips-fw:promote order c01-po

Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Vladislav Bogdanov
I think 1 is a common value for writers of promotable resource agents to pass to crm_master when the agent, during a probe/monitor call, decides that the node is really ready to have the resource promoted. DRBD is one example. Best, Vlad lejeczek via Users 03.06.2023. 19:32:58 wrote: On

Re: [ClusterLabs] ubsubscribe

2024-02-12 Thread Vladislav Bogdanov
s/ub/un/ On February 12, 2024 20:17:45 Bob Marčan via Users wrote: On Mon, 12 Feb 2024 16:48:19 +0100 "Antony Stone" wrote: On Monday 12 February 2024 at 16:42:06, Bob Marčan via Users wrote: > It should be in the body, not in the subject. According to the headers, it should be in the

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Vladislav Bogdanov
What if a node (especially a VM) freezes for several minutes and then continues to write to a shared disk where other nodes have already put their data? In my opinion, fencing, preferably two-level, is mandatory for Lustre. Trust me, I developed the whole HA stack for both Exascaler and PangeaFS. We've

Re: [ClusterLabs] how to disable pacemaker throttle mode

2024-02-05 Thread Vladislav Bogdanov
IIRC, there is one issue with that: I/O load is counted as CPU load, so on busy storage servers you get throttling with an almost idle CPU. I may be wrong, and the load is actually calculated from loadavg, which is a different story altogether, as it indicates the number of processes which are ready to