Re: [ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster

2023-03-13 Thread Lentes, Bernd
> -----Original Message----- > From: Reid Wahl > Sent: Friday, March 10, 2023 10:30 PM > To: Cluster Labs - All topics related to open-source clustering welcomed; Lentes, Bernd > Subject: Re: [ClusterLabs] 2-Node cluster - both nodes unclean - can't > st

[ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster

2023-03-10 Thread Lentes, Bernd
Hi, I can't get my cluster running. I had problems with an OCFS2 volume, and both nodes have been fenced. When I now run “systemctl start pacemaker.service”, crm_mon shows both nodes as UNCLEAN for a few seconds, then pacemaker stops. I tried to confirm the fencing with “stonith_admin -C”, but it

[ClusterLabs] VirtualDomain did not stop although "crm resource stop"

2022-11-02 Thread Lentes, Bernd
Hi, i think i found the reason, but i want to be sure. I wanted to stop a VirtualDomain and did a "crm resource stop ..." But it didn't shut down. After waiting several minutes i stopped with libvirt, circumventing the cluster software. First i wondered "why didn't it shutdown ?", but then i

Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Lentes, Bernd
- On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote: > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote: > Did you try a cleanup in between? When i do a cleanup before trace/untrace the resource is

Re: [ClusterLabs] crm resource trace

2022-10-21 Thread Lentes, Bernd
- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote: > This turned out to be interesting. > > In the first case, the resource history contains a start action and a > recurring monitor. The parameters to both change, so the resource > requires a restart. > > In the second

Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Lentes, Bernd
- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote: > This turned out to be interesting. > > In the first case, the resource history contains a start action and a > recurring monitor. The parameters to both change, so the resource > requires a restart. > > In the second

Re: [ClusterLabs] Antw: [EXT] trace of resource ‑ sometimes restart, sometimes not

2022-10-18 Thread Lentes, Bernd
- On 18 Oct, 2022, at 14:35, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > # crm configure > edit ... > verify > ptest nograph #!!! > commit That's very helpful. I didn't know that, Thanks. > -- > If you used that, you would have seen the restart. > Despite of that I wonder

Re: [ClusterLabs] crm resource trace

2022-10-17 Thread Lentes, Bernd
Hi, i am trying to find out why there is sometimes a restart of the resource and sometimes not. Unpredictable behaviour is something i expect from Windows, not from Linux. Here you see two runs of "crm resource trace <resource>". In the first case the resource is restarted, in the second not. The command i

Re: [ClusterLabs] crm resource trace (Was: Re: trace of resource - sometimes restart, sometimes not)

2022-10-10 Thread Lentes, Bernd
- On 7 Oct, 2022, at 21:37, Reid Wahl nw...@redhat.com wrote: > On Fri, Oct 7, 2022 at 6:02 AM Lentes, Bernd > wrote: >> - On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote: >> >> > How did you set a trace just for monitor? >> >> crm resour

Re: [ClusterLabs] trace of resource - sometimes restart, sometimes not

2022-10-10 Thread Lentes, Bernd
- On 7 Oct, 2022, at 01:08, Ken Gaillot kgail...@redhat.com wrote: > > Yes, trace_ra is an agent-defined resource parameter, not a Pacemaker- > defined meta-attribute. Resources are restarted anytime a parameter > changes (unless the parameter is set up for reloads). > > trace_ra is unusual

[ClusterLabs] expected_votes in cluster conf

2022-10-09 Thread Lentes, Bernd
Dear all, while checking my cluster with "crm status xml" i stumbled across: ha-idg-1:/usr/lib/ocf/resource.d/heartbeat # crm status xml <= expected_votes="unknown" ? I didn't

Re: [ClusterLabs] trace of resource - sometimes restart, sometimes not

2022-10-07 Thread Lentes, Bernd
- On 7 Oct, 2022, at 01:08, Ken Gaillot kgail...@redhat.com wrote: > > Yes, trace_ra is an agent-defined resource parameter, not a Pacemaker- > defined meta-attribute. Resources are restarted anytime a parameter > changes (unless the parameter is set up for reloads). > > trace_ra is

Re: [ClusterLabs] trace of resource - sometimes restart, sometimes not

2022-10-07 Thread Lentes, Bernd
- On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote: > How did you set a trace just for monitor? crm resource trace dlm monitor. > Wish I could help with that -- it's mostly a mystery to me too ;) :-)) smime.p7s Description: S/MIME Cryptographic Signature

[ClusterLabs] trace of resource - sometimes restart, sometimes not

2022-10-06 Thread Lentes, Bernd
Hi, i have some problems with our DLM, so i wanted to trace it. Yesterday i just set a trace for "monitor". No restart of DLM afterwards. It went fine as expected. I got logs in /var/lib/heartbeat/trace_ra. After some monitor i stopped tracing. Today i set a trace for all operations. Now

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 16:26, kwenning kwenn...@redhat.com wrote: >> >> if I get Ulrich right - and my fading memory of when I really used crmsh the >> last time is telling me the same thing ... >> I get the impression many people prefer pcs to crm. Is there any reason for that ? And can i

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 16:26, kwenning kwenn...@redhat.com wrote: > > Guess the resources running now are those you tried to enable before > while they were globally stopped > No. First i set stop-all-resources to false. Then SOME resources started. Then i tried several times to

Re: [ClusterLabs] Antw: [EXT] Re: Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 16:01, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >> Now with "crm resource start" all resources started. I didn't change >> anything !?! > > I guess that command set the roles of all resources to "started", so you > changed > something ;-) I did it

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
Hi, Now with "crm resource start" all resources started. I didn't change anything !?! Bernd ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home:

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 07:21, Reid Wahl nw...@redhat.com wrote: > As a result, your command might start the virtual machines, but > Pacemaker will still show that the resources are "Stopped (disabled)". > To fix that, you'll need to enable the resources. How do i achieve that ? Bernd

Re: [ClusterLabs] Antw: [EXT] Re: Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 08:17, Reid Wahl nw...@redhat.com wrote: > I'm not sure off the top of my head what (if anything) gets sent to > the logs. Do note that Bernd is using pacemaker v1, which hasn't been > receiving new features for quite a while. Is an update recommended ? Bernd

Re: [ClusterLabs] Antw: [EXT] Re: Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 08:10, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > Bernd, > > that command would simply set the role to "started", but I guess it already > is. > Obviously to be effective stop-all-resources must have precedence. You see? > > Regards, > Ulrich

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
> > There is no resource with the name "virtual_domain" in your list. All > non-active resources in your list are either disabled or unmanaged. > Without actual commands that list resource state before "crm resource > start", "crm resource start" itself and once more resource state after > this

Re: [ClusterLabs] Cluster does not start resources

2022-08-24 Thread Lentes, Bernd
- On 24 Aug, 2022, at 07:22, Reid Wahl nw...@redhat.com wrote: > Are the VMs running after your start command? No. Bernd

Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Lentes, Bernd
- On 24 Aug, 2022, at 07:03, arvidjaar arvidj...@gmail.com wrote: > On 24.08.2022 07:34, Lentes, Bernd wrote: >> >> >> - On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote: >> >> >>> The stop-all-resources cluster property is set to

Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Lentes, Bernd
- On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote: > The stop-all-resources cluster property is set to true. Is that intentional? OMG. Thanks Reid ! But unfortunately not all virtual domains are running: Stack: corosync Current DC: ha-idg-2 (version

Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Lentes, Bernd
- On 24 Aug, 2022, at 04:04, Reid Wahl nw...@redhat.com wrote: > Can you share your CIB? Not sure off hand what everything means (resource not > found, IPC error, crmd failure and respawn), and pacemaker v1 logs aren't the > easiest to interpret. But perhaps something in the CIB will show

[ClusterLabs] Cluster does not start resources

2022-08-23 Thread Lentes, Bernd
Hi, currently i can't start resources on our 2-node-cluster. Cluster seems to be ok: Stack: corosync Current DC: ha-idg-1 (version 1.1.24+20210811.f5abda0ee-3.21.9-1.1.24+20210811.f5abda0ee) - partition with quorum Last updated: Wed Aug 24 02:56:46 2022 Last change: Wed Aug 24 02:56:41 2022 by

Re: [ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-04 Thread Lentes, Bernd
- On 4 Aug, 2022, at 19:46, Reid Wahl nw...@redhat.com wrote: >> It shuts down ha-idg-2: >> 2022-08-03T01:19:51.866200+02:00 ha-idg-2 systemd-logind[1535]: Power key >> pressed. >> 2022-08-03T01:19:52.048335+02:00 ha-idg-2 systemd-logind[1535]: System is >> powering down. >>

Re: [ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-04 Thread Lentes, Bernd
- On 4 Aug, 2022, at 15:14, arvidjaar arvidj...@gmail.com wrote: > On 04.08.2022 16:06, Lentes, Bernd wrote: >> >> - On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote: >> >> What do you mean by "banned" ? "crm resource ban ..." ?

Re: [ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-04 Thread Lentes, Bernd
- On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote: > > Such constraints are unnecessary. > > Let's say we have two stonith devices called "fence_dev1" and > "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs > to be fenced, and fence_dev2 is running on node 2,

[ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-03 Thread Lentes, Bernd
Hi, i have the following situation: 2-node Cluster, just one node running (ha-idg-1). The second node (ha-idg-2) is in standby. DLM monitor on ha-idg-1 times out. Cluster tries to restart all services depending on DLM: Aug 03 01:07:11 [19367] ha-idg-1 pengine: notice: LogAction: *

[ClusterLabs] cluster log not unambiguous about state of VirtualDomains

2022-08-03 Thread Lentes, Bernd
Hi, i found some strange behaviour in the cluster log (/var/log/cluster/corosync.log). I KNOW that i put one node (ha-idg-2) in standby mode and stopped the pacemaker service on that node. The shell history says: 993 2022-08-02 18:28:25 crm node standby ha-idg-2 994 2022-08-02

Re: [ClusterLabs] [EXT] Problem with DLM

2022-07-26 Thread Lentes, Bernd
- On 26 Jul, 2022, at 20:06, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > Hi Bernd! > > I think the answer may be some time before the timeout was reported; maybe a > network issue? Or a very high load. It's hard to say from the logs... Yes, i had a high load before: Jul 20

[ClusterLabs] Problem with DLM

2022-07-26 Thread Lentes, Bernd
Hi, it seems my DLM went crazy: /var/log/cluster/corosync.log: Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: child_timeout_callback: dlm_monitor_3 process (PID 11816) timed out Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: operation_finished: dlm_monitor_3:11816

[ClusterLabs] is there a way to cancel a running live migration or a "resource stop" ?

2022-07-07 Thread Lentes, Bernd
Hi, is there a way to cancel a running live migration or a "resource stop" ? Bernd -- Bernd Lentes System Administrator Institute for Metabolism and Cell Death (MCD) Building 25 - office 122 HelmholtzZentrum München bernd.len...@helmholtz-muenchen.de phone: +49 89 3187 1241 +49 89

Re: [ClusterLabs] modified RA can't be used

2022-06-27 Thread Lentes, Bernd
- On Jun 27, 2022, at 3:54 PM, kgaillot kgail...@redhat.com wrote: > As an aside, the preferred naming for custom agents is to change the > provider (ocf:PROVIDER:AGENT), putting them in > /usr/lib/ocf/resource.d/PROVIDER/AGENT. > > For example, ocf:local:VirtualDomain or

Re: [ClusterLabs] modified RA can't be used

2022-06-27 Thread Lentes, Bernd
- On Jun 27, 2022, at 2:57 PM, Oyvind Albrigtsen oalbr...@redhat.com wrote: > You need to update the agent name in the metadata section to be the > same as the filename. > > > Oyvind > OMG. Thank you !!! Bernd

[ClusterLabs] modified RA can't be used

2022-06-27 Thread Lentes, Bernd
Hi, i adapted the RA ocf/heartbeat/VirtualDomain to my needs and renamed it to VirtualDomain.ssh. When i try to use it now, i get an error message. When i run e.g. "crm configure edit vm-idcc-devel" to change an existing VirtualDomain so that it uses the new RA, and then try to save it, i get the following

[ClusterLabs] how does the VirtualDomain RA know with which options it's called ?

2022-05-12 Thread Lentes, Bernd
Hi, from my understanding the resource agents in /usr/lib/ocf/resource.d/heartbeat are quite similar to the old scripts in /etc/init.d started by init. Init starts these scripts with "script [start|stop|reload|restart|status]". Inside the script there is a case construct which checks the options
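The dispatch described in this preview can be sketched as follows. This is a minimal illustration, not the real VirtualDomain agent: the function name `agent_main` and its messages are made up for the example. What it relies on is that an OCF resource agent receives the action as its first argument and the configured resource parameters as environment variables named `OCF_RESKEY_<parameter>` (here `config`, the parameter VirtualDomain uses for the domain XML file).

```shell
#!/bin/sh
# Sketch of an OCF-style agent entry point (hypothetical, heavily simplified).
# The cluster calls the agent as "<agent> <action>" and exports every
# configured parameter as OCF_RESKEY_<name> in the environment.

agent_main() {
    action=$1
    # The "config" parameter of the primitive arrives as OCF_RESKEY_config.
    config=${OCF_RESKEY_config:-}

    case "$action" in
        start|stop|monitor)
            if [ -z "$config" ]; then
                echo "missing parameter: config" >&2
                return 6    # OCF_ERR_CONFIGURED
            fi
            # A real agent would call virsh here; the sketch only reports.
            echo "$action domain defined in $config"
            return 0        # OCF_SUCCESS
            ;;
        meta-data)
            # Placeholder; real metadata is a full XML description.
            echo "<resource-agent name=\"example\"/>"
            return 0
            ;;
        *)
            echo "unsupported action: $action" >&2
            return 3        # OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}
```

A real agent would end with `agent_main "$@"`, so that an invocation with `OCF_RESKEY_config` set in the environment and `start` as the argument reaches the `start` branch.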

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-18 Thread Lentes, Bernd
- On Feb 17, 2022, at 4:25 PM, kgaillot kgail...@redhat.com wrote: >> So for me the big question is: >> When a transition is happening, and there is a change in the cluster, >> is the transition "aborted" >> (delayed or interrupted would be better) or not ? >> Is this behaviour consistent ?

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-17 Thread Lentes, Bernd
- On Feb 16, 2022, at 6:48 PM, arvidjaar arvidj...@gmail.com wrote: > > > Splitting logs between different messages does not really help in interpreting > them. I agree. Here is the complete excerpt from the respective time: https://nc-mcd.helmholtz-muenchen.de/nextcloud/s/eY8SA8pe4HZBBc8

Re: [ClusterLabs] Antw: Antw: Re: Antw: [EXT] Re: crm resource stop VirtualDomain ‑ but VirtualDomain shutdown start some minutes later

2022-02-17 Thread Lentes, Bernd
- On Feb 17, 2022, at 10:26 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: "Ulrich Windl" wrote on 17.02.2022 > > To correct myself: crm has a "-w" (wait) option that will wait until the DC is > idle. In most cases it just waits until the operation requested has

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Lentes, Bernd
- On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote: > A transition is the set of actions that need to be taken in response to > current conditions. A transition is aborted any time conditions change > (here, the target-role being changed in the configuration), so that a >

Re: [ClusterLabs] Antw: [EXT] Re: crm resource stop VirtualDomain ‑ but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Lentes, Bernd
- On Feb 16, 2022, at 1:01 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > Bernd, > > I guess the syslog/journal of the DC has better logs. Unfortunately the journal didn't reveal anything. > As I see it now, it seems stop of vm_pathway takes a few minutes, and no other >

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Lentes, Bernd
- On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote: >> Any idea ? >> What about that transition 128, which is aborted ? > > A transition is the set of actions that need to be taken in response to > current conditions. A transition is aborted any time conditions change >

[ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-15 Thread Lentes, Bernd
Hi, i have a weird behaviour in my 2-node-cluster. I stopped several VirtualDomains via "crm resource stop VirtualDomain", but the respective shutdown starts minutes later. All on the same host. .bash_history: 3520 2022-02-15 20:55:44 crm resource stop vm_greensql 3521 2022-02-15 20:56:34

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-10 Thread Lentes, Bernd
- On Feb 10, 2022, at 4:40 PM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: > > I wonder if after the cluster shutdown complete, the target-role=Stopped could > be removed/edited offline with eg. crmadmin? That would make VirtualDomain > startable on boot. > > I suppose this would

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-09 Thread Lentes, Bernd
- On Feb 9, 2022, at 11:26 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: > > I'm not sure how "crm resource stop " actually stop a resource. I thought > it would set "target-role=Stopped", but I might be wrong. > > If "crm resource stop" actually use "target-role=Stopped", I

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-09 Thread Lentes, Bernd
- On Feb 7, 2022, at 4:13 PM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: > On Mon, 7 Feb 2022 14:24:44 +0100 (CET) > "Lentes, Bernd" wrote: > >> Hi, >> >> i'm currently changing a bit in my cluster because i realized that my >> configu

Re: [ClusterLabs] Antw: [EXT] what is the "best" way to completely shutdown a two‑node cluster ?

2022-02-07 Thread Lentes, Bernd
- On Feb 7, 2022, at 2:36 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > Bernd, > > what if you set the node affected to standby, or shut down the cluster > services? Or all all nodes powered by the same UPS? All nodes are powered by the same UPS. > > >> >> And what

[ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-07 Thread Lentes, Bernd
Hi, i'm currently changing a bit in my cluster because i realized that my configuration for a power outage didn't work as i expected. My idea is currently: - first stop about 20 VirtualDomains, which are my services. This will surely take some minutes. I'm thinking of stopping each with a time

[ClusterLabs] Is there a python package for pacemaker ?

2022-02-02 Thread Lentes, Bernd
Hi, i need to write some scripts for our cluster. Until now i wrote bash scripts. But i like to learn python. Is there a package for pacemaker ? What i found is: https://pypi.org/project/pacemaker/ and i'm not sure what that is. Thanks. Bernd

[ClusterLabs] HA-Cluster, UPS and power outage - how is your setup ?

2022-02-01 Thread Lentes, Bernd
Hi, we just experienced two power outages in a few days. This showed me that our UPS configuration and the handling of resources on the cluster is insufficient. We have a two-node cluster with SLES 12 SP5 and a Smart-UPS SRT 3000 from APC with Network Management Card. The UPS is able to buffer

Re: [ClusterLabs] Problem with high load (IO)

2021-09-30 Thread Lentes, Bernd
- On Sep 30, 2021, at 3:55 AM, Gang He g...@suse.com wrote: >> >> 1) No problems during this step, the procedure just needs a few seconds. >> reflink is a binary. See reflink --help >> Yes, it is a cluster filesystem. I do the procedure just on one node, >> so i don't have duplicates. >>

Re: [ClusterLabs] Problem with high load (IO)

2021-09-29 Thread Lentes, Bernd
- On Sep 29, 2021, at 4:37 AM, Gang He g...@suse.com wrote: > Hi Lentes, > > Thank for your feedback. > I have some questions as below, > 1) how to clone these VM images from each ocfs2 nodes via reflink? > do you encounter any problems during this step? > I want to say, this is a shared

Re: [ClusterLabs] Problem with high load (IO)

2021-09-28 Thread Lentes, Bernd
- On Sep 27, 2021, at 2:51 PM, Pacemaker ML users@clusterlabs.org wrote: > I would use something like this: > > ionice -c 2 -n 7 nice cp XXX YYY > > Best Regards, > Strahil Nikolov Just for a better understanding: ionice does not relate to the copy procedure in this commandline, but to
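As a hedged illustration of how the quoted command line is put together: `ionice` and `nice` each adjust one scheduling attribute of their own process and then execute the rest of the command line, so the `cp` at the end of the chain inherits both the lower I/O priority (best-effort class 2, level 7) and the lower CPU priority. The wrapper below is only a sketch; the function name `low_prio_copy` and the fallback for systems where `ionice` is unavailable are assumptions of this example, not something from the thread. How much effect the I/O priority has depends on the I/O scheduler in use.

```shell
#!/bin/sh
# low_prio_copy SRC DST  (hypothetical helper name)
# Copies SRC to DST with reduced CPU and I/O priority.
low_prio_copy() {
    src=$1
    dst=$2
    # Probe first: ionice may be missing or not permitted in this environment.
    if command -v ionice >/dev/null 2>&1 && ionice -c 2 -n 7 true 2>/dev/null; then
        # ionice sets the I/O class and level, then runs "nice", which lowers
        # the CPU priority and runs "cp"; cp inherits both settings.
        ionice -c 2 -n 7 nice cp -- "$src" "$dst"
    else
        # Fallback: only the CPU priority is lowered.
        nice cp -- "$src" "$dst"
    fi
}
```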

Re: [ClusterLabs] Problem with high load (IO)

2021-09-27 Thread Lentes, Bernd
- On Sep 27, 2021, at 2:51 PM, Pacemaker ML users@clusterlabs.org wrote: > I would use something like this: > > ionice -c 2 -n 7 nice cp XXX YYY > > Best Regards, > Strahil Nikolov > Hi Strahil, that sounds interesting, i didn't know ionice. I will have a look at the man pages.

[ClusterLabs] Problem with high load (IO)

2021-09-27 Thread Lentes, Bernd
Hi, i have a two-node cluster running on SLES 12 SP5 with two HP servers and a common FC SAN. Most of my resources are virtual domains offering databases and web pages. The disks of the domains reside on an OCFS2 volume on a FC SAN. Each night at 9 pm all domains are snapshotted with the OCFS2

Re: [ClusterLabs] cofigured trace for Virtual Domains - automatic restart ?

2021-09-23 Thread Lentes, Bernd
- On Sep 20, 2021, at 10:14 PM, kgaillot kgail...@redhat.com wrote: > > As far as I know, only a few of the ocf:pacemaker agents support OCF > 1.1 currently. The resource-agents package doesn't. > > To check a given agent, run "crm_resource --show-metadata > ocf:$PROVIDER:$AGENT | grep ''"

Re: [ClusterLabs] cofigured trace for Virtual Domains - automatic restart ?

2021-09-18 Thread Lentes, Bernd
- On Sep 18, 2021, at 1:19 AM, kgaillot kgail...@redhat.com wrote: > > If the agent meta-data advertises support for the 1.1 standard and > indicates that the trace_ra parameter is reloadable, then Pacemaker > will automatically do a reload instead of restart for the resource if > the

Re: [ClusterLabs] cofigured trace for Virtual Domains - automatic restart ?

2021-09-17 Thread Lentes, Bernd
- On Sep 17, 2021, at 9:13 PM, kgaillot kgail...@redhat.com wrote: >> Bernd > > Tracing works by setting a special parameter, which to pacemaker looks > like a configuration change that requires a restart. With the new OCF > 1.1 standard, the trace parameter could be marked reloadable, but

[ClusterLabs] cofigured trace for Virtual Domains - automatic restart ?

2021-09-17 Thread Lentes, Bernd
Hi, today i configured tracing for some VirtualDomains: ha-idg-2:~ # crm resource trace vm_documents-oo migrate_from INFO: Trace for vm_documents-oo:migrate_from is written to /var/lib/heartbeat/trace_ra/ INFO: Trace set, restart vm_documents-oo to trace the migrate_from operation ha-idg-2:~ #

[ClusterLabs] virtual domains not migrated

2021-09-14 Thread Lentes, Bernd
Hi, today i couldn't migrate several virtual domains. I have a two-node cluster with SUSE SLES 12 SP5. Pacemaker is pacemaker-1.1.23+20200622.28dd98fad-3.9.2.20591.0.PTF.1177212.x86_64, corosync is corosync-2.3.6-9.13.1.x86_64. Migration just stopped after an amount of time. This is what i

Re: [ClusterLabs] Live migration possible with KSM ?

2021-03-31 Thread Lentes, Bernd
enkov >> wrote: >> On 30.03.2021 18:16, Lentes, Bernd wrote: >> > Hi, >>> currently i'm reading "Mastering KVM Virtualization", published by Packt >> > Publishing, a book i can really recommend. >>> There are some proposals for tuning guests. One

[ClusterLabs] Live migration possible with KSM ?

2021-03-30 Thread Lentes, Bernd
Hi, currently i'm reading "Mastering KVM Virtualization", published by Packt Publishing, a book i can really recommend. There are some proposals for tuning guests. One is KSM (kernel samepage merging), which sounds quite interesting. Especially in a system with lots of virtual machines with the

Re: [ClusterLabs] alert is not executed - solved

2021-02-16 Thread Lentes, Bernd
- On Feb 15, 2021, at 10:24 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > - On Feb 15, 2021, at 9:00 PM, kgaillot kgail...@redhat.com wrote: > >> On Mon, 2021-02-15 at 20:47 +0100, Lentes, Bernd wrote: >>> - On Feb 15, 2021, at 4:53 PM, kgaillo

Re: [ClusterLabs] alert is not executed

2021-02-15 Thread Lentes, Bernd
- On Feb 15, 2021, at 9:00 PM, kgaillot kgail...@redhat.com wrote: > On Mon, 2021-02-15 at 20:47 +0100, Lentes, Bernd wrote: >> - On Feb 15, 2021, at 4:53 PM, kgaillot kgail...@redhat.com >> wrote: >> >> > I'd check for SELinux denials. >

[ClusterLabs] alert is not executed

2021-02-15 Thread Lentes, Bernd
- On Feb 15, 2021, at 4:53 PM, kgaillot kgail...@redhat.com wrote: > I'd check for SELinux denials. > SELinux isn't installed and the AppArmor service does not start. I changed the subject. Bernd Helmholtz Zentrum München Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer

Re: [ClusterLabs] Antw: [EXT] Re: weird xml snippet in "crm configure show"

2021-02-15 Thread Lentes, Bernd
- On Feb 15, 2021, at 9:55 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >> Hi, >> >> i could configure the following: >> >> ha-idg-1:~ # crm configure show smtp_alert >> alert smtp_alert "/root/skripte/alert_smtp.sh" \ >> attributes

Re: [ClusterLabs] weird xml snippet in "crm configure show"

2021-02-12 Thread Lentes, Bernd
- On Feb 12, 2021, at 12:50 PM, Yan Gao y...@suse.com wrote: > > > It seems that crmsh has difficulty parsing the "random" ids of the > attribute sets here. I guess `crm configure edit` the part to be > something like: > > alert smtp_alert "/root/skripte/alert_smtp.sh" \ >

Re: [ClusterLabs] Antw: [EXT] weird xml snippet in "crm configure show"

2021-02-12 Thread Lentes, Bernd
- On Feb 12, 2021, at 5:00 PM, hunter86 bg hunter86...@yahoo.com wrote: > WARNING: cib-bootstrap-options: unknown attribute 'no-quirum-policy' > That looks like a typo. > Best Regards, > Strahil Nikolov Thanks, i found that already. Bernd

Re: [ClusterLabs] Antw: [EXT] weird xml snippet in "crm configure show"

2021-02-12 Thread Lentes, Bernd
- On Feb 12, 2021, at 11:18 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > What is the output of "crm configure verify"? ha-idg-1:~ # crm configure verify WARNING: cib-bootstrap-options: unknown attribute 'no-quirum-policy' WARNING: clvmd: specified timeout 20 for monitor

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Lentes, Bernd
- On Oct 26, 2020, at 4:09 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > AFAIK you can even kill processes in Linux that are in "D" state (contrary to > other operating systems). How ? Bernd

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Lentes, Bernd
- On Oct 23, 2020, at 11:18 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > - On Oct 23, 2020, at 11:11 PM, arvidjaar arvidj...@gmail.com wrote: > > >>> I need something like that which waits for some time (maybe 30s) if the >>> domain >>> nevertheless stops although >>>

Re: [ClusterLabs] Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Lentes, Bernd
- On Oct 26, 2020, at 8:41 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > "SIGKILL: Device or resource busy" is nonsense: kill does not wait; it either > fails or succeeds. yes and no. When you send a SIGKILL to a process which is in 'D' state, the signal can't be delivered,

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Lentes, Bernd
- On Oct 23, 2020, at 11:11 PM, arvidjaar arvidj...@gmail.com wrote: >> I need something like that which waits for some time (maybe 30s) if the domain >> nevertheless stops although >> "virsh destroy" gave an error back. Because the SIGKILL is delivered if the >> process wakes up from D

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Lentes, Bernd
- On Oct 23, 2020, at 8:45 PM, Valentin Vidić vvi...@valentin-vidic.from.hr wrote: > On Fri, Oct 23, 2020 at 08:08:31PM +0200, Lentes, Bernd wrote: >> But when the timeout has run out the RA tries to kill the machine with a >> "virsh >> destroy". >

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Lentes, Bernd
- On Oct 23, 2020, at 5:06 PM, Strahil Nikolov hunter86...@yahoo.com wrote: > why don't you work with something like this: 'op stop interval =300 > timeout=600'. > The stop operation will timeout at your requirements without modifying the > script. > > Best Regards, > Strahil Nikolov But

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Lentes, Bernd
- On Oct 23, 2020, at 7:11 AM, Andrei Borzenkov arvidj...@gmail.com wrote: >> >> ocf_log info "Issuing forced shutdown (destroy) request for domain >> ${DOMAIN_NAME}." >> out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1) >> ex=$? >> sleep (10)
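The idea discussed in this thread, waiting a while to see whether the domain disappears even though `virsh destroy` reported an error, could be sketched as a polling loop with a timeout rather than a single fixed sleep. This is not code from the VirtualDomain agent; `wait_until_stopped` and the check function passed to it are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# wait_until_stopped CHECK_CMD TIMEOUT_SECONDS  (hypothetical helper)
# CHECK_CMD must succeed (exit 0) while the domain is still running.
# Returns 0 as soon as CHECK_CMD fails, 1 if it still succeeds after the timeout.
wait_until_stopped() {
    check_cmd=$1
    timeout=$2
    waited=0
    while "$check_cmd"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible check inside an agent (an assumption, shown for orientation only):
# domain_is_running() {
#     [ "$(LANG=C virsh domstate "$DOMAIN_NAME" 2>/dev/null)" = "running" ]
# }
# wait_until_stopped domain_is_running 30 || echo "domain still running after 30s"
```

Polling once per second lets the stop operation return as soon as the process has gone, while the timeout keeps the extra wait bounded.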

[ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-22 Thread Lentes, Bernd
Hi guys, occasionally stopping a VirtualDomain resource via "crm resource stop" does not work, and in the end the node is fenced, which is ugly. I had a look at the RA to see what it does. After trying to stop the domain via "virsh shutdown ..." in a configurable time it switches to "virsh

Re: [ClusterLabs] mess in the CIB

2020-10-07 Thread Lentes, Bernd
> > It's unlikely that changed at any time; more likely it was created like > that. Whatever was used to create the initial configuration would be > where to look for clues. > > As long as the IDs are unique, their content doesn't matter to > pacemaker, so it's just a cosmetic issue. > What

[ClusterLabs] mess in the CIB

2020-10-06 Thread Lentes, Bernd
Hi guys, i have a very strange problem with my CIB. We have a two-node cluster running about 15 VirtualDomains as resources. Two of them seem to be messed up. Here is the config from crm: primitive vm_ssh VirtualDomain \ params config="/mnt/share/vm_ssh.xml" \ params

Re: [ClusterLabs] VirtualDomain stop operation traced - but nothing appears in /var/lib/heartbeat/trace_ra/

2020-10-02 Thread Lentes, Bernd
- On 30 Sep 2020, at 19:24, Vladislav Bogdanov bub...@hoster-ok.com wrote: > Hi > Try to enable trace_ra for start op. I'm tracing now start and stop and that works fine.

[ClusterLabs] VirtualDomain stop operation traced - but nothing appears in /var/lib/heartbeat/trace_ra/

2020-09-28 Thread Lentes, Bernd
Hi, currently i have a VirtualDomain resource which sometimes fails to stop. To investigate further i'm tracing the stop operation of this resource. But although i have already stopped it several times, nothing appears in /var/lib/heartbeat/trace_ra/. This is my config: primitive vm_amok

Re: [ClusterLabs] why is node fenced ?

2020-08-19 Thread Lentes, Bernd
- On Aug 19, 2020, at 4:04 PM, kgaillot kgail...@redhat.com wrote: >> This appears to be a scheduler bug. > > Fix is in master branch and will land in 2.0.5 expected at end of the > year > > https://github.com/ClusterLabs/pacemaker/pull/2146 A principal question: I have SLES 12 and i'm

Re: [ClusterLabs] why is node fenced ?

2020-08-19 Thread Lentes, Bernd
- On Aug 18, 2020, at 7:30 PM, kgaillot kgail...@redhat.com wrote: >> > I'm not sure, I'd have to see the pe input. >> >> You find it here: >> https://hmgubox2.helmholtz-muenchen.de/index.php/s/WJGtodMZ9k7rN29 > > This appears to be a scheduler bug. > > The scheduler considers a

Re: [ClusterLabs] why is node fenced ?

2020-08-18 Thread Lentes, Bernd
- On Aug 17, 2020, at 5:09 PM, kgaillot kgail...@redhat.com wrote: >> I checked all relevant pe-files in this time period. >> This is what i found out (i just write the important entries): >> Executing cluster transition: >> * Resource action: vm_nextcloud stop on ha-idg-2 >> Revised

Re: [ClusterLabs] why is node fenced ?

2020-08-14 Thread Lentes, Bernd
- On Aug 9, 2020, at 10:17 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: >> So this appears to be the problem. From these logs I would guess the >> successful stop on ha-idg-1 did not get written to the CIB for some >> reason. I'd look at the pe input from this transition on

Re: [ClusterLabs] why is node fenced ?

2020-08-14 Thread Lentes, Bernd
- On Aug 10, 2020, at 11:59 PM, kgaillot kgail...@redhat.com wrote: > The most recent transition is aborted, but since all its actions are > complete, the only effect is to trigger a new transition. > > We should probably rephrase the log message. In fact, the whole > "transition"

Re: [ClusterLabs] why is node fenced ?

2020-08-10 Thread Lentes, Bernd
- On 29 Jul 2020, at 18:53, kgaillot kgail...@redhat.com wrote: > On Wed, 2020-07-29 at 17:26 +0200, Lentes, Bernd wrote: >> Hi, >> >> a few days ago one of my nodes was fenced and i don't know why, which >> is something i really don't like. >> What i

Re: [ClusterLabs] why is node fenced ?

2020-08-10 Thread Lentes, Bernd
- On 29 Jul 2020, at 18:53, kgaillot kgail...@redhat.com wrote: > Since the ha-idg-2 is now shutting down, ha-idg-1 becomes DC. The other way round. >> Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: >> unpack_rsc_op_failure: Processing failed migrate_to of vm_nextcloud >> on

Re: [ClusterLabs] Antw: Re: Antw: [EXT] why is node fenced ?

2020-07-31 Thread Lentes, Bernd
- On Jul 31, 2020, at 8:03 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >>> >>> My guess is that ha-idg-1 was fenced because a failed migration from >> ha-idg-2 >>> is treated like a stop failure on ha-idg-2. Stop failures cause fencing. > You >>> should have tested your

Re: [ClusterLabs] Antw: [EXT] why is node fenced ?

2020-07-30 Thread Lentes, Bernd
- On 30 Jul 2020, at 9:28, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >>>> "Lentes, Bernd" wrote on 29.07.2020 at 17:26 in message > <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>: >> Hi, >>

Re: [ClusterLabs] why is node fenced ?

2020-07-29 Thread Lentes, Bernd
- On 29 Jul 2020, at 17:26, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: Hi, sorry, i forgot to mention: OS: SLES 12 SP4 kernel: 4.12.14-95.32 pacemaker: pacemaker-1.1.19+20181105.ccd6b5b10-3.13.1.x86_64 Bernd Helmholtz Zentrum München Helmholtz Zentrum Muenchen Deutsches

[ClusterLabs] why is node fenced ?

2020-07-29 Thread Lentes, Bernd
Hi, a few days ago one of my nodes was fenced and i don't know why, which is something i really don't like. What i did: I put one node (ha-idg-1) in standby. The resources on it (mostly virtual domains) were migrated to ha-idg-2, except one domain (vm_nextcloud). On ha-idg-2 a mountpoint

[ClusterLabs] pacemaker together with ovirt or Kimchi ?

2020-07-11 Thread Lentes, Bernd
Hi, i have a two-node cluster with pacemaker and about 10 virtual domains as resources. It's running fine. I configure and administer everything with the crm shell. But i'm also looking for a web interface. I'm not very impressed by HAWK. Is it possible to use Kimchi or ovirt together with a

Re: [ClusterLabs] Antw: DLM, cLVM, GFS2 and OCFS2 managed by systemd instead of crm ?

2019-10-16 Thread Lentes, Bernd
- On Oct 16, 2019, at 8:27 AM, Digimer li...@alteeve.ca wrote: > On 2019-10-16 2:16 a.m., Ulrich Windl wrote: >>>>> "Lentes, Bernd" wrote on 15.10.2019 at 21:35 in message >> <1922568650.3402980.1571168140600.javamail.zim...@helm

[ClusterLabs] DLM, cLVM, GFS2 and OCFS2 managed by systemd instead of crm ?

2019-10-15 Thread Lentes, Bernd
Hi, i'm a big fan of simple solutions (KISS). Currently i have DLM, cLVM, GFS2 and OCFS2 managed by pacemaker. They are all fundamental prerequisites for my resources (Virtual Domains). To configure them i used clones and groups. Why not have them managed by systemd to make the cluster setup

Re: [ClusterLabs] trace of Filesystem RA does not log

2019-10-14 Thread Lentes, Bernd
>> -Original Message- >> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Lentes, >> Bernd >> Sent: 11 October 2019 22:32 >> To: Pacemaker ML >> Subject: [ClusterLabs] trace of Filesystem RA does not log >> >> Hi, >>

Re: [ClusterLabs] trace of Filesystem RA does not log

2019-10-14 Thread Lentes, Bernd
- On Oct 14, 2019, at 6:27 AM, Roger Zhou zz...@suse.com wrote: > The stop failure is very bad, and is crucial for HA system. Yes, that's true. > You can try o2locktop cli to find the potential INODE to be blamed[1]. > > `o2locktop --help` gives you more usage details I will try that.
