> -Original Message-
> From: Reid Wahl
> Sent: Friday, March 10, 2023 10:30 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> ; Lentes, Bernd muenchen.de>
> Subject: Re: [ClusterLabs] 2-Node cluster - both nodes unclean - can't
> st
Hi,
I can't get my cluster running. I had problems with an OCFS2 volume, and both
nodes have been fenced.
When I now run “systemctl start pacemaker.service”, crm_mon shows both nodes as
UNCLEAN for a few seconds, then pacemaker stops.
I try to confirm the fencing with “stonith_admin -C”, but it
Hi,
i think i found the reason, but i want to be sure.
I wanted to stop a VirtualDomain and did a "crm resource stop ..."
But it didn't shut down. After waiting several minutes i stopped with libvirt,
circumventing the cluster software.
First i wondered "why didn't it shutdown ?", but then i
- On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:
> On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
> Did you try a cleanup in between?
When i do a cleanup before trace/untrace the resource is
- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:
> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second
- On 18 Oct, 2022, at 14:35, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
wrote:
>
> # crm configure
> edit ...
> verify
> ptest nograph #!!!
> commit
That's very helpful. I didn't know that, Thanks.
> --
> If you used that, you would have seen the restart.
> Despite of that I wonder
Hi,
i try to find out why there is sometimes a restart of the resource and
sometimes not.
Unpredictable behaviour is something i expect from Windows, not from Linux.
Here you see two "crm resource trace "resource"".
In the first case the resource is restarted , in the second not.
The command i
- On 7 Oct, 2022, at 21:37, Reid Wahl nw...@redhat.com wrote:
> On Fri, Oct 7, 2022 at 6:02 AM Lentes, Bernd
> wrote:
>> - On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote:
>>
>> > How did you set a trace just for monitor?
>>
>> crm resour
- On 7 Oct, 2022, at 01:08, Ken Gaillot kgail...@redhat.com wrote:
>
> Yes, trace_ra is an agent-defined resource parameter, not a Pacemaker-
> defined meta-attribute. Resources are restarted anytime a parameter
> changes (unless the parameter is set up for reloads).
>
> trace_ra is unusual
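
The rule quoted above (a parameter change restarts the resource unless the parameter is set up for reloads) can be sketched as a small decision function. This is only a simplified model of the statement in the mail, not Pacemaker's scheduler logic; the function name and return values are made up for illustration.

```python
def action_on_param_change(changed, reloadable):
    """Toy model: decide what a parameter change implies for a resource.

    changed    -- names of the parameters whose values changed
    reloadable -- names of the parameters the agent marks as reloadable
    Returns "none", "reload" or "restart" (hypothetical labels).
    """
    changed = set(changed)
    if not changed:
        return "none"
    # Only if every changed parameter is reloadable can a reload suffice.
    if changed <= set(reloadable):
        return "reload"
    return "restart"
```

With `trace_ra` not marked reloadable, `action_on_param_change({"trace_ra"}, set())` gives `"restart"`, which matches the restarts described in this thread when tracing is switched on.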
Dear all,
while checking my cluster with "crm status xml" i stumbled across:
ha-idg-1:/usr/lib/ocf/resource.d/heartbeat # crm status xml
<=
expected_votes="unknown" ?
I didn't
- On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote:
> How did you set a trace just for monitor?
crm resource trace dlm monitor.
> Wish I could help with that -- it's mostly a mystery to me too ;)
:-))
smime.p7s
Description: S/MIME Cryptographic Signature
Hi,
i have some problems with our DLM, so i wanted to trace it. Yesterday i just
set a trace for "monitor". No restart of DLM afterwards. It went fine as
expected.
I got logs in /var/lib/heartbeat/trace_ra. After some monitor i stopped tracing.
Today i set a trace for all operations.
Now
- On 24 Aug, 2022, at 16:26, kwenning kwenn...@redhat.com wrote:
>>
>> if I get Ulrich right - and my fading memory of when I really used crmsh the
>> last time is telling me the same thing ...
>>
I get the impression many people prefer pcs to crm. Is there any reason for
that ?
And can i
- On 24 Aug, 2022, at 16:26, kwenning kwenn...@redhat.com wrote:
>
> Guess the resources running now are those you tried to enable before
> while they were globally stopped
>
No. First i set stop-all-resources to false. Then SOME resources started.
Then i tried several times to
- On 24 Aug, 2022, at 16:01, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
wrote:
>> Now with "crm resource start" all resources started. I didn't change
>> anything !?!
>
> I guess that command set the roles of all resources to "started", so you
> changed
> something ;-)
I did it
Hi,
Now with "crm resource start" all resources started. I didn't change anything
!?!
Bernd
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home:
- On 24 Aug, 2022, at 07:21, Reid Wahl nw...@redhat.com wrote:
> As a result, your command might start the virtual machines, but
> Pacemaker will still show that the resources are "Stopped (disabled)".
> To fix that, you'll need to enable the resources.
How do i achieve that ?
Bernd
- On 24 Aug, 2022, at 08:17, Reid Wahl nw...@redhat.com wrote:
> I'm not sure off the top of my head what (if anything) gets sent to
> the logs. Do note that Bernd is using pacemaker v1, which hasn't been
> receiving new features for quite a while.
An Update is recommended ?
Bernd
- On 24 Aug, 2022, at 08:10, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
wrote:
>
> Bernd,
>
> that command would simply set the role to "started", but I guess it already
> is.
> Obviously to be effective stop-all-resources must have precedence. You see?
>
> Regards,
> Ulrich
>
> There is no resource with the name "virtual_domain" in your list. All
> non-active resources in your list are either disabled or unmanaged.
> Without actual commands that list resource state before "crm resource
> start", "crm resource start" itself and once more resource state after
> this
- On 24 Aug, 2022, at 07:22, Reid Wahl nw...@redhat.com wrote:
> Are the VMs running after your start command?
No.
Bernd
- On 24 Aug, 2022, at 07:03, arvidjaar arvidj...@gmail.com wrote:
> On 24.08.2022 07:34, Lentes, Bernd wrote:
>>
>>
>> - On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote:
>>
>>
>>> The stop-all-resources cluster property is set to
- On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote:
> The stop-all-resources cluster property is set to true. Is that intentional?
OMG. Thanks Reid !
But unfortunately not all virtual domains are running:
Stack: corosync
Current DC: ha-idg-2 (version
- On 24 Aug, 2022, at 04:04, Reid Wahl nw...@redhat.com wrote:
> Can you share your CIB? Not sure off hand what everything means (resource not
> found, IPC error, crmd failure and respawn), and pacemaker v1 logs aren't the
> easiest to interpret. But perhaps something in the CIB will show
Hi,
currently i can't start resources on our 2-node-cluster.
Cluster seems to be ok:
Stack: corosync
Current DC: ha-idg-1 (version
1.1.24+20210811.f5abda0ee-3.21.9-1.1.24+20210811.f5abda0ee) - partition with
quorum
Last updated: Wed Aug 24 02:56:46 2022
Last change: Wed Aug 24 02:56:41 2022 by
- On 4 Aug, 2022, at 19:46, Reid Wahl nw...@redhat.com wrote:
>> It shuts down ha-idg-2:
>> 2022-08-03T01:19:51.866200+02:00 ha-idg-2 systemd-logind[1535]: Power key
>> pressed.
>> 2022-08-03T01:19:52.048335+02:00 ha-idg-2 systemd-logind[1535]: System is
>> powering down.
>>
- On 4 Aug, 2022, at 15:14, arvidjaar arvidj...@gmail.com wrote:
> On 04.08.2022 16:06, Lentes, Bernd wrote:
>>
>> - On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote:
>>
>> What do you mean by "banned" ? "crm resource ban ..." ?
- On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote:
>
> Such constraints are unnecessary.
>
> Let's say we have two stonith devices called "fence_dev1" and
> "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs
> to be fenced, and fence_dev2 is running on node 2,
Hi,
i have the following situation:
2-node Cluster, just one node running (ha-idg-1).
The second node (ha-idg-2) is in standby. DLM monitor on ha-idg-1 times out.
Cluster tries to restart all services depending on DLM:
Aug 03 01:07:11 [19367] ha-idg-1 pengine: notice: LogAction:*
Hi,
i found a strange behaviour in the cluster log
(/var/log/cluster/corosync.log).
I KNOW that i put one node (ha-idg-2) in standby mode and stopped the pacemaker
service on that node:
The history of the shell says:
993 2022-08-02 18:28:25 crm node standby ha-idg-2
994 2022-08-02
- On 26 Jul, 2022, at 20:06, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
wrote:
> Hi Bernd!
>
> I think the answer may be some time before the timeout was reported; maybe a
> network issue? Or a very high load. It's hard to say from the logs...
Yes, i had a high load before:
Jul 20
Hi,
it seems my DLM went crazy:
/var/log/cluster/corosync.log:
Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: child_timeout_callback:
dlm_monitor_3 process (PID 11816) timed out
Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: operation_finished:
dlm_monitor_3:11816
Hi,
is there a way to cancel a running live migration or a "resource stop" ?
Bernd
--
Bernd Lentes
System Administrator
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
+49 89
- On Jun 27, 2022, at 3:54 PM, kgaillot kgail...@redhat.com wrote:
> As an aside, the preferred naming for custom agents is to change the
> provider (ocf:PROVIDER:AGENT), putting them in
> /usr/lib/ocf/resource.d/PROVIDER/AGENT.
>
> For example, ocf:local:VirtualDomain or
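
The naming convention described above (ocf:PROVIDER:AGENT living in /usr/lib/ocf/resource.d/PROVIDER/AGENT) can be written down as a small helper. A minimal sketch: the function name is hypothetical, and the root directory is simply the one mentioned in the mail, which may differ on other installations.

```python
import posixpath

# Directory named in the mail above; an assumption for other systems.
OCF_ROOT = "/usr/lib/ocf/resource.d"


def agent_path(spec, root=OCF_ROOT):
    """Map an 'ocf:PROVIDER:AGENT' specification to the agent's file path."""
    parts = spec.split(":")
    if len(parts) != 3 or parts[0] != "ocf":
        raise ValueError("expected ocf:PROVIDER:AGENT, got %r" % spec)
    _, provider, agent = parts
    return posixpath.join(root, provider, agent)
```

For the example from the mail, `agent_path("ocf:local:VirtualDomain")` yields `/usr/lib/ocf/resource.d/local/VirtualDomain`.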
- On Jun 27, 2022, at 2:57 PM, Oyvind Albrigtsen oalbr...@redhat.com wrote:
> You need to update the agent name in the metadata section to be the
> same as the filename.
>
>
> Oyvind
>
OMG. Thank you !!!
Bernd
Hi,
i adapted the RA ocf/heartbeat/VirtualDomain to my needs and renamed it to
VirtualDomain.ssh
When i try to use it now, i get an error message.
I start e.g. "crm configure edit vm-idcc-devel" to modify an existing
VirtualDomain that it uses the new RA
and want to save it i get the following
Hi,
from my understanding the resource agents in /usr/lib/ocf/resource.d/heartbeat
are quite similar
to the old scripts in /etc/init.d started by init.
Init starts these scripts with "script [start|stop|reload|restart|status]".
Inside the script there is a case construct which checks the options
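
The "case construct" idea can be illustrated with a toy dispatcher. This is a sketch in Python rather than shell, with a made-up `ToyAgent` class; it only shows the dispatch pattern and is not taken from any real resource agent. The three exit codes are the usual OCF values for success, unimplemented action and "not running".

```python
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class ToyAgent:
    """Hypothetical agent that only remembers whether it was started."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True
        return OCF_SUCCESS

    def stop(self):
        self.running = False
        return OCF_SUCCESS

    def monitor(self):
        return OCF_SUCCESS if self.running else OCF_NOT_RUNNING


def dispatch(agent, action):
    """Equivalent of the shell 'case "$1" in start) ... ;; esac' block."""
    handlers = {
        "start": agent.start,
        "stop": agent.stop,
        "monitor": agent.monitor,
    }
    handler = handlers.get(action)
    if handler is None:
        # Unknown action: report it as unimplemented.
        return OCF_ERR_UNIMPLEMENTED
    return handler()
```

One difference from init scripts that the sketch hints at: the cluster asks an agent for "monitor" rather than "status", and the exit code (not the printed text) is what gets evaluated.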
- On Feb 17, 2022, at 4:25 PM, kgaillot kgail...@redhat.com wrote:
>> So for me the big question is:
>> When a transition is happening, and there is a change in the cluster,
>> is the transition "aborted"
>> (delayed or interrupted would be better) or not ?
>> Is this behaviour consistent ?
- On Feb 16, 2022, at 6:48 PM, arvidjaar arvidj...@gmail.com wrote:
>
>
> Splitting logs between different messages does not really help in interpreting
> them.
I agree.
Here is the complete excerpt from the respective time:
https://nc-mcd.helmholtz-muenchen.de/nextcloud/s/eY8SA8pe4HZBBc8
- On Feb 17, 2022, at 10:26 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
"Ulrich Windl" wrote on 17.02.2022
>
> To correct myself: crm has a "-w" (wait) option that will wait until the DC is
> idle. In most cases it just waits until the operation requested has
- On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote:
> A transition is the set of actions that need to be taken in response to
> current conditions. A transition is aborted any time conditions change
> (here, the target-role being changed in the configuration), so that a
>
- On Feb 16, 2022, at 1:01 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
> Bernd,
>
> I guess the syslog/journal of the DC has better logs.
Unfortunately the journal didn't reveal something.
> As I see it now, it seems stop of vm_pathway takes a few minutes, and no other
>
- On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote:
>> Any idea ?
>> What is about that transition 128, which is aborted ?
>
> A transition is the set of actions that need to be taken in response to
> current conditions. A transition is aborted any time conditions change
>
Hi,
i have a weird behaviour in my 2-node-cluster.
I stopped several VirtualDomains via "crm resource stop VirtualDomain", but the
respective shutdown starts minutes later.
All on the same host.
.bash_history:
3520 2022-02-15 20:55:44 crm resource stop vm_greensql
3521 2022-02-15 20:56:34
- On Feb 10, 2022, at 4:40 PM, Jehan-Guillaume de Rorthais j...@dalibo.com
wrote:
>
> I wonder if after the cluster shutdown complete, the target-role=Stopped could
> be removed/edited offline with eg. crmadmin? That would make VirtualDomain
> startable on boot.
>
> I suppose this would
- On Feb 9, 2022, at 11:26 AM, Jehan-Guillaume de Rorthais j...@dalibo.com
wrote:
>
> I'm not sure how "crm resource stop " actually stop a resource. I thought
> it would set "target-role=Stopped", but I might be wrong.
>
> If "crm resource stop" actually use "target-role=Stopped", I
- On Feb 7, 2022, at 4:13 PM, Jehan-Guillaume de Rorthais j...@dalibo.com
wrote:
> On Mon, 7 Feb 2022 14:24:44 +0100 (CET)
> "Lentes, Bernd" wrote:
>
>> Hi,
>>
>> i'm currently changing a bit in my cluster because i realized that my
>> configu
- On Feb 7, 2022, at 2:36 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
>
> Bernd,
>
> what if you set the node affected to standby, or shut down the cluster
> services? Or are all nodes powered by the same UPS?
All nodes are powered by the same UPS.
>
>
>>
>> And what
Hi,
i'm currently changing a bit in my cluster because i realized that my
configuration for a power outage didn't work as i expected.
My idea is currently:
- first stop about 20 VirtualDomains, which are my services. This will surely
take some minutes.
I'm thinking of stopping each with a time
Hi,
i need to write some scripts for our cluster. Until now i wrote bash scripts.
But i like to learn python. Is there a package for pacemaker ?
What i found is: https://pypi.org/project/pacemaker/ and i'm not sure what that
is.
Thanks.
Bernd
Hi,
we just experienced two power outages in a few days.
This showed me that our UPS configuration and the handling of resources on the
cluster is insufficient.
We have a two-node cluster with SLES 12 SP5 and a Smart-UPS SRT 3000 from APC
with Network Management Card.
The UPS is able to buffer
- On Sep 30, 2021, at 3:55 AM, Gang He g...@suse.com wrote:
>>
>> 1) No problems during this step, the procedure just needs a few seconds.
>> reflink is a binary. See reflink --help
>> Yes, it is a cluster filesystem. I do the procedure just on one node,
>> so i don't have duplicates.
>>
- On Sep 29, 2021, at 4:37 AM, Gang He g...@suse.com wrote:
> Hi Lentes,
>
> Thanks for your feedback.
> I have some questions as below,
> 1) how to clone these VM images from each ocfs2 nodes via reflink?
> do you encounter any problems during this step?
> I want to say, this is a shared
- On Sep 27, 2021, at 2:51 PM, Pacemaker ML users@clusterlabs.org wrote:
> I would use something like this:
>
> ionice -c 2 -n 7 nice cp XXX YYY
>
> Best Regards,
> Strahil Nikolov
Just for a better understanding:
ionice does not relate to the copy procedure in this commandline, but to
- On Sep 27, 2021, at 2:51 PM, Pacemaker ML users@clusterlabs.org wrote:
> I would use something like this:
>
> ionice -c 2 -n 7 nice cp XXX YYY
>
> Best Regards,
> Strahil Nikolov
>
Hi Strahil,
that sounds interesting, i didn't know ionice.
I will have a look on the man-pages.
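
For scripting the suggestion above, the command line can be assembled as an argument list. A minimal sketch with a hypothetical helper name. The options are the ones from the mail: `ionice -c 2 -n 7` selects the best-effort I/O class at its lowest priority, and `nice` without options raises the niceness by 10. Because `ionice` starts `nice`, which in turn starts `cp`, the copy inherits both settings. How much the I/O priority actually helps depends on the I/O scheduler in use.

```python
import shlex


def low_priority_copy_cmd(src, dst, io_class=2, io_level=7):
    """Build 'ionice -c CLASS -n LEVEL nice cp SRC DST' as an argv list."""
    return [
        "ionice", "-c", str(io_class), "-n", str(io_level),
        "nice",
        "cp", src, dst,
    ]


def as_shell_string(argv):
    """Render the argv list as a safely quoted shell command line."""
    return shlex.join(argv)
```

The list form can be handed to `subprocess.run(...)` directly; `as_shell_string(low_priority_copy_cmd("XXX", "YYY"))` reproduces the command line quoted above.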
Hi,
i have a two-node cluster running on SLES 12SP5 with two HP servers and a
common FC SAN.
Most of my resources are virtual domains offering databases and web pages.
The disks from the domains reside on a OCFS2 Volume on a FC SAN.
Each night at 9pm all domains are snapshotted with the OCFS2
- On Sep 20, 2021, at 10:14 PM, kgaillot kgail...@redhat.com wrote:
>
> As far as I know, only a few of the ocf:pacemaker agents support OCF
> 1.1 currently. The resource-agents package doesn't.
>
> To check a given agent, run "crm_resource --show-metadata
> ocf:$PROVIDER:$AGENT | grep ''"
- On Sep 18, 2021, at 1:19 AM, kgaillot kgail...@redhat.com wrote:
>
> If the agent meta-data advertises support for the 1.1 standard and
> indicates that the trace_ra parameter is reloadable, then Pacemaker
> will automatically do a reload instead of restart for the resource if
> the
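
The two conditions named above (the meta-data advertises OCF 1.1, and trace_ra is marked reloadable) can be checked mechanically. The sketch below is an assumption-heavy illustration: the sample XML is invented, and the `<version>` element and `reloadable="1"` attribute are how I understand the OCF 1.1 meta-data layout, so verify against real agent output before relying on it. Other requirements for a reload may apply.

```python
import xml.etree.ElementTree as ET

# Invented meta-data for illustration only; not output of a real agent.
SAMPLE_METADATA = """<resource-agent name="Example" version="0.1">
  <version>1.1</version>
  <parameters>
    <parameter name="trace_ra" reloadable="1"/>
    <parameter name="config"/>
  </parameters>
</resource-agent>"""


def ocf_standard_version(root):
    """Return the advertised OCF standard version as a tuple, or None."""
    text = (root.findtext("version") or "").strip()
    try:
        return tuple(int(part) for part in text.split(".")[:2])
    except ValueError:
        return None


def reloadable_params(root):
    """Names of parameters carrying reloadable="1" (assumed attribute)."""
    return {
        p.get("name")
        for p in root.iter("parameter")
        if p.get("reloadable") == "1"
    }


def reloads_on_change(metadata_xml, param):
    """True if both conditions from the mail hold for the given parameter."""
    root = ET.fromstring(metadata_xml)
    version = ocf_standard_version(root)
    return version is not None and version >= (1, 1) and param in reloadable_params(root)
```

The XML to feed in would come from a command such as the `crm_resource --show-metadata` call quoted in this thread.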
- On Sep 17, 2021, at 9:13 PM, kgaillot kgail...@redhat.com wrote:
>> Bernd
>
> Tracing works by setting a special parameter, which to pacemaker looks
> like a configuration change that requires a restart. With the new OCF
> 1.1 standard, the trace parameter could be marked reloadable, but
Hi,
today i configured tracing for some VirtualDomains:
ha-idg-2:~ # crm resource trace vm_documents-oo migrate_from
INFO: Trace for vm_documents-oo:migrate_from is written to
/var/lib/heartbeat/trace_ra/
INFO: Trace set, restart vm_documents-oo to trace the migrate_from operation
ha-idg-2:~ #
Hi,
Today i couldn't migrate several virtual domains. I have a Two-node cluster
with SUSE SLES 12 SP5.
Pacemaker is
pacemaker-1.1.23+20200622.28dd98fad-3.9.2.20591.0.PTF.1177212.x86_64, corosync
is corosync-2.3.6-9.13.1.x86_64.
Migration just stopped after an amount of time.
This is what i
enkov
>> wrote:
>> On 30.03.2021 18:16, Lentes, Bernd wrote:
>> > Hi,
>>> currently i'm reading "Mastering KVM Virtualization", published by Packt
>> > Publishing, a book i can really recommend.
>>> There are some proposals for tuning guests. One
Hi,
currently i'm reading "Mastering KVM Virtualization", published by Packt
Publishing, a book i can really recommend.
There are some proposals for tuning guests. One is KSM (kernel samepage
merging), which sounds quite interesting.
Especially in a system with lots of virtual machines with the
- On Feb 15, 2021, at 10:24 PM, Bernd Lentes
bernd.len...@helmholtz-muenchen.de wrote:
> - On Feb 15, 2021, at 9:00 PM, kgaillot kgail...@redhat.com wrote:
>
>> On Mon, 2021-02-15 at 20:47 +0100, Lentes, Bernd wrote:
>>> - On Feb 15, 2021, at 4:53 PM, kgaillo
- On Feb 15, 2021, at 9:00 PM, kgaillot kgail...@redhat.com wrote:
> On Mon, 2021-02-15 at 20:47 +0100, Lentes, Bernd wrote:
>> - On Feb 15, 2021, at 4:53 PM, kgaillot kgail...@redhat.com
>> wrote:
>>
>> > I'd check for SELinux denials.
>>
- On Feb 15, 2021, at 4:53 PM, kgaillot kgail...@redhat.com wrote:
> I'd check for SELinux denials.
>
SELinux isn't installed and the AppArmor service does not start.
I changed the subject.
Bernd
Helmholtz Zentrum München
Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer
- On Feb 15, 2021, at 9:55 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
>> Hi,
>>
>> i could configure the following:
>>
>> ha-idg-1:~ # crm configure show smtp_alert
>> alert smtp_alert "/root/skripte/alert_smtp.sh" \
>> attributes
- On Feb 12, 2021, at 12:50 PM, Yan Gao y...@suse.com wrote:
>
>
> It seems that crmsh has difficulty parsing the "random" ids of the
> attribute sets here. I guess `crm configure edit` the part to be
> something like:
>
> alert smtp_alert "/root/skripte/alert_smtp.sh" \
>
- On Feb 12, 2021, at 5:00 PM, hunter86 bg hunter86...@yahoo.com wrote:
> WARNING: cib-bootstrap-options: unknown attribute 'no-quirum-policy'
> That looks like a typo.
> Best Regards,
> Strahil Nikolov
Thanks, i found that already.
Bernd
- On Feb 12, 2021, at 11:18 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
>
> What is the output of "crm configure verify"?
ha-idg-1:~ # crm configure verify
WARNING: cib-bootstrap-options: unknown attribute 'no-quirum-policy'
WARNING: clvmd: specified timeout 20 for monitor
- On Oct 26, 2020, at 4:09 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
>
> AFAIK you can even kill processes in Linux that are in "D" state (contrary to
> other operating systems).
How ?
Bernd
- On Oct 23, 2020, at 11:18 PM, Bernd Lentes
bernd.len...@helmholtz-muenchen.de wrote:
> - On Oct 23, 2020, at 11:11 PM, arvidjaar arvidj...@gmail.com wrote:
>
>
>>> I need something like that which waits for some time (maybe 30s) if the
>>> domain
>>> nevertheless stops although
>>>
- On Oct 26, 2020, at 8:41 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
> "SIGKILL: Device or resource busy" is nonsense: kill does not wait; it either
> fails or succeeds.
yes and no. When you send a SIGKILL to a process which is in 'D' state, the
signal can't be delivered,
- On Oct 23, 2020, at 11:11 PM, arvidjaar arvidj...@gmail.com wrote:
>> I need something like that which waits for some time (maybe 30s) if the domain
>> nevertheless stops although
>> "virsh destroy" returns an error. Because the SIGKILL is delivered if the
>> process wakes up from D
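
The idea discussed here (after a failed "virsh destroy", keep checking for a while, maybe 30s, whether the domain has gone away after all) is a poll-with-deadline loop. A minimal sketch in Python; the function name and parameters are hypothetical, and the check itself (for instance asking libvirt for the domain state) is left to the caller as `predicate`.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True as soon as the predicate holds, False once the
    deadline has passed. clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a stop action this would be called right after the forced shutdown attempt, and only a `False` result would be treated as a real stop failure.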
- On Oct 23, 2020, at 8:45 PM, Valentin Vidić vvi...@valentin-vidic.from.hr
wrote:
> On Fri, Oct 23, 2020 at 08:08:31PM +0200, Lentes, Bernd wrote:
>> But when the timeout has run out the RA tries to kill the machine with a
>> "virsh
>> destroy".
>
- On Oct 23, 2020, at 5:06 PM, Strahil Nikolov hunter86...@yahoo.com wrote:
> why don't you work with something like this: 'op stop interval =300
> timeout=600'.
> The stop operation will timeout at your requirements without modifying the
> script.
>
> Best Regards,
> Strahil Nikolov
But
- On Oct 23, 2020, at 7:11 AM, Andrei Borzenkov arvidj...@gmail.com wrote:
>>
>> ocf_log info "Issuing forced shutdown (destroy) request for domain
>> ${DOMAIN_NAME}."
>> out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1)
>> ex=$?
>> sleep (10)
Hi guys,
occasionally stopping a VirtualDomain resource via "crm resource stop" does not
work, and in the end the node is fenced, which is ugly.
I had a look at the RA to see what it does. After trying to stop the domain via
"virsh shutdown ..." in a configurable time it switches to "virsh
>
> It's unlikely that changed at any time; more likely it was created like
> that. Whatever was used to create the initial configuration would be
> where to look for clues.
>
> As long as the IDs are unique, their content doesn't matter to
> pacemaker, so it's just a cosmetic issue.
>
What
Hi guys,
i have a very strange problem with my CIB.
We have a two-node cluster running about 15 VirtualDomains as resources.
Two of them seem to be messed up.
Here is the config from crm:
primitive vm_ssh VirtualDomain \
params config="/mnt/share/vm_ssh.xml" \
params
- On 30 Sep, 2020, at 19:24, Vladislav Bogdanov bub...@hoster-ok.com wrote:
> Hi
> Try to enable trace_ra for start op.
I'm tracing now start and stop and that works fine.
Hi,
currently i have a VirtualDomains resource which sometimes fails to stop.
To investigate further i'm tracing the stop operation of this resource.
But although i stopped it already now several times, nothing appears in
/var/lib/heartbeat/trace_ra/.
This is my config:
primitive vm_amok
- On Aug 19, 2020, at 4:04 PM, kgaillot kgail...@redhat.com wrote:
>> This appears to be a scheduler bug.
>
> Fix is in master branch and will land in 2.0.5 expected at end of the
> year
>
> https://github.com/ClusterLabs/pacemaker/pull/2146
A principal question:
I have SLES 12 and i'm
- On Aug 18, 2020, at 7:30 PM, kgaillot kgail...@redhat.com wrote:
>> > I'm not sure, I'd have to see the pe input.
>>
>> You find it here:
>> https://hmgubox2.helmholtz-muenchen.de/index.php/s/WJGtodMZ9k7rN29
>
> This appears to be a scheduler bug.
>
> The scheduler considers a
- On Aug 17, 2020, at 5:09 PM, kgaillot kgail...@redhat.com wrote:
>> I checked all relevant pe-files in this time period.
>> This is what i found out (i just write the important entries):
>> Executing cluster transition:
>> * Resource action: vm_nextcloud stop on ha-idg-2
>> Revised
- On Aug 9, 2020, at 10:17 PM, Bernd Lentes
bernd.len...@helmholtz-muenchen.de wrote:
>> So this appears to be the problem. From these logs I would guess the
>> successful stop on ha-idg-1 did not get written to the CIB for some
>> reason. I'd look at the pe input from this transition on
- On Aug 10, 2020, at 11:59 PM, kgaillot kgail...@redhat.com wrote:
> The most recent transition is aborted, but since all its actions are
> complete, the only effect is to trigger a new transition.
>
> We should probably rephrase the log message. In fact, the whole
> "transition"
- On 29 Jul, 2020, at 18:53, kgaillot kgail...@redhat.com wrote:
> On Wed, 2020-07-29 at 17:26 +0200, Lentes, Bernd wrote:
>> Hi,
>>
>> a few days ago one of my nodes was fenced and i don't know why, which
>> is something i really don't like.
>> What i
- On 29 Jul, 2020, at 18:53, kgaillot kgail...@redhat.com wrote:
> Since the ha-idg-2 is now shutting down, ha-idg-1 becomes DC.
The other way round.
>> Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning:
>> unpack_rsc_op_failure: Processing failed migrate_to of vm_nextcloud
>> on
- On Jul 31, 2020, at 8:03 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
>>>
>>> My guess is that ha-idg-1 was fenced because a failed migration from
>> ha-idg-2
>>> is treated like a stop failure on ha-idg-2. Stop failures cause fencing.
> You
>>> should have tested your
- On 30 Jul, 2020, at 9:28, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
>>>> "Lentes, Bernd" wrote on 29.07.2020 at
> 17:26 in message
> <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>:
>> Hi,
>>
>>
- On 29 Jul, 2020, at 17:26, Bernd Lentes
bernd.len...@helmholtz-muenchen.de wrote:
Hi,
sorry, i missed:
OS: SLES 12 SP4
kernel: 4.12.14-95.32
pacemaker: pacemaker-1.1.19+20181105.ccd6b5b10-3.13.1.x86_64
Bernd
Hi,
a few days ago one of my nodes was fenced and i don't know why, which is
something i really don't like.
What i did:
I put one node (ha-idg-1) in standby. The resources on it (most of all virtual
domains) were migrated to ha-idg-2,
except one domain (vm_nextcloud). On ha-idg-2 a mountpoint
Hi,
i'm having a two node cluster with pacemaker and about 10 virtual domains as
resources.
It's running fine.
I configure/administrate everything with the crm shell.
But i'm also looking for a web interface.
I'm not much impressed by HAWK.
Is it possible to use Kimchi or ovirt together with a
- On Oct 16, 2019, at 8:27 AM, Digimer li...@alteeve.ca wrote:
> On 2019-10-16 2:16 a.m., Ulrich Windl wrote:
>>>>> "Lentes, Bernd" wrote on 15.10.2019 at
>> 21:35 in message
>> <1922568650.3402980.1571168140600.javamail.zim...@helm
Hi,
i'm a big fan of simple solutions (KISS).
Currently i have DLM, cLVM, GFS2 and OCFS2 managed by pacemaker.
They all are fundamental prerequisites for my resources (Virtual Domains).
To configure them i used clones and groups.
Why not having them managed by systemd to make the cluster setup
>> -Original Message-
>> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Lentes,
>> Bernd
>> Sent: 11 October 2019 22:32
>> To: Pacemaker ML
>> Subject: [ClusterLabs] trace of Filesystem RA does not log
>>
>> Hi,
>>
- On Oct 14, 2019, at 6:27 AM, Roger Zhou zz...@suse.com wrote:
> The stop failure is very bad, and is crucial for HA system.
Yes, that's true.
> You can try o2locktop cli to find the potential INODE to be blamed[1].
>
> `o2locktop --help` gives you more usage details
I will try that.