Re: [ClusterLabs] Fwd: dlm not starting
On 05/02/16 14:22 -0500, nameless wrote:
> Hi

Hello >nameless<,

technical aspect aside, it goes without saying that engaging in a
community assumes some level of cultural and social compatibility.
Otherwise there is a danger the cluster will partition, and that would
certainly be unhelpful.

Maybe this is a misunderstanding on my side, but so far, you don't
appear compatible, maturity-wise.

Happy Friday!

--
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Two bugs in fence_ec2 script
I've found two bugs in the fence_ec2 script found in this RPM:

  fence_ec2-0.1-0.10.1.x86_64.rpm

I believe this is the latest and greatest, but I may be wrong. I'm not
sure where to post these fixes, so I'm starting here. If there is a
better place to post, please let me know.

The errors are:

1. The "tag" attribute does not default to "Name", as specified in the
   documentation.
2. The script does not accept attribute settings from stdin.

#1 is not critical, since tag=Name can be specified explicitly when
setting up a stonith resource. However, #2 is very important, since
stonith_admin (and, I think, most of pacemaker) passes arguments to
fencing scripts via stdin. Without this fix, fence_ec2 will not work
properly via pacemaker. That is, this command works:

  fence_ec2 --action=metadata

...but this alternate version of the same command does not:

  echo "action=metadata" | fence_ec2

The fixes are relatively trivial. The version of fence_ec2 from the RPM
is attached as fence_ec2.old; my modified version is attached as
fence_ec2.new. I've also attached the RPM that was the source for
fence_ec2.old.

fence_ec2-0.1-0.10.1.x86_64.rpm
Description: application/rpm

fence_ec2.old
Description: Binary data

fence_ec2.new
Description: Binary data
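For context, fence agents are expected to read key=value pairs on stdin
when no command-line arguments are given; that is what stonith_admin and
pacemaker rely on. A minimal sketch of that parsing loop follows -- this
is illustrative only, not the actual fence_ec2 code, and the variable
names are assumptions:

```shell
# Sketch: map key=value lines from stdin onto the same variables that
# command-line options would set. Unknown keys are silently ignored,
# as most fence agents do.
parse_stdin() {
    action=""
    port=""
    while IFS= read -r line; do
        case "$line" in
            "")  continue ;;   # skip blank lines
            \#*) continue ;;   # skip comments
        esac
        key=${line%%=*}
        val=${line#*=}
        case "$key" in
            action) action=$val ;;
            port)   port=$val ;;
        esac
    done
    echo "action=$action port=$port"
}

# Simulate what `echo "action=metadata" | fence_ec2` delivers on stdin:
result=$(printf 'action=metadata\nport=i-0abc123\n' | parse_stdin)
echo "$result"
```

An agent structured this way handles both invocation styles: it parses
argv when arguments are present, and falls back to reading stdin
otherwise.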
Re: [ClusterLabs] "After = syslog.service" it is not working?
On 01/28/2016 10:27 PM, 飯田 雄介 wrote:
> Hi, Ken
>
> I have collected the logs.
> "shutdown -r now" ran at 03:00:22.
> rsyslog also received a TERM at 03:00:22 and appears to have stopped.
>
> Regards, Yusuke

Hello Yusuke,

I see in your logs that /var/log/messages does not have any messages at
all from pacemaker (not only at shutdown). Meanwhile, pacemaker.log does
have messages, including from shutdown. The cluster writes directly to
this file, without going through rsyslog.

Both logs end at 03:00:22. I'm guessing that systemd waited until
pacemaker exited, then stopped rsyslog. I don't think they're exiting at
the same time (or that rsyslog exited before pacemaker). Instead, I
suspect that either pacemaker is configured not to log via syslog, or
rsyslog is configured not to send pacemaker logs to /var/log/messages.

Check the value of PCMK_logfacility and PCMK_logpriority in your
/etc/sysconfig/pacemaker. By default, pacemaker will log via syslog, but
if these are explicitly set, they might be turning that off.

Also, pacemaker will try to inherit log options from corosync, so check
/etc/corosync/corosync.conf for logging options, especially to_syslog
and syslog_facility.

If that isn't an issue, then I would check /etc/rsyslog.conf and
/etc/rsyslog.d/* to see if they do anything nonstandard.

>> -Original Message-
>> From: Ken Gaillot [mailto:kgail...@redhat.com]
>> Sent: Friday, January 29, 2016 12:16 AM
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] "After = syslog.service" it is not working?
>>
>> On 01/28/2016 12:48 AM, 飯田 雄介 wrote:
>>> Hi, All
>>>
>>> I am building a cluster in the following environment:
>>> RHEL 7.2
>>> Pacemaker 1.1.14
>>>
>>> I shut down the OS while Pacemaker was running.
>>> Pacemaker's logs at stop time were not output to syslog.
>>>
>>> It seems "After = syslog.service" in the start-up script does not
>>> work, and pacemaker and rsyslog are stopped at the same time.
>>>
>>> Since it is rsyslog.service in RHEL7, shouldn't this setting be
>>> "After = rsyslog.service"?
>>>
>>> Regards, Yusuke
>>
>> The "After = syslog.service" line neither helps nor hurts, and we
>> should just take it out.
>>
>> For a long time (and certainly in RHEL 7's systemd version 219),
>> systemd automatically orders the system log to start before and stop
>> after other services, so I don't think that's the cause of your
>> problem.
>>
>> I'm not sure what would cause that behavior; can you post the messages
>> that are logged once shutdown is initiated?
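For reference, the settings to check look roughly like this. These are
illustrative values, not taken from Yusuke's system, and exact defaults
may vary by version:

```
# /etc/sysconfig/pacemaker -- if these are set, they override the
# default of logging via syslog:
# PCMK_logfacility=none      # "none" disables syslog logging entirely
# PCMK_logpriority=notice

# /etc/corosync/corosync.conf -- pacemaker inherits these if present:
logging {
    to_syslog: yes
    syslog_facility: daemon
}
```

With to_syslog: no, or PCMK_logfacility=none, pacemaker messages would
still appear in pacemaker.log but never reach /var/log/messages, which
matches the symptom described above.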
Re: [ClusterLabs] Fwd: dlm not starting
> am configuring shared storage for 2 nodes (Cent 7) installed
> pcs/gfs2-utils/lvm2-cluster when creating resource unable to start dlm
>
> crm_verify -LV
>    error: unpack_rsc_op: Preventing dlm from re-starting anywhere:
>    operation start failed 'not configured' (6)

Are you using the ocf:pacemaker:controld resource agent for dlm?
Normally it logs what the problem is when returning 'not configured',
but I don't see it below.

As far as I know, it will return 'not configured' if
stonith-enabled=false or globally-unique=true, as those are incompatible
with DLM. There is also a rare cluster error condition that will be
reported as 'not configured', but it will always be accompanied by
"Invalid resource definition" in the logs.

> Feb 05 13:34:26 [24262] libcompute1 pengine:     info: determine_online_status: Node libcompute1 is online
> Feb 05 13:34:26 [24262] libcompute1 pengine:     info: determine_online_status: Node libcompute2 is online
> Feb 05 13:34:26 [24262] libcompute1 pengine:  warning: unpack_rsc_op_failure: Processing failed op start for dlm on libcompute1: not configured (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine:    error: unpack_rsc_op: Preventing dlm from re-starting anywhere: operation start failed 'not configured' (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine:  warning: unpack_rsc_op_failure: Processing failed op start for dlm on libcompute1: not configured (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine:    error: unpack_rsc_op: Preventing dlm from re-starting anywhere: operation start failed 'not configured' (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine:     info: native_print: dlm (ocf::pacemaker:controld): FAILED libcompute1
> Feb 05 13:34:26 [24262] libcompute1 pengine:     info: get_failcount_full: dlm has failed INFINITY times on libcompute1
> Feb 05 13:34:26 [24262] libcompute1 pengine:  warning: common_apply_stickiness: Forcing dlm away from libcompute1 after 100 failures (max=100)
> Feb 05 13:34:26 [24262] libcompute1 pengine:     info: native_color: Resource dlm cannot run anywhere
> Feb 05 13:34:26 [24262] libcompute1 pengine:   notice: LogActions: Stop dlm (libcompute1)
> Feb 05 13:34:26 [24262] libcompute1 pengine:   notice: process_pe_message: Calculated Transition 59: /var/lib/pacemaker/pengine/pe-input-176.bz2
> Feb 05 13:34:26 [24263] libcompute1 crmd:     info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Feb 05 13:34:26 [24263] libcompute1 crmd:     info: do_te_invoke: Processing graph 59 (ref=pe_calc-dc-1454697266-177) derived from /var/lib/pacemaker/pengine/pe-input-176.bz2
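To rule out the two common causes mentioned above, the relevant settings
can be queried directly on a running cluster. The resource name "dlm" is
taken from the logs; adjust it if the clone is named differently:

```
# Cluster-wide stonith-enabled property (must not be "false" for DLM):
crm_attribute --type crm_config --query --name stonith-enabled

# The clone's globally-unique meta-attribute (must not be "true"):
crm_resource --resource dlm --meta --get-parameter globally-unique
```

If stonith-enabled comes back "false", configuring and enabling a
fencing device is the fix; DLM requires working fencing.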
Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact
On 04.02.2016 15:43, Bogdan Dobrelya wrote:
> Hello.
> Regarding the original issue, the good news is that the
> resource-agents ocf-shellfuncs no longer causes fork bombs in the
> dummy OCF RA [0] after the fix [1]. The bad news is that the
> "self-forking" monitors issue seems to remain for the rabbitmq OCF RA
> [2], and I can reproduce it with another custom agent [3], so I'd
> guess it may be valid for other ones as well.
>
> IIUC, the issue seems related to how lrmd forks monitor actions.
> I tried to debug both pacemaker 1.1.10 and 1.1.12 with gdb as follows:
>
> # cat ./cmds
> set follow-fork-mode child
> set detach-on-fork off
> set follow-exec-mode new
> catch fork
> catch vfork
> cont
> # gdb -x cmds /usr/lib/pacemaker/lrmd `pgrep lrmd`
>
> I can confirm it catches forked monitors and makes nested forks as
> well. But I have *many* debug symbols missing, bt is full of question
> marks and, honestly, I'm not a gdb guru and do not know what to check
> for reproduced cases.
>
> So any help with troubleshooting things further is very appreciated!

I figured out this is expected behaviour. There are no fork bombs left,
just the usual fork & exec syscalls each time the OCF RA calls a shell
command or the ocf_run and ocf_log functions. Those false "self-forks"
are nothing more than a transient state between the fork and exec calls,
when the command line of the child process has yet to be updated... So I
believe the problem was solved completely by the aforementioned patch.

> [0] https://github.com/bogdando/dummy-ocf-ra
> [1] https://github.com/ClusterLabs/resource-agents/issues/734
> [2] https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
> [3] https://git.openstack.org/cgit/openstack/fuel-library/tree/files/fuel-ha-utils/ocf/ns_vrouter
>
> On 04.01.2016 17:33, Bogdan Dobrelya wrote:
>> On 04.01.2016 17:14, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Mon, Jan 04, 2016 at 04:52:43PM +0100, Bogdan Dobrelya wrote:
>>>> On 04.01.2016 16:36, Ken Gaillot wrote:
>>>>> On 01/04/2016 09:25 AM, Bogdan Dobrelya wrote:
>>>>>> On 04.01.2016 15:50, Bogdan Dobrelya wrote:
>>>>>>> [...]
>>>>>> Also note, that lrmd spawns *many* monitors like:
>>>>>> root  6495 0.0 0.0 70268 1456 ? Ss 2015 4:56 \_ /usr/lib/pacemaker/lrmd
>>>>>> root 31815 0.0 0.0  4440  780 ? S 15:08 0:00 | \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> root 31908 0.0 0.0  4440  388 ? S 15:08 0:00 |   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> root 31910 0.0 0.0  4440  384 ? S 15:08 0:00 |   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> root 31915 0.0 0.0  4440  392 ? S 15:08 0:00 |   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>>> ...
>>>>>
>>>>> At first glance, that looks like your monitor action is calling
>>>>> itself recursively, but I don't see how in your code.
>>>>
>>>> Yes, it should be a bug in the ocf-shellfuncs's ocf_log().
>>>
>>> If you're sure about that, please open an issue at
>>> https://github.com/ClusterLabs/resource-agents/issues
>>
>> Submitted [0]. Thank you!
>> Note that it seems the very import action causes the issue, not the
>> ocf_run or ocf_log code itself.
>>
>> [0] https://github.com/ClusterLabs/resource-agents/issues/734
>>>
>>> Thanks,
>>>
>>> Dejan

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando