Re: [ClusterLabs] Fwd: dlm not starting

2016-02-05 Thread Jan Pokorný
On 05/02/16 14:22 -0500, nameless wrote:
> Hi

Hello >nameless<,

technical aspect aside, it goes without saying that engaging in
a community assumes some level of cultural and social compatibility.
Otherwise there is a danger the cluster will partition, and
that would certainly be unhelpful.

Maybe this is a misunderstanding on my side, but so far, you don't
appear compatible, maturity-wise.

Happy Friday!

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Two bugs in fence_ec2 script

2016-02-05 Thread Steve Marshall
I've found two bugs in the fence_ec2 script found in this RPM:
fence_ec2-0.1-0.10.1.x86_64.rpm

I believe this is the latest and greatest, but I may be wrong.  I'm not
sure where to post these fixes, so I'm starting here.  If there is a better
place to post, please let me know.

The errors are:
1. The tag attribute does not default to "Name", as specified in the
documentation
2. The script does not accept attribute settings from stdin

#1 is not critical, since tag=Name can be specified explicitly when setting
up a stonith resource.  However, #2 is very important, since stonith_admin
(and I think most of pacemaker) passes arguments to fencing scripts via
stdin.  Without this fix, fence_ec2 will not work properly via pacemaker.
For example, this command works:

fence_ec2 --action=metadata

...but this equivalent form of the same command does not:
echo "action=metadata" | fence_ec2

The fixes are relatively trivial.  The version of fence_ec2 from the RPM is
attached as fence_ec2.old.  My modified version is attached as
fence_ec2.new.  I've also attached the RPM that was the source for
fence_ec2.old.


fence_ec2-0.1-0.10.1.x86_64.rpm
Description: application/rpm


fence_ec2.old
Description: Binary data


fence_ec2.new
Description: Binary data
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] "After = syslog.service" it is not working?

2016-02-05 Thread Ken Gaillot
On 01/28/2016 10:27 PM, 飯田 雄介 wrote:
> Hi, Ken
> 
> I have collected the logs.
> "shutdown -r now" was run at 03:00:22.
> rsyslog also appears to have received a TERM and stopped at 03:00:22.
> 
> Regards, Yusuke

Hello Yusuke,

I see in your logs that /var/log/messages does not have any messages at
all from pacemaker (not only at shutdown).

Meanwhile, pacemaker.log does have messages, including from shutdown.
The cluster writes directly to this file, without going through rsyslog.

Both logs end at 03:00:22. I'm guessing that systemd waited until
pacemaker exited, then stopped rsyslog. I don't think they're exiting at
the same time (or that rsyslog exited before pacemaker).

Instead, I suspect that either pacemaker is configured not to log via
syslog, or rsyslog is configured not to send pacemaker logs to
/var/log/messages.

Check the value of PCMK_logfacility and PCMK_logpriority in your
/etc/sysconfig/pacemaker. By default, pacemaker will log via syslog, but
if these are explicitly set, they might be turning that off.

Also, pacemaker will try to inherit log options from corosync, so check
/etc/corosync/corosync.conf for logging options, especially to_syslog
and syslog_facility.

If that isn't an issue, then I would check /etc/rsyslog.conf and
/etc/rsyslog.d/* to see if they do anything nonstandard.
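
If it helps, a quick (non-authoritative) way to eyeball all three of the
above from a shell, assuming the usual RHEL 7 paths:

# explicit log settings in the pacemaker sysconfig, if any
grep -E '^[[:space:]]*PCMK_log' /etc/sysconfig/pacemaker

# corosync logging options that pacemaker may inherit
grep -E 'to_syslog|syslog_facility' /etc/corosync/corosync.conf

# anything pacemaker-specific in the rsyslog configuration
grep -ri pacemaker /etc/rsyslog.conf /etc/rsyslog.d/ 2>/dev/null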

>> -Original Message-
>> From: Ken Gaillot [mailto:kgail...@redhat.com]
>> Sent: Friday, January 29, 2016 12:16 AM
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] "After = syslog.service" it is not working?
>>
>> On 01/28/2016 12:48 AM, 飯田 雄介 wrote:
>>> Hi, All
>>>
>>> I am building a cluster in the following environment:
>>> RHEL7.2
>>> Pacemaker-1.1.14
>>>
>>> I shut down the OS while Pacemaker was running.
>>> Pacemaker's shutdown logs were not written to syslog at that time.
>>>
>>> The "After = syslog.service" setting in the startup script does not seem
>>> to work; it looks like pacemaker and rsyslog are stopped at the same time.
>>>
>>> Since the syslog service on RHEL7 is rsyslog.service, shouldn't this
>>> setting be "After = rsyslog.service"?
>>>
>>> Regards, Yusuke
>>
>> The "After = syslog.service" line neither helps nor hurts, and we should just
>> take it out.
>>
>> For a long time (and certainly in RHEL 7's systemd version 219), systemd
>> automatically orders the system log to start before and stop after other 
>> services,
>> so I don't think that's the cause of your problem.
>>
>> I'm not sure what would cause that behavior; can you post the messages that
>> are logged once shutdown is initiated?


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fwd: dlm not starting

2016-02-05 Thread Ken Gaillot
> I am configuring shared storage for 2 nodes (CentOS 7) with
> pcs/gfs2-utils/lvm2-cluster installed. When creating the resource, I am
> unable to start dlm.
> 
>  crm_verify -LV
> error: unpack_rsc_op: Preventing dlm from re-starting anywhere:
> operation start failed 'not configured' (6)

Are you using the ocf:pacemaker:controld resource agent for dlm?
Normally it logs what the problem is when returning 'not configured',
but I don't see it below. As far as I know, it will return 'not
configured' if stonith-enabled=false or globally-unique=true, as those
are incompatible with DLM.

There is also a rare cluster error condition that will be reported as
'not configured', but it will always be accompanied by "Invalid resource
definition" in the logs.

> 
> Feb 05 13:34:26 [24262] libcompute1 pengine: info:
> determine_online_status:  Node libcompute1 is online
> Feb 05 13:34:26 [24262] libcompute1 pengine: info:
> determine_online_status:  Node libcompute2 is online
> Feb 05 13:34:26 [24262] libcompute1 pengine:  warning:
> unpack_rsc_op_failure: Processing failed op start for dlm on
> libcompute1: not configured (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine: error: unpack_rsc_op:
>  Preventing dlm from re-starting anywhere: operation start failed 'not
> configured' (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine:  warning:
> unpack_rsc_op_failure: Processing failed op start for dlm on
> libcompute1: not configured (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine: error: unpack_rsc_op:
>  Preventing dlm from re-starting anywhere: operation start failed 'not
> configured' (6)
> Feb 05 13:34:26 [24262] libcompute1 pengine: info: native_print: dlm
> (ocf::pacemaker:controld):  FAILED libcompute1
> Feb 05 13:34:26 [24262] libcompute1 pengine: info:
> get_failcount_full:   dlm has failed INFINITY times on libcompute1
> Feb 05 13:34:26 [24262] libcompute1 pengine:  warning:
> common_apply_stickiness:  Forcing dlm away from libcompute1 after
> 100 failures (max=100)
> Feb 05 13:34:26 [24262] libcompute1 pengine: info: native_color:
> Resource dlm cannot run anywhere
> Feb 05 13:34:26 [24262] libcompute1 pengine:   notice: LogActions:
> Stop dlm (libcompute1)
> Feb 05 13:34:26 [24262] libcompute1 pengine:   notice:
> process_pe_message:   Calculated Transition 59:
> /var/lib/pacemaker/pengine/pe-input-176.bz2
> Feb 05 13:34:26 [24263] libcompute1   crmd: info:
> do_state_transition:  State transition S_POLICY_ENGINE ->
> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> origin=handle_response ]
> Feb 05 13:34:26 [24263] libcompute1   crmd: info: do_te_invoke:
> Processing graph 59 (ref=pe_calc-dc-1454697266-177) derived from
> /var/lib/pacemaker/pengine/pe-input-176.bz2


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact

2016-02-05 Thread Bogdan Dobrelya
On 04.02.2016 15:43, Bogdan Dobrelya wrote:
> Hello.
> Regarding the original issue, the good news is that the resource-agents
> ocf-shellfuncs no longer causes fork bombs in the dummy OCF RA [0]
> after the fix [1]. The bad news is that the "self-forking" monitors
> issue seems to remain for the rabbitmq OCF RA [2], and I can reproduce
> it with another custom agent [3], so I'd guess it may be valid for
> other agents as well.
> 
> IIUC, the issue seems related to how lrmd forks monitor actions.
> I tried to debug both pacemaker 1.1.10 and 1.1.12 with gdb as follows:
> 
> # cat ./cmds
> set follow-fork-mode child
> set detach-on-fork off
> set follow-exec-mode new
> catch fork
> catch vfork
> cont
> # gdb -x cmds /usr/lib/pacemaker/lrmd `pgrep lrmd`
> 
> I can confirm it catches the forked monitors, and nested forks as well.
> But I have *many* debug symbols missing, the backtrace is full of question
> marks, and, honestly, I'm not a gdb guru and do not know what to check
> for the reproduced cases.
> 
> Any help with troubleshooting this further is very much appreciated!

I figured out this is expected behaviour. There are no fork bombs left,
just the usual fork & exec syscalls each time the OCF RA calls a shell
command or the ocf_run/ocf_log functions. Those false "self-forks" are
nothing more than a transient state between the fork and exec calls, when
the command line (caption) of the child process has yet to be updated... So
I believe the problem was solved completely by the aforementioned patch.
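
For anyone who wants to see the effect outside pacemaker, here is a tiny
stand-alone illustration (hypothetical script, nothing to do with the RA):

$ cat > /tmp/caption-demo.sh <<'EOF'
#!/bin/sh
# A subshell -- like the ones $(...) substitutions and shell-based RAs
# create all the time -- is a fork that has not (yet) exec'd, so ps
# reports it with this script's own command line, looking like a
# "self-fork".
( sleep 5; : ) &
ps -o pid,ppid,args | grep '[c]aption-demo'
wait
EOF
$ sh /tmp/caption-demo.sh

Both the script and its subshell show up as "sh /tmp/caption-demo.sh",
which is the same visual effect as the nested dummy monitor entries shown
earlier in this thread.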

> 
> [0] https://github.com/bogdando/dummy-ocf-ra
> [1] https://github.com/ClusterLabs/resource-agents/issues/734
> [2]
> https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
> [3]
> https://git.openstack.org/cgit/openstack/fuel-library/tree/files/fuel-ha-utils/ocf/ns_vrouter
> 
> On 04.01.2016 17:33, Bogdan Dobrelya wrote:
>> On 04.01.2016 17:14, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Mon, Jan 04, 2016 at 04:52:43PM +0100, Bogdan Dobrelya wrote:
 On 04.01.2016 16:36, Ken Gaillot wrote:
> On 01/04/2016 09:25 AM, Bogdan Dobrelya wrote:
>> On 04.01.2016 15:50, Bogdan Dobrelya wrote:
>>> [...]
>> Also note, that lrmd spawns *many* monitors like:
>> root  6495  0.0  0.0  70268  1456 ?   Ss   2015   4:56  \_
>> /usr/lib/pacemaker/lrmd
>> root 31815  0.0  0.0   4440   780 ?   S    15:08   0:00  |   \_
>> /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>> root 31908  0.0  0.0   4440   388 ?   S    15:08   0:00  |
>>   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>> root 31910  0.0  0.0   4440   384 ?   S    15:08   0:00  |
>>   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>> root 31915  0.0  0.0   4440   392 ?   S    15:08   0:00  |
>>   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>> ...
>
> At first glance, that looks like your monitor action is calling itself
> recursively, but I don't see how in your code.

 Yes, it should be a bug in the ocf-shellfuncs's ocf_log().
>>>
>>> If you're sure about that, please open an issue at
>>> https://github.com/ClusterLabs/resource-agents/issues
>>
>> Submitted [0]. Thank you!
>> Note that the very act of importing seems to cause the issue, not the
>> ocf_run or ocf_log code itself.
>>
>> [0] https://github.com/ClusterLabs/resource-agents/issues/734
>>
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>>
>>
>>
> 
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org