Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-29 Thread Ken Gaillot
On 09/28/2016 10:54 PM, Andrew Beekhof wrote: > On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot wrote: >>> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures >>> then migrate", but I can't think of a real-world situation where that >>> makes

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Jan Pokorný
Hello, On 29/09/16 12:41 -0400, Christopher Harvey wrote: > I think something is failing at the execvp() level. I'm seeing > useful looking trace logs in the code, but can't enable them right > now. I have: > PCMK_debug=yes > PCMK_logfile=/tmp/pacemaker.log > PCMK_logpriority=debug >

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-29 Thread Jan Pokorný
On 28/09/16 16:30 -0400, Scott Greenlese wrote: > Also, I have tried simulating a failed cluster node (to trigger a > STONITH action) by killing the corosync daemon on one node, but all > that does is respawn the daemon ... causing a temporary / transient > failure condition, and no fence takes

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Christopher Harvey
On Thu, Sep 29, 2016, at 12:20 PM, Jan Pokorný wrote: > On 28/09/16 16:55 -0500, Ken Gaillot wrote: > > On 09/28/2016 04:04 PM, Christopher Harvey wrote: > >> My corosync/pacemaker logs are seeing a bunch of messages like the > >> following: > >> > >> Sep 22 14:50:36 [1346] node-132-60

Re: [ClusterLabs] Failed to retrieve meta-data for custom ocf resource

2016-09-29 Thread Jan Pokorný
On 28/09/16 16:55 -0500, Ken Gaillot wrote: > On 09/28/2016 04:04 PM, Christopher Harvey wrote: >> My corosync/pacemaker logs are seeing a bunch of messages like the >> following: >> >> Sep 22 14:50:36 [1346] node-132-60 crmd: info: >> action_synced_wait: Managed

Re: [ClusterLabs] Pacemaker quorum behavior

2016-09-29 Thread Tomas Jelinek
On 29.9.2016 at 00:14, Ken Gaillot wrote: On 09/28/2016 03:57 PM, Scott Greenlese wrote: A quick addendum... After sending this post, I decided to stop pacemaker on the single, Online node in the cluster, and this effectively killed the corosync daemon: [root@zs93kl VD]# date;pcs cluster