[ClusterLabs] [Announce] clufter v0.59.2 released

2016-07-28 Thread Jan Pokorný
Hello,

I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.2
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite is the same as for 0.59.1, as nothing changed there:

or alternatively:


Changelog highlights for v0.59.2 (also available as a tag message):

- release enriching "pcs commands" output with some context of use
- feature extensions:
  . *2pcscmd commands now first emit a comment block containing key
    pieces of information about the run, such as the current date,
    the library version, the overall command that was executed and,
    increasingly importantly, the target system specification
    (this utilizes a new, dedicated cmd-annotate filter; a sketch of
    such a block follows below)
- internal enhancements:
  . so far, all formats have represented concrete information,
    expressible in various pertaining forms; generator-type filters
    (such as the mentioned cmd-annotate) necessitated a special
    "empty" format (analogous to "void" in C) for generators to map
    from into something useful, so this release introduces a
    "Nothing" format and makes sure it is generally usable
    throughout the internals (a conceptual sketch follows below)
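
For illustration, such an annotation block might look roughly like the
following (the field names, wording, and the shown invocation are invented
for this example rather than copied from clufter's actual output):

    # generated by clufter 0.59.2 on 2016-07-28 12:34:56
    # command: clufter ccs2pcscmd <input cluster.conf> <output script>
    # target system: linux, fedora, 24, x86_64
    (...the generated sequence of pcs commands follows here...)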

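As for the "Nothing" format, a minimal, self-contained Python sketch of
the underlying idea follows; it is a conceptual illustration only and does
not mirror clufter's actual classes or APIs:

    # conceptual illustration only -- not clufter's real implementation
    class Nothing(object):
        """Placeholder input format carrying no data (think "void" in C)."""

    def annotate(nothing, context):
        """Generator-style filter: it ignores its (empty) input and
        produces output purely from the surrounding run context."""
        return "# generated by clufter %(version)s on %(date)s\n" % context

    print(annotate(Nothing(), {"version": "0.59.2", "date": "2016-07-28"}))
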
* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of the following
(regardless of whether you run Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Previous DC fenced prior to integration

2016-07-28 Thread Nate Clark
On Mon, Jul 25, 2016 at 2:48 PM, Nate Clark  wrote:
> On Mon, Jul 25, 2016 at 11:20 AM, Ken Gaillot  wrote:
>> On 07/23/2016 10:14 PM, Nate Clark wrote:
>>> On Sat, Jul 23, 2016 at 1:06 AM, Andrei Borzenkov  
>>> wrote:
 23.07.2016 01:37, Nate Clark wrote:
> Hello,
>
> I am running pacemaker 1.1.13 with corosync and think I may have
> encountered a startup timing issue on a two-node cluster. I didn't
> notice anything in the changelogs for 1.1.14 or 1.1.15 that looked
> similar to this, nor any open bugs.
>
> The rough outline of what happened:
>
> Module 1 and 2 running
> Module 1 is DC
> Module 2 shuts down
> Module 1 updates node attributes used by resources
> Module 1 shuts down
> Module 2 starts up
> Module 2 votes itself as DC
> Module 1 starts up
> Module 2 sees module 1 in corosync and notices it has quorum
> Module 2 enters policy engine state.
> Module 2 policy engine decides to fence 1
> Module 2 then continues and starts resources on itself based upon the old
> state
>
> For some reason the integration never occurred and module 2 starts to
> perform actions based on stale state.
>
> Here are the full logs:
> Jul 20 16:29:06.376805 module-2 crmd[21969]:   notice: Connecting to
> cluster infrastructure: corosync
> Jul 20 16:29:06.386853 module-2 crmd[21969]:   notice: Could not
> obtain a node name for corosync nodeid 2
> Jul 20 16:29:06.392795 module-2 crmd[21969]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.403611 module-2 crmd[21969]:   notice: Quorum lost
> Jul 20 16:29:06.409237 module-2 stonith-ng[21965]:   notice: Watching
> for stonith topology changes
> Jul 20 16:29:06.409474 module-2 stonith-ng[21965]:   notice: Added
> 'watchdog' to the device list (1 active devices)
> Jul 20 16:29:06.413589 module-2 stonith-ng[21965]:   notice: Relying
> on watchdog integration for fencing
> Jul 20 16:29:06.416905 module-2 cib[21964]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.417044 module-2 crmd[21969]:   notice:
> pcmk_quorum_notification: Node module-2[2] - state is now member (was
> (null))
> Jul 20 16:29:06.421821 module-2 crmd[21969]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.422121 module-2 crmd[21969]:   notice: Notifications 
> disabled
> Jul 20 16:29:06.422149 module-2 crmd[21969]:   notice: Watchdog
> enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:06.422286 module-2 crmd[21969]:   notice: The local CRM
> is operational
> Jul 20 16:29:06.422312 module-2 crmd[21969]:   notice: State
> transition S_STARTING -> S_PENDING [ input=I_PENDING
> cause=C_FSA_INTERNAL origin=do_started ]
> Jul 20 16:29:07.416871 module-2 stonith-ng[21965]:   notice: Added
> 'fence_sbd' to the device list (2 active devices)
> Jul 20 16:29:08.418567 module-2 stonith-ng[21965]:   notice: Added
> 'ipmi-1' to the device list (3 active devices)
> Jul 20 16:29:27.423578 module-2 crmd[21969]:  warning: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jul 20 16:29:27.424298 module-2 crmd[21969]:   notice: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jul 20 16:29:27.460834 module-2 crmd[21969]:  warning: FSA: Input
> I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jul 20 16:29:27.463794 module-2 crmd[21969]:   notice: Notifications 
> disabled
> Jul 20 16:29:27.463824 module-2 crmd[21969]:   notice: Watchdog
> enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:27.473285 module-2 attrd[21967]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:27.498464 module-2 pengine[21968]:   notice: Relying on
> watchdog integration for fencing
> Jul 20 16:29:27.498536 module-2 pengine[21968]:   notice: We do not
> have quorum - fencing and resource management disabled
> Jul 20 16:29:27.502272 module-2 pengine[21968]:  warning: Node
> module-1 is unclean!
> Jul 20 16:29:27.502287 module-2 pengine[21968]:   notice: Cannot fence
> unclean nodes until quorum is attained (or no-quorum-policy is set to
> ignore)
>>
>> The above two messages indicate that module-2 cannot see module-1 at
>> startup, therefore it must assume module-1 is potentially misbehaving and
>> must be shot. However, since module-2 does not have quorum with only one
>> out of two nodes, it must wait until module-1 joins before it can shoot it!
>>
>> This is a special problem with quorum in a two-node cluster. There are a
>> variety of ways to deal with it, but the simplest is to set "two_node:
>> 1" in corosync.conf (with corosync 2 or later). This will