Re: [Pacemaker] [Linux-HA] Announcing the Heartbeat 3.0.6 Release

2015-02-10 Thread Nikita Michalko

On 10.02.2015 22:24, Lars Ellenberg wrote:

TL;DR:

   If you intend to set up a new High Availability cluster
   using the Pacemaker cluster manager,
   you typically should not care for Heartbeat,
   but use recent releases (2.3.x) of Corosync.

   If you don't care for Heartbeat, don't read further.

Unless you are beekhof... there's a question below ;-)



After 3½ years since the last "officially tagged" release of Heartbeat,
I have seen the need to do a new "maintenance release".

   The Heartbeat 3.0.6 release tag: 3d59540cf28d
   and the change set it points to: cceeb47a7d8f
GREAT !!!   Thank you very much, Lars! Heartbeat is still running on
some of our production clusters ...



The main reason for this was that pacemaker more recent than
somewhere between 1.1.6 and 1.1.7 would no longer work properly
on the Heartbeat cluster stack.

Because some of the daemons have moved from "glue" to "pacemaker" proper,
and changed their paths. This has been fixed in Heartbeat.

And because during that time, stonith-ng was refactored, and would still
reliably fence, but not understand its own confirmation message, so it
was effectively broken. This I fixed in pacemaker.



If you choose to run a new Pacemaker with the Heartbeat communication stack,
it should be at least 1.1.12 with a few patches,
see my December 2014 commits at the top of
https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12
I'm not sure if they got into pacemaker upstream yet.

beekhof?
Do I need to rebase?
Or did I miss you merging these?

---

If you have those patches,
consider setting this new ha.cf configuration parameter:

# If pacemaker crmd spawns the pengine itself,
# it sometimes "forgets" to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off



Here is the shortened Heartbeat changelog,
the longer version is available in mercurial:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop "openais" HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned "fixes" were created two years
ago already, and may have been available in vendor packages for a long time,
where vendors have chosen to include them.



As to future plans for Heartbeat:

Heartbeat is still useful for non-pacemaker, "haresources"-mode clusters.

We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is "stable",
and due to long years of field exposure, "all bugs are known" ;-)

The most notable shortcoming when using Heartbeat with Pacemaker
clusters would be the limited message size.
There are currently no plans to remove that limitation.

With its wide choice of communications paths, even "exotic"
communication plugins, and the ability to run "arbitrarily many"
paths, some deployments may even favor it over Corosync still.

But typically, for new deployments involving Pacemaker,
in most cases you should choose Corosync 2.3.x
as your membership and communication layer.

For existing deployments using Heartbeat,
upgrading to this Heartbeat version is strongly recommended.

Thanks,

Lars Ellenberg



___
Linux-HA mailing list
linux...@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProble

Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-10 Thread Andrei Borzenkov
On Tue, 10 Feb 2015 15:58:57 +0100,
Dejan Muhamedagic wrote:

> On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote:
> > On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > > 
> > > On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > > > That is the problem that makes geo-clustering very hard to nearly
> > > > impossible. You can look at the Booth option for pacemaker, but that
> > > > requires two (or more) full clusters, plus an arbitrator 3rd
> > > 
> > > A full cluster can consist of one node only. Hence, it is
> > > possible to have a kind of stretch two-node [multi-site] cluster
> > > based on tickets and managed by booth.
> > 
> > In theory.
> > 
> > In practice, we rely on "proper behaviour" of "the other site",
> > in case a ticket is revoked, or cannot be renewed.
> > 
> > Relying on a single node for "proper behaviour" does not inspire
> > as much confidence as relying on a multi-node HA-cluster at each site,
> > which we can expect to ensure internal fencing.
> > 
> > With reliable hardware watchdogs, it still should be ok to do
> > "stretched two node HA clusters" in a reliable way.
> > 
> > Be generous with timeouts.
> 
> As always.
> 
> > And document which failure modes you expect to handle,
> > and how to deal with the worst-case scenarios if you end up with some
> > failure case that you are not equipped to handle properly.
> > 
> > There are deployments which favor
> > "rather online with _potential_ split brain" over
> > "rather offline just in case".
> 
> There's an arbitrator which should help in case of split brain.
> 

You can never really differentiate between a site being down and a site cut off
due to a (network) infrastructure outage. An arbitrator can mitigate split
brain only to the extent that you trust your network. You still have to
decide what you value more - data availability or data consistency.

Long-distance clusters are really for disaster recovery. It is
convenient to have a single button that starts up all resources in a
controlled manner, but someone really needs to decide to push that
button.

> > Document this, print it out on paper,
> > 
> >"I am aware that this may lead to lost transactions,
> >data divergence, data corruption, or data loss.
> >I am personally willing to take the blame,
> >and live with the consequences."
> > 
> > Have some "boss" sign that ^^^
> > in the real world using a real pen.
> 
> Well, of course running such a "stretch" cluster would be
> rather different from a "normal" one.
> 
> The essential thing is that there's no fencing, unless configured
> as a dead-man switch for the ticket. Given that booth has a
> "sanity" program hook, maybe that could be utilized to verify if
> this side of the cluster is healthy enough.
> 
> Thanks,
> 
> Dejan
> 
> > Lars
> > 
> > -- 
> > : Lars Ellenberg
> > : http://www.LINBIT.com | Your Way to High Availability
> > : DRBD, Linux-HA  and  Pacemaker support and consulting
> > 
> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > 
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Active/Active

2015-02-10 Thread emmanuel segura
Try changing the daemon that your controld resource starts:

OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/pacemaker/controld meta-data



(from the agent meta-data, the "daemon" parameter:
 "The daemon to start - supports gfs_controld(.pcmk) and dlm_controld(.pcmk)")



and remember that you need to configure cluster fencing, because dlm relies on it
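
A hedged crm-shell sketch of what that could look like, assuming the problem
is that no pacemaker-aware gfs_controld is running; the "daemon"/"args"
parameter names come from the agent meta-data quoted above, and the resource
names are made up, so verify against your own versions before applying:

# add a gfs_controld.pcmk clone next to the existing dlm clone
primitive gfs_control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="60s"
clone gfs_clone gfs_control \
        meta clone-max="2" clone-node-max="1" interleave="true"
# gfs_controld needs the dlm up first, on the same node
colocation gfs-with-dlm inf: gfs_clone dlm_clone
order start-gfs-after-dlm inf: dlm_clone gfs_clone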

2015-02-10 23:08 GMT+01:00 José Luis Rodríguez Rodríguez :
> Hi Emmanuel, I installed this package but the result is the same when I try
> to mount /dev/drbd1 on /mnt:
> gfs_controld join connect error: Connection refused
> error mounting lockproto lock_dlm
>
>
> I have installed gfs2-tools, dlm-pcmk and the one you pointed me to, gfs-pcmk
>
>
>
> My pacemaker configuration is:
> node nodo1
> node nodo2
> primitive FAILOVER-ADDR ocf:heartbeat:IPaddr2 \
> params ip="192.168.122.100" nic="eth0" \
> op monitor interval="10s" meta-is-managed="true" \
> meta target-role="Started"
> primitive WebData ocf:linbit:drbd \
> params drbd_resource="wwwdata" \
> op monitor interval="60s"
> primitive WebFS ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/wwwdata" directory="/var/www"
> fstype="ext4" \
> meta target-role="Stopped"
> primitive WebSite ocf:heartbeat:apache \
> params configfile="/etc/apache2/apache2.conf"
> statusurl="http://localhost/server-status" \
> op monitor interval="1min" \
> meta target-role="Started"
> primitive dlm ocf:pacemaker:controld \
> op monitor interval="60s"
> ms WebDataClone WebData \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true" target-role="Started"
> clone dlm_clone dlm \
> meta clone-max="2" clone-node-max="1" target-role="Started"
> location PREFERIDO-NODO1 WebSite 50: nodo1
> colocation WebSite-with-WebFS inf: WebSite WebFS
> colocation fs_on_drbd inf: WebFS WebDataClone:Master
> colocation website-with-ip inf: WebSite FAILOVER-ADDR
> order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
> order WebSite-after-WebFS inf: WebFS WebSite
> order apache-after-ip inf: FAILOVER-ADDR WebSite
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> op_defaults $id="op-options" \
> timeout="240s"
>
>
>
>
> With sudo crm_mon:
>
>
> Online: [ nodo1 nodo2 ]
>
> FAILOVER-ADDR   (ocf::heartbeat:IPaddr2):   Started nodo1
>  Master/Slave Set: WebDataClone [WebData]
>  Masters: [ nodo1 ]
>  Slaves: [ nodo2 ]
>  Clone Set: dlm_clone [dlm]
>  Started: [ nodo2 nodo1 ]
>
>
> The DRBD status is:
>
> Nodo 1
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
> ns:264908 nr:0 dw:0 dr:267236 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
> oos:0
>
>
> Nodo 2
> 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
> ns:0 nr:264908 dw:264908 dr:0 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
> oos:0
>
>
> What am I doing wrong?
>
>
> On 10 February 2015 at 16:02, emmanuel segura  wrote:
>>
>> I'm using debian 7
>>
>> apt-cache show gfs-pcmk
>> ..
>> This package contains the GFS module for pacemaker.
>> ...
>>
>> 2015-02-10 8:55 GMT+01:00 José Luis Rodríguez Rodríguez
>> :
>> > Hello,
>> >
>> > I would like to create an active/active cluster by using pacemaker and
>> > corosync on Debian. I  have followed the documentation
>> > http://clusterlabs.org/doc/Cluster_from_Scratch.pdf.  It works well
>> > until
>> > 8.2.2 Create and Populate an GFS2 Partition.  When I try to mount the
>> > disk
>> > /dev/drbd1 as /mnt, the output is:
>> >
>> > gfs_controld join connect error: Connection refused
>> > error mounting lockproto lock_dlm
>> >
>> > I have read that it is necessary to use cman, but then the resources
>> > created
>> > by pacemaker (with the command crm configure primitive ...) doesn't
>> > appear
>> > with the command crm_mon.
>> >
>> > What could I do?
>> >
>> > --
>> > Saludos,
>> >
>> >
>> > José Luis
>> > --
>> > Profesor Informática IES Jacarandá -  Brenes (Sevilla)
>> > http://www.iesjacaranda.es  -   www.iesjacaranda-brenes.org
>> > twitter: @jlrod2
>> >
>> >
>> >
>> > ___
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>>
>>
>>
>> --
>> esta es mi vida e me la vivo hasta que dios quiera
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clust

Re: [Pacemaker] Version of libqb is too old: v0.13 or greater requried

2015-02-10 Thread Alexis de BRUYN

On 01/29/15 09:28, Thomas Manninger wrote:

Hi,

Hi David, Hi Thomas,

Thanks for your help.


Create a Debian package of libqb0 with checkinstall; then it should work.
Thomas, I have created the Debian package with checkinstall, and 
completed the install of Pacemaker. Thanks again.
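
For the archives, a minimal sketch of that approach, assuming checkinstall's
defaults and a package name of libqb0 (paths and versions as in the original
post; adjust to taste):

cd libqb-0.17.1/
./autogen.sh
./configure --prefix=/usr          # install where pkg-config looks by default
make -j8
checkinstall --pkgname=libqb0 --pkgversion=0.17.1 make install

Alternatively, keep the /usr/local install and point pacemaker's configure at
it, e.g. PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./configure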


Alexis.


Regards
Sent: Wednesday, 28 January 2015, 19:18
From: "Alexis de BRUYN" 
To: pacema...@clusterlabs.org
Subject: [Pacemaker] Version of libqb is too old: v0.13 or greater requried
Hi Everybody,

I have compiled libqb 0.17.1 under Debian Jessie/testing amd64 as:

tar zxvf libqb-v0.17.1.tar.gz
cd libqb-0.17.1/
./autogen.sh
./configure
make -j8
make -j8 install

Then after successful builds of COROSYNC 2.3.4, CLUSTER-GLUE 1.0.12 and
RESOURCE-AGENTS 3.9.5, compiling PACEMAKER 1.1.12 fails with:

unzip Pacemaker-1.1.12.zip
cd pacemaker-Pacemaker-1.1.12/
addgroup --system haclient
./autogen.sh
./configure
[...]
configure: error: in `/home/alexis/pacemaker-Pacemaker-1.1.12':
configure: error: Version of libqb is too old: v0.13 or greater requried

I have tried to pass some flags to ./configure, but I still get this error.

What am I doing wrong?

Thanks for your help,

--
Alexis de BRUYN 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



--
Alexis de BRUYN

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Active/Active

2015-02-10 Thread José Luis Rodríguez Rodríguez
Hi Emmanuel, I installed this package but the result is the same when I try
to mount /dev/drbd1 on /mnt:
gfs_controld join connect error: Connection refused
error mounting lockproto lock_dlm


I have installed gfs2-tools, dlm-pcmk and the one you pointed me to, gfs-pcmk




My pacemaker configuration is:
node nodo1
node nodo2
primitive FAILOVER-ADDR ocf:heartbeat:IPaddr2 \
params ip="192.168.122.100" nic="eth0" \
op monitor interval="10s" meta-is-managed="true" \
meta target-role="Started"
primitive WebData ocf:linbit:drbd \
params drbd_resource="wwwdata" \
op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www"
fstype="ext4" \
meta target-role="Stopped"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/apache2/apache2.conf" statusurl="http://localhost/server-status" \
op monitor interval="1min" \
meta target-role="Started"
primitive dlm ocf:pacemaker:controld \
op monitor interval="60s"
ms WebDataClone WebData \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" target-role="Started"
clone dlm_clone dlm \
meta clone-max="2" clone-node-max="1" target-role="Started"
location PREFERIDO-NODO1 WebSite 50: nodo1
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite FAILOVER-ADDR
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: FAILOVER-ADDR WebSite
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
op_defaults $id="op-options" \
timeout="240s"




With sudo crm_mon:


Online: [ nodo1 nodo2 ]

FAILOVER-ADDR   (ocf::heartbeat:IPaddr2):   Started nodo1
 Master/Slave Set: WebDataClone [WebData]
 Masters: [ nodo1 ]
 Slaves: [ nodo2 ]
 Clone Set: dlm_clone [dlm]
 Started: [ nodo2 nodo1 ]


The DRBD status is:

Nodo 1
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
ns:264908 nr:0 dw:0 dr:267236 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
oos:0


Nodo 2
1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:264908 dw:264908 dr:0 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
oos:0


What am I doing wrong?


On 10 February 2015 at 16:02, emmanuel segura  wrote:

> I'm using debian 7
>
> apt-cache show gfs-pcmk
> ..
> This package contains the GFS module for pacemaker.
> ...
>
> 2015-02-10 8:55 GMT+01:00 José Luis Rodríguez Rodríguez  >:
> > Hello,
> >
> > I would like to create an active/active cluster by using pacemaker and
> > corosync on Debian. I  have followed the documentation
> > http://clusterlabs.org/doc/Cluster_from_Scratch.pdf.  It works well
> until
> > 8.2.2 Create and Populate an GFS2 Partition.  When I try to mount the
> disk
> > /dev/drbd1 as /mnt, the output is:
> >
> > gfs_controld join connect error: Connection refused
> > error mounting lockproto lock_dlm
> >
> > I have read that it is necessary to use cman, but then the resources
> created
> > by pacemaker (with the command crm configure primitive ...) doesn't
> appear
> > with the command crm_mon.
> >
> > What could I do?
> >
> > --
> > Saludos,
> >
> >
> > José Luis
> > --
> > Profesor Informática IES Jacarandá -  Brenes (Sevilla)
> > http://www.iesjacaranda.es  -   www.iesjacaranda-brenes.org
> > twitter: @jlrod2
> >
> >
> >
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
Saludos,


José Luis
--
Profesor Informática IES Jacarandá -  Brenes (Sevilla)
http://www.iesjacaranda.es  -   www.iesjacaranda-brenes.org
twitter: @jlrod2
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Announcing the Heartbeat 3.0.6 Release

2015-02-10 Thread Lars Ellenberg

TL;DR:

  If you intend to set up a new High Availability cluster
  using the Pacemaker cluster manager,
  you typically should not care for Heartbeat,
  but use recent releases (2.3.x) of Corosync.

  If you don't care for Heartbeat, don't read further.

Unless you are beekhof... there's a question below ;-)



After 3½ years since the last "officially tagged" release of Heartbeat,
I have seen the need to do a new "maintenance release".

  The Heartbeat 3.0.6 release tag: 3d59540cf28d
  and the change set it points to: cceeb47a7d8f

The main reason for this was that pacemaker more recent than
somewhere between 1.1.6 and 1.1.7 would no longer work properly
on the Heartbeat cluster stack.

Because some of the daemons have moved from "glue" to "pacemaker" proper,
and changed their paths. This has been fixed in Heartbeat.

And because during that time, stonith-ng was refactored, and would still
reliably fence, but not understand its own confirmation message, so it
was effectively broken. This I fixed in pacemaker.



If you choose to run a new Pacemaker with the Heartbeat communication stack,
it should be at least 1.1.12 with a few patches,
see my December 2014 commits at the top of
https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12
I'm not sure if they got into pacemaker upstream yet.

beekhof?
Do I need to rebase?
Or did I miss you merging these?

---

If you have those patches,
consider setting this new ha.cf configuration parameter:

# If pacemaker crmd spawns the pengine itself,
# it sometimes "forgets" to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off
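
For context, a hedged sketch of a minimal ha.cf for a two-node
Pacemaker-on-Heartbeat setup with this parameter set; the directives are the
long-standing Heartbeat ones, except the commented ucast6 line, whose syntax
is only assumed to mirror ucast -- check the updated example ha.cf shipped
with 3.0.6:

autojoin none
node alice bob
# peer address differs per node, of course
ucast eth0 192.168.1.2
#ucast6 eth0 fe80::2    # new IPv6 plugin in 3.0.6 (assumed syntax)
use_logd yes
crm respawn
crmd_spawns_pengine off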



Here is the shortened Heartbeat changelog,
the longer version is available in mercurial:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop "openais" HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned "fixes" were created two years
ago already, and may have been available in vendor packages for a long time,
where vendors have chosen to include them.



As to future plans for Heartbeat:

Heartbeat is still useful for non-pacemaker, "haresources"-mode clusters.

We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is "stable",
and due to long years of field exposure, "all bugs are known" ;-)

The most notable shortcoming when using Heartbeat with Pacemaker
clusters would be the limited message size.
There are currently no plans to remove that limitation.

With its wide choice of communications paths, even "exotic"
communication plugins, and the ability to run "arbitrarily many"
paths, some deployments may even favor it over Corosync still.

But typically, for new deployments involving Pacemaker,
in most cases you should choose Corosync 2.3.x
as your membership and communication layer.
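
As a starting point, a minimal corosync.conf sketch for such a setup
(corosync 2.3.x, unicast transport, two nodes); the cluster name and
addresses are placeholders:

totem {
        version: 2
        cluster_name: mycluster
        transport: udpu
}

nodelist {
        node {
                ring0_addr: 192.168.1.1
                nodeid: 1
        }
        node {
                ring0_addr: 192.168.1.2
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
}

logging {
        to_syslog: yes
}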

For existing deployments using Heartbeat,
upgrading to this Heartbeat version is strongly recommended.

Thanks,

Lars Ellenberg



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Openais] Issues with a squid cluster.

2015-02-10 Thread Jan Friesse
This is really a question for the pacemaker list, so CCing.

Regards,
  Honza

Redeye wrote:
> I am not certain where I should post this, hopefully someone will point me in 
> the right direction.
> 
> I have a two node cluster on Ubuntu 12.04, corosync, pacemaker, and squid.  
> Squid is not starting at boot, pacemaker is controlling that.  The two 
> servers are communicating just fine, pacemaker starts, stops, and monitors 
> the squid resources just fine too.  My problem is that I am unable to do 
> anything with the squid instances.  For example, I want to update an acl, and 
> I want to bounce the squid service to load the new settings. "service squid3
> stop|start|status|restart|etc." does nothing; it returns "unknown instance". ps
> -af | grep squid shows two instances (one as user root, one as user proxy), and squid
> is doing what it is supposed to.  
> 
> What can I do to remedy this?
> ___
> Openais mailing list
> open...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/openais
> 
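
One hedged way to handle this when pacemaker owns the service (the resource
name p_squid is made up; squid can also re-read its configuration in place):

crm resource restart p_squid     # let the cluster stop/start the resource
squid3 -k reconfigure            # or just reload squid.conf without a restart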


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Active/Active

2015-02-10 Thread emmanuel segura
I'm using debian 7

apt-cache show gfs-pcmk
..
This package contains the GFS module for pacemaker.
...

2015-02-10 8:55 GMT+01:00 José Luis Rodríguez Rodríguez :
> Hello,
>
> I would like to create an active/active cluster by using pacemaker and
> corosync on Debian. I  have followed the documentation
> http://clusterlabs.org/doc/Cluster_from_Scratch.pdf.  It works well until
> 8.2.2 Create and Populate an GFS2 Partition.  When I try to mount the disk
> /dev/drbd1 as /mnt, the output is:
>
> gfs_controld join connect error: Connection refused
> error mounting lockproto lock_dlm
>
> I have read that it is necessary to use cman, but then the resources created
> by pacemaker (with the command crm configure primitive ...) doesn't appear
> with the command crm_mon.
>
> What could I do?
>
> --
> Saludos,
>
>
> José Luis
> --
> Profesor Informática IES Jacarandá -  Brenes (Sevilla)
> http://www.iesjacaranda.es  -   www.iesjacaranda-brenes.org
> twitter: @jlrod2
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
esta es mi vida e me la vivo hasta que dios quiera

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-10 Thread Dejan Muhamedagic
On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote:
> On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > > That is the problem that makes geo-clustering very hard to nearly
> > > impossible. You can look at the Booth option for pacemaker, but that
> > > requires two (or more) full clusters, plus an arbitrator 3rd
> > 
> > A full cluster can consist of one node only. Hence, it is
> > possible to have a kind of stretch two-node [multi-site] cluster
> > based on tickets and managed by booth.
> 
> In theory.
> 
> In practice, we rely on "proper behaviour" of "the other site",
> in case a ticket is revoked, or cannot be renewed.
> 
> Relying on a single node for "proper behaviour" does not inspire
> as much confidence as relying on a multi-node HA-cluster at each site,
> which we can expect to ensure internal fencing.
> 
> With reliable hardware watchdogs, it still should be ok to do
> "stretched two node HA clusters" in a reliable way.
> 
> Be generous with timeouts.

As always.

> And document which failure modes you expect to handle,
> and how to deal with the worst-case scenarios if you end up with some
> failure case that you are not equipped to handle properly.
> 
> There are deployments which favor
> "rather online with _potential_ split brain" over
> "rather offline just in case".

There's an arbitrator which should help in case of split brain.

> Document this, print it out on paper,
> 
>"I am aware that this may lead to lost transactions,
>data divergence, data corruption, or data loss.
>I am personally willing to take the blame,
>and live with the consequences."
> 
> Have some "boss" sign that ^^^
> in the real world using a real pen.

Well, of course running such a "stretch" cluster would be
rather different from a "normal" one.

The essential thing is that there's no fencing, unless configured
as a dead-man switch for the ticket. Given that booth has a
"sanity" program hook, maybe that could be utilized to verify if
this side of the cluster is healthy enough.
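
A hedged sketch of those two pieces; the handler path, ticket and resource
names are made up, and option names should be checked against the booth
version in use:

# /etc/booth/booth.conf (excerpt)
transport = UDP
port = 9929
arbitrator = 192.168.100.99
site = 192.168.100.10
site = 192.168.200.10
ticket = "ticketA"
        before-acquire-handler = /usr/local/bin/check-site-health  # the "sanity" hook

# pacemaker side (crm shell): dead-man behaviour if the ticket is lost
rsc_ticket deadman-ticketA ticketA: WebSite loss-policy="fence"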

Thanks,

Dejan

>   Lars
> 
> -- 
> : Lars Ellenberg
> : http://www.LINBIT.com | Your Way to High Availability
> : DRBD, Linux-HA  and  Pacemaker support and consulting
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] pacemaker does not start after cman config

2015-02-10 Thread Lukas Kostyan
Hi all,

I was following the guide from clusterlabs but am using Debian wheezy.
corosync   1.4.2-3

pacemaker  1.1.7-1
cman   3.0.12-3.2+deb7u2

I configured active/passive with no problems, but as soon as I try to
configure active/active with cman, pacemaker doesn't start anymore. It doesn't
even write anything related to pacemaker to the logs. Any ideas how to get
a hint? Suggestions? I am following this guide:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08.html

Thank you in advance!
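
One hedged way to get more than just [FAILED] out of the init script is to
run the daemon by hand with verbose logging and watch the system logs
(pacemakerd normally stays in the foreground when started this way; log
locations vary by packaging):

pacemakerd -V
tail -f /var/log/syslog /var/log/cluster/corosync.log
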
###
/etc/init.d/service.d/pcmk is removed

Starting cluster:
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Starting gfs_controld... [  OK  ]
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]
root@vm-2:~# cman_tool nodes
Node  Sts   Inc   Joined   Name
   1   M264   2015-02-06 10:09:15  vm-1.cluster.com
   2   M256   2015-02-06 10:08:59  vm-2.cluster.com
root@vm-2:~# /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: [FAILED]


root@vm-2:/var/log/cluster# cat corosync.log
Feb 06 10:43:29 corosync [MAIN  ] Corosync Cluster Engine ('1.4.2'):
started and ready to provide service.
Feb 06 10:43:29 corosync [MAIN  ] Corosync built-in features: nss
Feb 06 10:43:29 corosync [MAIN  ] Successfully read config from
/etc/cluster/cluster.conf
Feb 06 10:43:29 corosync [MAIN  ] Successfully parsed cman config
Feb 06 10:43:29 corosync [MAIN  ] Successfully configured openais services
to load
Feb 06 10:43:29 corosync [TOTEM ] Token Timeout (1 ms) retransmit
timeout (2380 ms)
Feb 06 10:43:29 corosync [TOTEM ] token hold (1894 ms) retransmits before
loss (4 retrans)
Feb 06 10:43:29 corosync [TOTEM ] join (60 ms) send_join (0 ms) consensus
(2 ms) merge (200 ms)
Feb 06 10:43:29 corosync [TOTEM ] downcheck (1000 ms) fail to recv const
(2500 msgs)
Feb 06 10:43:29 corosync [TOTEM ] seqno unchanged const (30 rotations)
Maximum network MTU 1402
Feb 06 10:43:29 corosync [TOTEM ] window size per rotation (50 messages)
maximum messages per rotation (17 messages)
Feb 06 10:43:29 corosync [TOTEM ] missed count const (5 messages)
Feb 06 10:43:29 corosync [TOTEM ] send threads (0 threads)
Feb 06 10:43:29 corosync [TOTEM ] RRP token expired timeout (2380 ms)
Feb 06 10:43:29 corosync [TOTEM ] RRP token problem counter (2000 ms)
Feb 06 10:43:29 corosync [TOTEM ] RRP threshold (10 problem count)
Feb 06 10:43:29 corosync [TOTEM ] RRP multicast threshold (100 problem
count)
Feb 06 10:43:29 corosync [TOTEM ] RRP automatic recovery check timeout
(1000 ms)
Feb 06 10:43:29 corosync [TOTEM ] RRP mode set to none.
Feb 06 10:43:29 corosync [TOTEM ] heartbeat_failures_allowed (0)
Feb 06 10:43:29 corosync [TOTEM ] max_network_delay (50 ms)
Feb 06 10:43:29 corosync [TOTEM ] HeartBeat is Disabled. To enable set
heartbeat_failures_allowed > 0
Feb 06 10:43:29 corosync [TOTEM ] Initializing transport (UDP/IP
Multicast).
Feb 06 10:43:29 corosync [TOTEM ] Initializing transmit/receive security:
libtomcrypt SOBER128/SHA1HMAC (mode 0).
Feb 06 10:43:29 corosync [IPC   ] you are using ipc api v2
Feb 06 10:43:29 corosync [TOTEM ] Receive multicast socket recv buffer size
(262142 bytes).
Feb 06 10:43:29 corosync [TOTEM ] Transmit multicast socket send buffer
size (262142 bytes).
Feb 06 10:43:29 corosync [TOTEM ] The network interface [192.168.1.7] is
now up.
Feb 06 10:43:29 corosync [TOTEM ] Created or loaded sequence id
108.192.168.1.7 for this ring.
Feb 06 10:43:29 corosync [QUORUM] Using quorum provider quorum_cman
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: corosync cluster
quorum service v0.1
Feb 06 10:43:29 corosync [CMAN  ] CMAN starting
Feb 06 10:43:29 corosync [CMAN  ] memb: Got node vm-1.cluster.com from ccs
(id=1, votes=1)
Feb 06 10:43:29 corosync [CMAN  ] memb: add_new_node: vm-1.cluster.com,
(id=1, votes=1) newalloc=1
Feb 06 10:43:29 corosync [CMAN  ] memb: Got node vm-2.cluster.com from ccs
(id=2, votes=1)
Feb 06 10:43:29 corosync [CMAN  ] memb: add_new_node: vm-2.cluster.com,
(id=2, votes=1) newalloc=1
Feb 06 10:43:29 corosync [CMAN  ] memb: add_new_node: vm-2.cluster.com,
(id=2, votes=1) newalloc=0
Feb 06 10:43:29 corosync [CMAN  ] CMAN 3.0.12 (built Jan 12 2013 15:20:22)
started
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: corosync CMAN
membership service 2.90
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: openais cluster
membership service B.01.01
Feb 06 10:43:29 corosync [EVT   ] Evt exec init request
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: openais event
service B.01.01
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: openais checkpoint
service B.01.01
Feb 06 10:43:29 corosync [MSG   ] [DEBUG]: msg_exec_init_fn
Feb 06 10:43:29 corosync [SERV  ] Service engine loade

[Pacemaker] can not start pacemaker after cman config

2015-02-10 Thread Lukas Kostyan
Hi all,

I was following the guide from clusterlabs but am using Debian wheezy.
corosync   1.4.2-3
pacemaker  1.1.7-1
cman   3.0.12-3.2+deb7u2

I configured active/passive with no problems, but as soon as I try to
configure active/active with cman, pacemaker doesn't start anymore. It doesn't
even write anything related to pacemaker to the logs. Any ideas how to get
a hint? Suggestions? I am following this guide:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08.html

Thank you in advance!
###
/etc/init.d/service.d/pcmk is removed

Starting cluster:
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Starting gfs_controld... [  OK  ]
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]
root@vm-2:~# cman_tool nodes
Node  Sts   Inc   Joined   Name
   1   M264   2015-02-06 10:09:15  vm-1.cluster.com
   2   M256   2015-02-06 10:08:59  vm-2.cluster.com
root@vm-2:~# /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: [FAILED]


root@vm-2:/var/log/cluster# cat corosync.log
Feb 06 10:43:29 corosync [MAIN  ] Corosync Cluster Engine ('1.4.2'):
started and ready to provide service.
Feb 06 10:43:29 corosync [MAIN  ] Corosync built-in features: nss
Feb 06 10:43:29 corosync [MAIN  ] Successfully read config from
/etc/cluster/cluster.conf
Feb 06 10:43:29 corosync [MAIN  ] Successfully parsed cman config
Feb 06 10:43:29 corosync [MAIN  ] Successfully configured openais services
to load
Feb 06 10:43:29 corosync [TOTEM ] Token Timeout (1 ms) retransmit
timeout (2380 ms)
Feb 06 10:43:29 corosync [TOTEM ] token hold (1894 ms) retransmits before
loss (4 retrans)
Feb 06 10:43:29 corosync [TOTEM ] join (60 ms) send_join (0 ms) consensus
(2 ms) merge (200 ms)
Feb 06 10:43:29 corosync [TOTEM ] downcheck (1000 ms) fail to recv const
(2500 msgs)
Feb 06 10:43:29 corosync [TOTEM ] seqno unchanged const (30 rotations)
Maximum network MTU 1402
Feb 06 10:43:29 corosync [TOTEM ] window size per rotation (50 messages)
maximum messages per rotation (17 messages)
Feb 06 10:43:29 corosync [TOTEM ] missed count const (5 messages)
Feb 06 10:43:29 corosync [TOTEM ] send threads (0 threads)
Feb 06 10:43:29 corosync [TOTEM ] RRP token expired timeout (2380 ms)
Feb 06 10:43:29 corosync [TOTEM ] RRP token problem counter (2000 ms)
Feb 06 10:43:29 corosync [TOTEM ] RRP threshold (10 problem count)
Feb 06 10:43:29 corosync [TOTEM ] RRP multicast threshold (100 problem
count)
Feb 06 10:43:29 corosync [TOTEM ] RRP automatic recovery check timeout
(1000 ms)
Feb 06 10:43:29 corosync [TOTEM ] RRP mode set to none.
Feb 06 10:43:29 corosync [TOTEM ] heartbeat_failures_allowed (0)
Feb 06 10:43:29 corosync [TOTEM ] max_network_delay (50 ms)
Feb 06 10:43:29 corosync [TOTEM ] HeartBeat is Disabled. To enable set
heartbeat_failures_allowed > 0
Feb 06 10:43:29 corosync [TOTEM ] Initializing transport (UDP/IP
Multicast).
Feb 06 10:43:29 corosync [TOTEM ] Initializing transmit/receive security:
libtomcrypt SOBER128/SHA1HMAC (mode 0).
Feb 06 10:43:29 corosync [IPC   ] you are using ipc api v2
Feb 06 10:43:29 corosync [TOTEM ] Receive multicast socket recv buffer size
(262142 bytes).
Feb 06 10:43:29 corosync [TOTEM ] Transmit multicast socket send buffer
size (262142 bytes).
Feb 06 10:43:29 corosync [TOTEM ] The network interface [192.168.1.7] is
now up.
Feb 06 10:43:29 corosync [TOTEM ] Created or loaded sequence id
108.192.168.1.7 for this ring.
Feb 06 10:43:29 corosync [QUORUM] Using quorum provider quorum_cman
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: corosync cluster
quorum service v0.1
Feb 06 10:43:29 corosync [CMAN  ] CMAN starting
Feb 06 10:43:29 corosync [CMAN  ] memb: Got node vm-1.cluster.com from ccs
(id=1, votes=1)
Feb 06 10:43:29 corosync [CMAN  ] memb: add_new_node: vm-1.cluster.com,
(id=1, votes=1) newalloc=1
Feb 06 10:43:29 corosync [CMAN  ] memb: Got node vm-2.cluster.com from ccs
(id=2, votes=1)
Feb 06 10:43:29 corosync [CMAN  ] memb: add_new_node: vm-2.cluster.com,
(id=2, votes=1) newalloc=1
Feb 06 10:43:29 corosync [CMAN  ] memb: add_new_node: vm-2.cluster.com,
(id=2, votes=1) newalloc=0
Feb 06 10:43:29 corosync [CMAN  ] CMAN 3.0.12 (built Jan 12 2013 15:20:22)
started
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: corosync CMAN
membership service 2.90
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: openais cluster
membership service B.01.01
Feb 06 10:43:29 corosync [EVT   ] Evt exec init request
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: openais event
service B.01.01
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded: openais checkpoint
service B.01.01
Feb 06 10:43:29 corosync [MSG   ] [DEBUG]: msg_exec_init_fn
Feb 06 10:43:29 corosync [SERV  ] Service engine loaded

[Pacemaker] Active/Active

2015-02-10 Thread José Luis Rodríguez Rodríguez
Hello,

I would like to create an active/active cluster by using pacemaker and
corosync on Debian. I  have followed the documentation
http://clusterlabs.org/doc/Cluster_from_Scratch.pdf.  It works well until
8.2.2 Create and Populate an GFS2 Partition.  When I try to mount the disk
/dev/drbd1 as /mnt, the output is:

gfs_controld join connect error: Connection refused
error mounting lockproto lock_dlm

I have read that it is necessary to use cman, but then the resources
created by pacemaker (with the command crm configure primitive ...) don't
appear in the output of crm_mon.

What could I do?

-- 
Saludos,


José Luis
--
Profesor Informática IES Jacarandá -  Brenes (Sevilla)
http://www.iesjacaranda.es  -   www.iesjacaranda-brenes.org
twitter: @jlrod2
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] why sometimes pengine seems lazy

2015-02-10 Thread d tbsky
hi:
   I was using pacemaker and drbd with sl linux 6.5/6.6. All was fine.

   Now I am testing sl linux 7.0, and I notice that when I want to promote
the drbd resource with "pcs resource meta my-ms-drbd master-max=2",
sometimes pengine finds the change immediately, but sometimes it
finds the change only after about a minute. I don't know if the delay is
normal; I didn't notice it when I was using sl linux 6.5/6.6.

   The "good" result: kvm-3-ms-drbd got master-max=2 at 13:00:07 and
pengine found it at 13:00:07

Feb 10 13:00:06 [2893] love1-test.lhy.com.twcib: info:
cib_process_request: Completed cib_query operation for section
//constraints: OK (rc=0, origin=love2-test.lhy.com.tw/cibadmin/2,
version=0.2084.3)
Feb 10 13:00:06 [2893] love1-test.lhy.com.twcib: info:
cib_process_request: Completed cib_query operation for section
//constraints: OK (rc=0, origin=love2-test.lhy.com.tw/cibadmin/2,
version=0.2084.3)
Feb 10 13:00:06 [2893] love1-test.lhy.com.twcib: info:
cib_process_request: Completed cib_query operation for section
//constraints: OK (rc=0, origin=love2-test.lhy.com.tw/cibadmin/2,
version=0.2084.3)
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib:   notice:
cib:diff:Diff: --- 0.2084.3
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib:   notice:
cib:diff:Diff: +++ 0.2085.1 206a58e68f4a9cd8e72c7ebb40bef026
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib:   notice:
cib:diff:--   
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib:   notice:
cib:diff:++   
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib: info:
cib_process_request: Completed cib_replace operation for
section configuration: OK (rc=0,
origin=love2-test.lhy.com.tw/cibadmin/2, version=0.2085.1)
Feb 10 13:00:07 [2898] love1-test.lhy.com.tw   crmd: info:
abort_transition_graph:  te_update_diff:126 - Triggered transition
abort (complete=1, node=, tag=diff, id=(null), magic=NA, cib=0.2085.1)
: Non-status change
Feb 10 13:00:07 [2898] love1-test.lhy.com.tw   crmd:   notice:
do_state_transition: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib: info:
cib_process_request: Completed cib_query operation for section
'all': OK (rc=0, origin=local/crmd/839, version=0.2085.1)
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib: info:
write_cib_contents:  Archived previous version as
/var/lib/pacemaker/cib/cib-17.raw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine:   notice:
unpack_config:   On loss of CCM Quorum: Ignore
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
determine_online_status: Node love2-test.lhy.com.tw is online
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
determine_online_status: Node love1-test.lhy.com.tw is online
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib: info:
write_cib_contents:  Wrote version 0.2085.0 of the CIB to disk
(digest: bfdd9b0a25cde05a4b2777b6fc670519)
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine:   notice:
unpack_rsc_op:   Operation monitor found resource kvm-6-drbd:0
active in master mode on love2-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
unpack_rsc_op:   Operation monitor found resource kvm-6 active on
love2-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
unpack_rsc_op:   Operation monitor found resource kvm-1-drbd:1
active on love1-test.lhy.com.tw
Feb 10 13:00:07 [2893] love1-test.lhy.com.twcib: info:
retrieveCib: Reading cluster configuration from:
/var/lib/pacemaker/cib/cib.2Mn5wa (digest:
/var/lib/pacemaker/cib/cib.0nfve5)
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
unpack_rsc_op:   Operation monitor found resource kvm-3-drbd:1
active on love1-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine:   notice:
unpack_rsc_op:   Re-initiated expired calculated failure
kvm-4_last_failure_0 (rc=7,
magic=0:7;144:22:0:87034531-de2d-4395-b3c0-9bc0cecfc50e) on
love1-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
unpack_rsc_op:   Operation monitor found resource kvm-2-drbd:1
active on love1-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
unpack_rsc_op:   Operation monitor found resource kvm-5 active on
love1-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine:   notice:
unpack_rsc_op:   Operation monitor found resource kvm-5-drbd:1
active in master mode on love1-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.twpengine: info:
unpack_rsc_op:   Operation monitor found resource kvm-6-drbd:1
active on love1-test.lhy.com.tw
Feb 10 13:00:07 [2897] love1-test.lhy.com.tw