Re: [Pacemaker] added a function to show crm_mon a ticket.

2012-03-20 Thread Yuusuke Iida

Hi, Andrew

Understood.
I will rework the patch once the status-section changes have been made.

Thanks,
Yuusuke

(2012/03/21 11:17), Andrew Beekhof wrote:

We might want to hold off on this, Yan is thinking about some changes
to the status section which would impact this.

On Wed, Mar 21, 2012 at 1:13 PM, Yuusuke Iida
  wrote:

Hi, Andrew

I have added a function to crm_mon that displays ticket information.

Usage is as follows:
Short option: -c
Long option: --show-tickets

Example output:
# crm_mon -c1

Last updated: Mon Mar 19 14:36:25 2012
Last change: Mon Mar 19 13:52:43 2012 via crmd on pm1
Stack: Heartbeat
Current DC: pm1 (ec8fdebc-112f-4af1-828a-d4612f514a32) - partition with
quorum
Version: 1.1.6-31f6ca3
2 Nodes configured, 2 expected votes
8 Resources configured.


Online: [ pm1 pm2 ]

 dummy1 (ocf::pacemaker:Dummy): Started pm1
 dummy2 (ocf::pacemaker:Dummy): Started pm2
 Resource Group: grp1
     vip        (ocf::heartbeat:IPaddr2):       Started pm1
     booth      (ocf::pacemaker:booth-site):    Started pm1
 Master/Slave Set: ms-stateful [stateful]
     Masters: [ pm2 ]
     Slaves: [ pm1 ]
 Clone Set: clnStonith1 [stonith1]
     Started: [ pm2 pm1 ]

Cluster Tickets            : Status        'Last granted time'
* ticketA                  : granted       'Mon Mar 19 14:35:46 2012'
* ticketB                  : revoked       'Mon Mar 19 14:35:22 2012'



I would very much like to have this function merged into the repository.
https://github.com/ClusterLabs/pacemaker/pull/49

Best regards,
Yuusuke
--

METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iiday...@intellilink.co.jp


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org





--

METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iiday...@intellilink.co.jp


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] added a function to show crm_mon a ticket.

2012-03-20 Thread Andrew Beekhof
We might want to hold off on this, Yan is thinking about some changes
to the status section which would impact this.

On Wed, Mar 21, 2012 at 1:13 PM, Yuusuke Iida
 wrote:
> Hi, Andrew
>
> I have added a function to crm_mon that displays ticket information.
>
> Usage is as follows:
> Short option: -c
> Long option: --show-tickets
>
> Example output:
> # crm_mon -c1
> 
> Last updated: Mon Mar 19 14:36:25 2012
> Last change: Mon Mar 19 13:52:43 2012 via crmd on pm1
> Stack: Heartbeat
> Current DC: pm1 (ec8fdebc-112f-4af1-828a-d4612f514a32) - partition with
> quorum
> Version: 1.1.6-31f6ca3
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> 
>
> Online: [ pm1 pm2 ]
>
>  dummy1 (ocf::pacemaker:Dummy): Started pm1
>  dummy2 (ocf::pacemaker:Dummy): Started pm2
>  Resource Group: grp1
>     vip        (ocf::heartbeat:IPaddr2):       Started pm1
>     booth      (ocf::pacemaker:booth-site):    Started pm1
>  Master/Slave Set: ms-stateful [stateful]
>     Masters: [ pm2 ]
>     Slaves: [ pm1 ]
>  Clone Set: clnStonith1 [stonith1]
>     Started: [ pm2 pm1 ]
>
> Cluster Tickets            : Status        'Last granted time'
> * ticketA                  : granted       'Mon Mar 19 14:35:46 2012'
> * ticketB                  : revoked       'Mon Mar 19 14:35:22 2012'
>
>
>
> I would very much like to have this function merged into the repository.
> https://github.com/ClusterLabs/pacemaker/pull/49
>
> Best regards,
> Yuusuke
> --
> 
> METRO SYSTEMS CO., LTD
>
> Yuusuke Iida
> Mail: iiday...@intellilink.co.jp
> 
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] added a function to show crm_mon a ticket.

2012-03-20 Thread Yuusuke Iida
Hi, Andrew

I have added a function to crm_mon that displays ticket information.

Usage is as follows:
Short option: -c
Long option: --show-tickets

Example output:
# crm_mon -c1

Last updated: Mon Mar 19 14:36:25 2012
Last change: Mon Mar 19 13:52:43 2012 via crmd on pm1
Stack: Heartbeat
Current DC: pm1 (ec8fdebc-112f-4af1-828a-d4612f514a32) - partition with
quorum
Version: 1.1.6-31f6ca3
2 Nodes configured, 2 expected votes
8 Resources configured.


Online: [ pm1 pm2 ]

 dummy1 (ocf::pacemaker:Dummy): Started pm1
 dummy2 (ocf::pacemaker:Dummy): Started pm2
 Resource Group: grp1
     vip        (ocf::heartbeat:IPaddr2):       Started pm1
     booth      (ocf::pacemaker:booth-site):    Started pm1
 Master/Slave Set: ms-stateful [stateful]
     Masters: [ pm2 ]
     Slaves: [ pm1 ]
 Clone Set: clnStonith1 [stonith1]
     Started: [ pm2 pm1 ]

Cluster Tickets            : Status        'Last granted time'
* ticketA                  : granted       'Mon Mar 19 14:35:46 2012'
* ticketB                  : revoked       'Mon Mar 19 14:35:22 2012'
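A hedged sketch of how the new option might be combined with the existing
crm_mon switches, plus a hypothetical way to inspect the same ticket entries
directly in the CIB (the --xpath expression is a guess and may need adjusting
to the schema in use):

# one-shot output including ticket status and inactive resources
crm_mon -1 -r --show-tickets

# hypothetical: dump the raw ticket entries from the CIB status section
cibadmin --query --xpath "//tickets"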



I would very much like to have this function merged into the repository.
https://github.com/ClusterLabs/pacemaker/pull/49

Best regards,
Yuusuke
-- 

METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iiday...@intellilink.co.jp


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Sporadic problems of rejoin after split brain situation

2012-03-20 Thread Andrew Beekhof
On Wed, Mar 21, 2012 at 8:46 AM, Lars Ellenberg
 wrote:
> On Sun, Mar 18, 2012 at 11:53:11PM +, Roman Sidler wrote:
>> Hi all
>> No, we have no crossover cable between the nodes.
>> The 2 nodes are linked by a switched network, and this works really fine 
>> except
>> the mentioned case.
>>
>> It's rather easy to reproduce.
>> 1. Activate a 2-node cluster
>> 2. disconnect the network connection (e.g. by disconnecting the network adapter on a VM)
>> 3. wait until both nodes are active and act as DC
>> 4. reconnect nodes
>> 5. the new DC is elected
>>
>> When step 4 encounters an unsteady network, sometimes step 5 will never be
>> reached and both nodes stay DC. They are sending and receiving heartbeat
>> status messages.
>>
>> The situation can be simulated by repeated disconnects/reconnects.
>>
>> Versions:
>> pacemaker 1.1.6 (and 1.0.5)
>> heartbeat 3.0.7 (and 3.0.0)
>
> There is no heartbeat 3.0.7 (yet), so I guess you meant 2.0.7.
> And no doubt that has a few problems.
>
> You are supposed to use latest heartbeat (now: 3.0.5)
> http://hg.linux-ha.org/heartbeat-STABLE_3_0/
>
> "Unsteady network": I suppose that means at least packet loss.
> There have been long-standing bugs with retransmit and packet loss
> in heartbeat, which I only fixed in 3.0.5.
>
> There is one more problem in that area I'm aware of, which is much less
> likely to trigger, but if it does, you'll know, as heartbeat will start
> spinning with high cpu load, and not even notice that a peer node is dead.
>
> It was introduced in 2005 or so; it is a glib mainloop callback
> priority inversion involving timeout-based events, only triggers
> sometimes, and may even need special config parameters in ha.cf to
> trigger at all (plus massive packet loss ...). It is fixed as well,
> but as I have only had one complaint that remotely matched these
> symptoms, the fix is still not in the repos.
>
> Hopefully we find some time to clean that up
> and release it with heartbeat 3.0.6.
>
> As we finally noticed that some things still packaged with heartbeat
> actually belong into glue, we seem to have to cut a release anyways,
> "soon".
>
> BTW, I've seen similar behaviour with corosync as well.
> In fact, for me, this exact scenario (node returning after having been
> declared dead) typically does NOT work with pacemaker on corosync,
> while it always worked with pacemaker on heartbeat...

I believe things are much improved with more recent releases.
Hence the recommendation of 1.4.x :-)

>
> Still, Andrew is right: for new clusters, corosync is the way to go.
>
> Not so long ago, I posted some corosync vs heartbeat summary
> from my current perspective:
> http://permalink.gmane.org/gmane.linux.highavailability.user/36903
>
>> *** pacemaker 1.0.5 ***
>
> That's not the most recent, either.
> iirc, there have been some fixes in pacemaker as well,
> in the area of rejoining after partition.
>
> But really: you need to fix your "unsteady" network,
> and probably should implement stonith.
>
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: CRIT: Cluster node lab13 returning
>> after partition.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: For information on cluster
>> partitions, See URL: http://linux-ha.org/SplitBrain
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Deadtime value may be too 
>> small.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: See FAQ for information on 
>> tuning
>> deadtime.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: URL: http://linux-
>> ha.org/FAQ#heavy_load
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Link lab13:eth0 up.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Late heartbeat: Node lab13:
>> interval 244130 ms
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Status update for node lab13:
>> status active
>> Mar  8 07:05:50 LAB19 crmd: [3083]: notice: crmd_ha_status_callback: Status
>> update: Node lab13 now has status [active] (DC=true)
>> Mar  8 07:05:50 LAB19 crmd: [3083]: info: crm_update_peer_proc: lab13.ais is 
>> now
>> online
>> Mar  8 07:05:50 LAB19 cib: [3079]: WARN: cib_peer_callback: Discarding
>> cib_apply_diff message (14a33) from lab13: not in our membership
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: all clients are now paused
>> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: no mbr_track info
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: instance=15,
>> nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: crmd_ccm_msg_callback: Quorum
>> (re)attained after event=NEW MEMBERSHIP (id=15)
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: NEW MEMBERSHIP:
>> trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old

Re: [Pacemaker] Sporadic problems of rejoin after split brain situation

2012-03-20 Thread Lars Ellenberg
On Sun, Mar 18, 2012 at 11:53:11PM +, Roman Sidler wrote:
> Hi all
> No, we have no crossover cable between the nodes.
> The 2 nodes are linked by a switched network, and this works really fine 
> except 
> the mentioned case. 
> 
> It's rather easy to reproduce.
> 1. Activate a 2-node cluster
> 2. disconnect the network connection (e.g. by disconnecting the network adapter on a VM)
> 3. wait until both nodes are active and act as DC
> 4. reconnect nodes
> 5. the new DC is elected
> 
> When step 4 encounters an unsteady network, sometimes step 5 will never be 
> reached and both nodes stay DC. They are sending and receiving heartbeat
> status messages.
> 
> The situation can be simulated by repeated disconnects/reconnects.
> 
> Versions:
> pacemaker 1.1.6 (and 1.0.5)
> heartbeat 3.0.7 (and 3.0.0)

There is no heartbeat 3.0.7 (yet), so I guess you meant 2.0.7.
And no doubt that has a few problems.

You are supposed to use latest heartbeat (now: 3.0.5)
http://hg.linux-ha.org/heartbeat-STABLE_3_0/

"Unsteady network": I suppose that means at least packet loss.
There have been long-standing bugs with retransmit and packet loss
in heartbeat, which I only fixed in 3.0.5.

There is one more problem in that area I'm aware of, which is much less
likely to trigger, but if it does, you'll know, as heartbeat will start
spinning with high cpu load, and not even notice that a peer node is dead.

It was introduced in 2005 or so; it is a glib mainloop callback
priority inversion involving timeout-based events, only triggers
sometimes, and may even need special config parameters in ha.cf to
trigger at all (plus massive packet loss ...). It is fixed as well,
but as I have only had one complaint that remotely matched these
symptoms, the fix is still not in the repos.

Hopefully we find some time to clean that up
and release it with heartbeat 3.0.6.

As we finally noticed that some things still packaged with heartbeat
actually belong into glue, we seem to have to cut a release anyways,
"soon".

BTW, I've seen similar behaviour with corosync as well.
In fact, for me, this exact scenario (node returning after having been
declared dead) typically does NOT work with pacemaker on corosync,
while it always worked with pacemaker on heartbeat...

Still, Andrew is right: for new clusters, corosync is the way to go.

Not so long ago, I posted some corosync vs heartbeat summary
from my current perspective:
http://permalink.gmane.org/gmane.linux.highavailability.user/36903

> *** pacemaker 1.0.5 ***

That's not the most recent, either.
iirc, there have been some fixes in pacemaker as well,
in the area of rejoining after partition.

But really: you need to fix your "unsteady" network,
and probably should implement stonith.
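A hedged sketch of what such a stonith setup might look like for this
two-node cluster, assuming IPMI-capable hardware and the external/ipmi
plugin (the addresses, credentials and parameter values below are
placeholders, not taken from this thread):

crm configure primitive st-lab13 stonith:external/ipmi \
        params hostname="lab13" ipaddr="192.168.1.13" userid="admin" passwd="secret"
crm configure primitive st-lab19 stonith:external/ipmi \
        params hostname="lab19" ipaddr="192.168.1.19" userid="admin" passwd="secret"
crm configure location l-st-lab13 st-lab13 -inf: lab13
crm configure location l-st-lab19 st-lab19 -inf: lab19
crm configure property stonith-enabled=true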

> Mar  8 07:05:50 LAB19 heartbeat: [2979]: CRIT: Cluster node lab13 returning 
> after partition.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: For information on cluster 
> partitions, See URL: http://linux-ha.org/SplitBrain
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Deadtime value may be too 
> small.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: See FAQ for information on 
> tuning 
> deadtime.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: URL: http://linux-
> ha.org/FAQ#heavy_load
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Link lab13:eth0 up.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Late heartbeat: Node lab13: 
> interval 244130 ms
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Status update for node lab13: 
> status active
> Mar  8 07:05:50 LAB19 crmd: [3083]: notice: crmd_ha_status_callback: Status 
> update: Node lab13 now has status [active] (DC=true)
> Mar  8 07:05:50 LAB19 crmd: [3083]: info: crm_update_peer_proc: lab13.ais is 
> now 
> online
> Mar  8 07:05:50 LAB19 cib: [3079]: WARN: cib_peer_callback: Discarding 
> cib_apply_diff message (14a33) from lab13: not in our membership
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: all clients are now paused
> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: no mbr_track info
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: instance=15, 
> nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=15)
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: NEW MEMBERSHIP: 
> trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: CURRENT: lab19 
> [nodeid=1, born=15]
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: populate_cib_nodes_ha: Requesting 
> the 
> list of configured nodes
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event:

Re: [Pacemaker] fence_legacy, stonith and apcmastersnmp

2012-03-20 Thread Andrew Beekhof
2012/3/5 Kadlecsik József :
> On Mon, 5 Mar 2012, Andrew Beekhof wrote:
>
>> 2012/3/2 Kadlecsik József :
>> > On Fri, 2 Mar 2012, Andrew Beekhof wrote:
>> >
>> >> 2012/3/2 Kadlecsik József :
>> >> >
>> >> > After upgrading to pacemaker 1.1.6, cluster-glue 1.0.8 on Debian, our
>> >> > working apcmastersnmp resources stopped to work:
>> >> >
>> >> > Feb 29 14:22:03 atlas0 stonith: [35438]: ERROR: apcmastersnmp device not
>> >> > accessible.
>> >> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: notice: log_operation:
>> >> > Operation 'monitor' [35404] for device 'stonith-atlas6' returned: -2
>> >> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
>> >> > stonith-atlas6: Performing: stonith -t apcmastersnmp -S 161
>> >> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
>> >> > stonith-atlas6: Invalid config info for apcmastersnmp device
>> >> >
>> >> > Please note the strange "161" argument of stonith.
>> >> >
>> >> > After checking the source code and stracing stonithd, as far as I see, 
>> >> > the
>> >> > following happens:
>> >> >
>> >> > - stonithd calls fence_legacy, which steals the "port=161" parameter 
>> >> > from
>> >> >  apcmastersnmp. This produces the error message
>> >> >  "Invalid config info for apcmastersnmp device"
>> >>
>> >> You keep saying steals, what do you mean by that?  Where is it stolen 
>> >> from?
>> >
>> > fence_legacy passes the parameters to the stonith drivers via environment
>> > variables, except the "port".
>>
>> I had totally forgotten we do that.  Everything you've done makes
>> complete sense now.
>>
>> The second part is already pushed as:
>>    https://github.com/beekhof/pacemaker/commit/797d740
>>
>> I'll add the first part that adds the port as an environment variable now.
>
> The patches I suggested make it possible for the agents to start up, but
> fencing still doesn't work. Digging into it a little deeper, we found the
> following: the node to be fenced is passed to fence_legacy via stdin as
> "nodename=", but it is not converted to $opt_n anywhere in the
> script. The following patch fixes the startup and fencing issues:
>
> --- fence_legacy.orig   2012-03-02 11:10:11.911369622 +0100
> +++ fence_legacy        2012-03-05 10:33:30.081345464 +0100
> @@ -102,7 +102,11 @@
>         {
>             $opt_o = $val;
>         }
> -       elsif ($name eq "port" )
> +        # elsif ($name eq "port" )
> +       # {
> +        #     $opt_n = $val;
> +        # }
> +        elsif ($name eq "nodename" )
>        {
>             $opt_n = $val;
>         }
> @@ -176,8 +180,8 @@
>    }
>    elsif ( $opt_o eq "monitor" || $opt_o eq "stat" || $opt_o eq "status" )
>    {
> -       print "Performing: $opt_s -t $opt_t -S $opt_n\n" unless defined $opt_q;
> -       exec "$opt_s -t $opt_t $extra_args -S $opt_n" or die "failed to exec \"$opt_s\"\n";
> +       print "Performing: $opt_s -t $opt_t -S\n" unless defined $opt_q;
> +       exec "$opt_s -t $opt_t $extra_args -S" or die "failed to exec \"$opt_s\"\n";
>    }
>    else
>    {
>
> With the modified fence_legacy, ipmilan works again. However, apcmastersnmp
> still doesn't work; the parameters
>
>        pcmk_host_list="nodename" pcmk_host_check="static-list"
>
> must be added to the original ones in order to make it work again.

pcmk_host_check should be implied by pcmk_host_list.
Maybe I only fixed that after 1.1.6 went out, though.
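For reference, a hedged sketch of the configuration described above with the
pcmk_host_* parameters spelled out (the SNMP address, community string and
host name are placeholders; check the apcmastersnmp plugin documentation for
its exact parameter names):

crm configure primitive stonith-atlas6 stonith:apcmastersnmp \
        params ipaddr="apc-pdu1" port="161" community="private" \
               pcmk_host_list="atlas6" pcmk_host_check="static-list"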

>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.joz...@wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
>         H-1525 Budapest 114, POB. 49, Hungary
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to setup STONITH in a 2-node active/passive linux HA pacemaker cluster?

2012-03-20 Thread Andreas Kurz
On 03/20/2012 04:14 PM, Mathias Nestler wrote:
> Hi Dejan,
> 
> On 20.03.2012, at 15:25, Dejan Muhamedagic wrote:
> 
>> Hi,
>>
>> On Tue, Mar 20, 2012 at 08:52:39AM +0100, Mathias Nestler wrote:
>>> On 19.03.2012, at 20:26, Florian Haas wrote:
>>>
 On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
 wrote:
> Hi everyone,
>
> I am trying to setup an active/passive (2 nodes) Linux-HA cluster
> with corosync and pacemaker to hold a PostgreSQL-Database up and
> running. It works via DRBD and a service-ip. If node1 fails, node2
> should take over. The same if PG runs on node2 and it fails.
> Everything works fine except the STONITH thing.
>
> Between the nodes is a dedicated HA connection (10.10.10.X), so I
> have the following interface configuration:
>
> eth0            eth1            host
> 10.10.10.251    172.10.10.1     node1
> 10.10.10.252    172.10.10.2     node2
>
> Stonith is enabled and I am testing with a ssh-agent to kill nodes.
>
> crm configure property stonith-enabled=true
> crm configure property stonith-action=poweroff
> crm configure rsc_defaults resource-stickiness=100
> crm configure property no-quorum-policy=ignore
>
> crm configure primitive stonith_postgres stonith:external/ssh \
>  params hostlist="node1 node2"
> crm configure clone fencing_postgres stonith_postgres

 You're missing location constraints, and doing this with 2 primitives
 rather than 1 clone is usually cleaner. The example below is for
 external/libvirt rather than external/ssh, but you ought to be able to
 apply the concept anyhow:

 http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes

>>>
>>> As I understand it, the cluster decides which node has to be stonith'ed.
>>> Besides this, I already tried the following configuration:
>>>
>>> crm configure primitive stonith1_postgres stonith:ssh \
>>> params hostlist="node1"
>>> op monitor interval="25" timeout="10"
>>> crm configure primitive stonith2_postgres stonith:ssh \
>>> params hostlist="node2"
>>> op monitor interval="25" timeout="10"
>>> crm configure location stonith1_not_on_node1 stonith1_postgres \
>>> -inf: node1
>>> crm configure location stonith2_not_on_node2 stonith2_postgres \
>>> -inf: node2
>>>
>>> The result is the same :/
>>
>> Neither ssh nor external/ssh are supported fencing options. Both
>> include a sleep before reboot which makes the window in which
>> it's possible for both nodes to fence each other larger than is
>> usually the case with production-quality stonith plugins.
> 
> I use this ssh-stonith only for testing. At the moment I am creating the
> cluster in a virtual environment. Besides this, what is the difference
> between ssh and external/ssh?

the first one is a binary implementation, the second one is a simple
shell script ... that's it ;-)

> My problem is that each node tries to kill the other. But I only want
> to kill the node with the postgres resource on it if connection between
> nodes breaks.

That is the expected behavior if you introduce a split brain in a two-node
cluster. Each node builds its own cluster partition and tries to
stonith the other "dead" node.

If you are using a virtualization environment managed by libvirt you can
follow the link Florian posted. If you are running on some VMware or
Virtualbox testing environment using sbd for fencing might be a good
option ... as shared storage can be provided easily.

You could then also configure a weak colocation of the sbd stonith agent
instance with your postgres instance; in combination with the correct
start timeout, that gives you the behavior you want.
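A hedged sketch of that combination, assuming a small shared disk visible on
both nodes as /dev/sdc1 (device path, resource and group names, and the
colocation score are placeholders):

sbd -d /dev/sdc1 create
crm configure primitive stonith_sbd stonith:external/sbd \
        params sbd_device="/dev/sdc1"
crm configure colocation fence-near-postgres 100: stonith_sbd postgres_group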

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
>>
>> As for the configuration, I'd rather use the first one, just not
>> cloned. That also helps prevent mutual fencing.
>>
> 
> I cloned it because I also want the STONITH-feature if postgres lives on
> the other node. How can I achieve it?
> 
>> See also:
>>
>> http://www.clusterlabs.org/doc/crm_fencing.html
>> http://ourobengr.com/ha
>>
> 
> Thank you very much
> 
> Best
> Mathias
> 
>> Thanks,
>>
>> Dejan
>>
 Hope this helps.
 Cheers,
 Florian

>>>
>>> Best
>>> Mathias
>>>
>>
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> 
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clust

[Pacemaker] Resource Agent ethmonitor

2012-03-20 Thread Fiorenza Meini

Hi there,
has anybody successfully configured the RA named in the subject of
this message?


I got this error: if_eth0_monitor_0 (node=fw1, call=2297, rc=-2, 
status=Timed Out): unknown exec error


The RA definition in CIB is:

primitive if_eth0 ocf:heartbeat:ethmonitor \
params interface="eth0" name="wan_eth0" \
op monitor interval="20s" timeout="50s" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="20"
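In case it is useful, a hedged variant of the same definition with more
generous timeouts and wrapped in a clone (ethmonitor is typically cloned so
that every node watches its own interface; the timeout values are guesses,
not recommendations):

primitive if_eth0 ocf:heartbeat:ethmonitor \
        params interface="eth0" name="wan_eth0" \
        op monitor interval="20s" timeout="60s" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="20s"
clone cl_if_eth0 if_eth0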


Thanks and regards
--

Fiorenza Meini
Spazio Web S.r.l.

V. Dante Alighieri, 10 - 13900 Biella
Tel.: 015.2431982 - 015.9526066
Fax: 015.2522600
Reg. Imprese, CF e P.I.: 02414430021
Iscr. REA: BI - 188936
Iscr. CCIAA: Biella - 188936
Cap. Soc.: 30.000,00 Euro i.v.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to setup STONITH in a 2-node active/passive linux HA pacemaker cluster?

2012-03-20 Thread Mathias Nestler
Hi Dejan,

On 20.03.2012, at 15:25, Dejan Muhamedagic wrote:

> Hi,
> 
> On Tue, Mar 20, 2012 at 08:52:39AM +0100, Mathias Nestler wrote:
>> On 19.03.2012, at 20:26, Florian Haas wrote:
>> 
>>> On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
>>>  wrote:
 Hi everyone,
 
 I am trying to setup an active/passive (2 nodes) Linux-HA cluster with 
 corosync and pacemaker to hold a PostgreSQL-Database up and running. It 
 works via DRBD and a service-ip. If node1 fails, node2 should take over. 
 The same if PG runs on node2 and it fails. Everything works fine except 
 the STONITH thing.
 
 Between the nodes is a dedicated HA connection (10.10.10.X), so I have 
 the following interface configuration:
 
 eth0            eth1            host
 10.10.10.251    172.10.10.1     node1
 10.10.10.252    172.10.10.2     node2
 
 Stonith is enabled and I am testing with a ssh-agent to kill nodes.
 
 crm configure property stonith-enabled=true
 crm configure property stonith-action=poweroff
 crm configure rsc_defaults resource-stickiness=100
 crm configure property no-quorum-policy=ignore
 
 crm configure primitive stonith_postgres stonith:external/ssh \
  params hostlist="node1 node2"
 crm configure clone fencing_postgres stonith_postgres
>>> 
>>> You're missing location constraints, and doing this with 2 primitives
>>> rather than 1 clone is usually cleaner. The example below is for
>>> external/libvirt rather than external/ssh, but you ought to be able to
>>> apply the concept anyhow:
>>> 
>>> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
>>> 
>> 
>> As I understand it, the cluster decides which node has to be stonith'ed.
>> Besides this, I already tried the following configuration:
>> 
>> crm configure primitive stonith1_postgres stonith:ssh \
>>  params hostlist="node1"
>>  op monitor interval="25" timeout="10"
>> crm configure primitive stonith2_postgres stonith:ssh \
>>  params hostlist="node2"
>>  op monitor interval="25" timeout="10"
>> crm configure location stonith1_not_on_node1 stonith1_postgres \
>>  -inf: node1
>> crm configure location stonith2_not_on_node2 stonith2_postgres \
>>  -inf: node2
>> 
>> The result is the same :/
> 
> Neither ssh nor external/ssh are supported fencing options. Both
> include a sleep before reboot which makes the window in which
> it's possible for both nodes to fence each other larger than is
> usually the case with production-quality stonith plugins.

I use this ssh-stonith only for testing. At the moment I am creating the 
cluster in a virtual environment. Besides this, what is the difference between 
ssh and external/ssh?
My problem is that each node tries to kill the other. But I only want to kill
the node with the postgres resource on it if the connection between the nodes breaks.

> 
> As for the configuration, I'd rather use the first one, just not
> cloned. That also helps prevent mutual fencing.
> 

I cloned it because I also want the STONITH-feature if postgres lives on the 
other node. How can I achieve it?

> See also:
> 
> http://www.clusterlabs.org/doc/crm_fencing.html
> http://ourobengr.com/ha
> 

Thank you very much

Best
Mathias

> Thanks,
> 
> Dejan
> 
>>> Hope this helps.
>>> Cheers,
>>> Florian
>>> 
>> 
>> Best
>> Mathias
>> 
> 
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Using shadow configurations noninteractively

2012-03-20 Thread Dejan Muhamedagic
Hi,

On Mon, Mar 19, 2012 at 09:30:45PM +0100, Florian Haas wrote:
> On Mon, Mar 19, 2012 at 9:00 PM, Phil Frost  wrote:
> > On Mar 19, 2012, at 15:22 , Florian Haas wrote:
> >> On Mon, Mar 19, 2012 at 8:00 PM, Phil Frost  
> >> wrote:
> >>> I'm attempting to automate my cluster configuration with Puppet. I'm 
> >>> already using Puppet to manage the configuration of my Xen domains. I'd 
> >>> like to instruct puppet to apply the configuration (via cibadmin) to a 
> >>> shadow config, but I can't find any sure way to do this. The issue is 
> >>> that running "crm_shadow --create ..." starts a subshell, but there's no 
> >>> easy way I can tell puppet to run a command, then run another command in 
> >>> the subshell it creates.
> >>>
> >>> Normally I'd expect some command-line option, but I can't find any. It 
> >>> does look like it sets the environment variable "CIB_shadow". Is that all 
> >>> there is to it? Is it safe to rely on that behavior?
> >>
> >> I've never tried this specific use case, so bear with me while I go
> >> out on a limb, but the crm shell is fully scriptable. Thus you
> >> *should* be able to generate a full-blown crm script, with "cib foo"
> >> commands and whathaveyou, in a temporary file, and then just do "crm <
> >> /path/to/temp/file". Does that work for you?
> >
> >
> > I don't think so, because the crm shell, unlike cibadmin, has no idempotent 
> > method of configuration I've found. With cibadmin, I can generate the 
> > configuration for the primitive and associated location constraints for 
> > each Xen domain in one XML file, and feed it cibadmin -M as many times as I 
> > want without error. I know that by running that command, the resulting 
> > configuration is what I had in the file, regardless if the configuration 
> > already existed, did not exist, or existed but some parameters were 
> > different.
> >
> > To do this with crm, I'd have to also write code which
> > checks if things are configured as I want them,

Interesting. Why is it that you cannot trust crm?

> > then take
> > different actions if it doesn't exist, already exists, or
> > already exists but has the incorrect value. That's not
> > impossible, but it's far harder to develop and quite likely I'll
> > make an error in all that logic that will automate the
> > destruction of my cluster.
> Huh? What's wrong with "crm configure load replace "?

Yes, I'd also expect that to always produce the same
configuration, i.e. the one as specified in the input file. If
it doesn't, then please file a bug report.

> Anyhow, I think you haven't really stated what you are trying to
> achieve, in detail. So: what is it that you want to do exactly?

Anybody's guess, but for whatever reason they don't seem
comfortable with the crm shell.
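For completeness, a hedged sketch of a fully non-interactive shadow workflow
built on the CIB_shadow variable mentioned above (untested; the shadow name,
file paths and exact crm_shadow flags are placeholders and may vary between
versions):

# create the shadow copy once (this may drop you into a subshell; exit it)
crm_shadow --create-empty puppet
# then, from the script, point every CIB tool at the shadow explicitly
export CIB_shadow=puppet
cibadmin --modify --xml-file /path/to/xen-domains.xml
# or, as suggested above, the idempotent crm-shell equivalent
crm configure load replace /path/to/xen-domains.crm
# finally apply the shadow to the live cluster
crm_shadow --commit puppet --force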

Thanks,

Dejan

> Florian
> 
> -- 
> Need help with High Availability?
> http://www.hastexo.com/now
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to setup STONITH in a 2-node active/passive linux HA pacemaker cluster?

2012-03-20 Thread Dejan Muhamedagic
Hi,

On Tue, Mar 20, 2012 at 08:52:39AM +0100, Mathias Nestler wrote:
> On 19.03.2012, at 20:26, Florian Haas wrote:
> 
> > On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
> >  wrote:
> >> Hi everyone,
> >> 
> >> I am trying to setup an active/passive (2 nodes) Linux-HA cluster with 
> >> corosync and pacemaker to hold a PostgreSQL-Database up and running. It 
> >> works via DRBD and a service-ip. If node1 fails, node2 should take over. 
> >> The same if PG runs on node2 and it fails. Everything works fine except 
> >> the STONITH thing.
> >> 
> >> Between the nodes is a dedicated HA connection (10.10.10.X), so I have 
> >> the following interface configuration:
> >> 
> >> eth0            eth1            host
> >> 10.10.10.251    172.10.10.1     node1
> >> 10.10.10.252    172.10.10.2     node2
> >> 
> >> Stonith is enabled and I am testing with a ssh-agent to kill nodes.
> >> 
> >> crm configure property stonith-enabled=true
> >> crm configure property stonith-action=poweroff
> >> crm configure rsc_defaults resource-stickiness=100
> >> crm configure property no-quorum-policy=ignore
> >> 
> >> crm configure primitive stonith_postgres stonith:external/ssh \
> >>   params hostlist="node1 node2"
> >> crm configure clone fencing_postgres stonith_postgres
> > 
> > You're missing location constraints, and doing this with 2 primitives
> > rather than 1 clone is usually cleaner. The example below is for
> > external/libvirt rather than external/ssh, but you ought to be able to
> > apply the concept anyhow:
> > 
> > http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
> > 
> 
> As I understand it, the cluster decides which node has to be stonith'ed. Besides 
> this, I already tried the following configuration:
> 
> crm configure primitive stonith1_postgres stonith:ssh \
>   params hostlist="node1"
>   op monitor interval="25" timeout="10"
> crm configure primitive stonith2_postgres stonith:ssh \
>   params hostlist="node2"
>   op monitor interval="25" timeout="10"
> crm configure location stonith1_not_on_node1 stonith1_postgres \
>   -inf: node1
> crm configure location stonith2_not_on_node2 stonith2_postgres \
>   -inf: node2
> 
> The result is the same :/

Neither ssh nor external/ssh are supported fencing options. Both
include a sleep before reboot which makes the window in which
it's possible for both nodes to fence each other larger than is
usually the case with production-quality stonith plugins.

As for the configuration, I'd rather use the first one, just not
cloned. That also helps prevent mutual fencing.
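Read literally, "the first one, just not cloned" would be a hedged sketch
along these lines, i.e. the original primitive without the surrounding clone
(still only suitable for testing, for the reasons given above):

crm configure primitive stonith_postgres stonith:external/ssh \
        params hostlist="node1 node2"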

See also:

http://www.clusterlabs.org/doc/crm_fencing.html
http://ourobengr.com/ha

Thanks,

Dejan

> > Hope this helps.
> > Cheers,
> > Florian
> > 
> 
> Best
> Mathias
> 

> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How can I preview the shadow configuration?

2012-03-20 Thread Florian Haas
On Tue, Mar 20, 2012 at 11:15 AM, Rasto Levrinc  wrote:
> 2012/3/20 Mars gu :
>> Hi,
>>     I want to execute the command, but this problem occurred:
>>
>> [root@h10_148 ~]# ptest
>> -bash: ptest: command not found
>>
>> How can I preview the shadow configuration?
>
> ptest has been replaced by crm_simulate.

I thought I recalled that ptest was kicked out of the RHEL/CentOS
packages in 1.1.6, and that 1.1.5 still shipped with it. At any rate,
crm_simulate should be in both, and it would be the preferred utility
to use.
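A hedged example of previewing a shadow configuration with it (the shadow
file path is a guess for an el6 build; adjust it to wherever crm_shadow
stores its files on your system):

crm_simulate -S -x /var/lib/heartbeat/crm/shadow.test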

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Migration atomicity

2012-03-20 Thread Lars Marowsky-Bree
On 2012-03-20T11:42:08, Andrew Beekhof  wrote:

> > I'm observing somewhat unintuitive behavior of the migration logic when a
> > transition is aborted (due to a CIB change) in the middle of a resource
> > migration.
> >
> > That is:
> > 1. nodea: migrate_to nodeb
> > 2. transition abort
> > 3. nodeb: stop
> > 4. nodea: migrate_to nodec
> 
> I'd like to see that a crm_report showing that behavior.
> Because I'm looking at the same scenario and I see:
> 
> 1. nodea: migrate_to nodeb
> 2. transition abort
> 3. nodea: stop
> 4. nodeb: stop
> 5. nodec: start

Yes, that's what I see and expect too.

What would obviously be very nice is if the PE could reconstruct that a
migration is on-going and allow it to complete first, before again
shuffling the resource to where it now believes it should be. (To avoid
service downtime.) I wonder how hard that'd be? ;-)
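A hedged example of the kind of report being asked for above (the time
window and target directory are placeholders; narrow them to the aborted
transition in question):

crm_report -f "2012-03-20 11:30" -t "2012-03-20 12:00" /tmp/migration-abort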


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How can I preview the shadow configuration?

2012-03-20 Thread Rasto Levrinc
On Tue, Mar 20, 2012 at 11:20 AM, Lars Marowsky-Bree  wrote:
> On 2012-03-20T11:15:53, Rasto Levrinc  wrote:
>
>> ptest has been replaced by crm_simulate. Also, the -L option in crm_simulate
>> can crash, so don't use that; there can also be a wrong option in the crm
>> shell, so you may see weird XML errors.
>
> What's up with the latter? Is there a bug report?

All of that has been fixed; it just all came together in centos/rhel6 :)

Rasto

>
>
> Regards,
>    Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>



-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How can I preview the shadow configuration?

2012-03-20 Thread Lars Marowsky-Bree
On 2012-03-20T11:15:53, Rasto Levrinc  wrote:

> ptest has been replaced by crm_simulate. Also, the -L option in crm_simulate
> can crash, so don't use that; there can also be a wrong option in the crm
> shell, so you may see weird XML errors.

What's up with the latter? Is there a bug report?


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How can I preview the shadow configuration?

2012-03-20 Thread Rasto Levrinc
2012/3/20 Mars gu :
> Hi,
>     I want to execute the command, but this problem occurred:
>
> [root@h10_148 ~]# ptest
> -bash: ptest: command not found
>
> How can I preview the shadow configuration?

ptest has been replaced by crm_simulate. Also, the -L option in crm_simulate
can crash, so don't use that; there can also be a wrong option in the crm
shell, so you may see weird XML errors.

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] How can I preview the shadow configuration?

2012-03-20 Thread Mars gu
Hi,
I want to execute the command, but this problem occurred:
 
[root@h10_148 ~]# ptest
-bash: ptest: command not found
 
How can I preview the shadow configuration?
 
[root@h10_148 ~]# rpm -q pacemaker
pacemaker-1.1.5-8.el6.x86_64
 
Thanks.
 
 ___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to setup STONITH in a 2-node active/passive linux HA pacemaker cluster?

2012-03-20 Thread Mathias Nestler
On 19.03.2012, at 20:26, Florian Haas wrote:

> On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
>  wrote:
>> Hi everyone,
>> 
>> I am trying to set up an active/passive (2-node) Linux-HA cluster with
>> corosync and pacemaker to keep a PostgreSQL database up and running. It
>> works via DRBD and a service IP. If node1 fails, node2 should take over; the
>> same if PG runs on node2 and it fails. Everything works fine except the
>> STONITH thing.
>> 
>> Between the nodes is a dedicated HA connection (10.10.10.X), so I have the 
>> following interface configuration:
>> 
>> eth0            eth1            host
>> 10.10.10.251    172.10.10.1     node1
>> 10.10.10.252    172.10.10.2     node2
>> 
>> Stonith is enabled and I am testing with a ssh-agent to kill nodes.
>> 
>> crm configure property stonith-enabled=true
>> crm configure property stonith-action=poweroff
>> crm configure rsc_defaults resource-stickiness=100
>> crm configure property no-quorum-policy=ignore
>> 
>> crm configure primitive stonith_postgres stonith:external/ssh \
>>   params hostlist="node1 node2"
>> crm configure clone fencing_postgres stonith_postgres
> 
> You're missing location constraints, and doing this with 2 primitives
> rather than 1 clone is usually cleaner. The example below is for
> external/libvirt rather than external/ssh, but you ought to be able to
> apply the concept anyhow:
> 
> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
> 

As I understand it, the cluster decides which node has to be stonith'ed. Besides 
this, I already tried the following configuration:

crm configure primitive stonith1_postgres stonith:ssh \
params hostlist="node1"
op monitor interval="25" timeout="10"
crm configure primitive stonith2_postgres stonith:ssh \
params hostlist="node2"
op monitor interval="25" timeout="10"
crm configure location stonith1_not_on_node1 stonith1_postgres \
-inf: node1
crm configure location stonith2_not_on_node2 stonith2_postgres \
-inf: node2

The result is the same :/


> Hope this helps.
> Cheers,
> Florian
> 

Best
Mathias

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org