[Pacemaker] Question about OCF RA reload behaviour

2014-10-28 Thread Felix Zachlod

Hello folks,


I just have one question about how a resource agent should behave when 
"reload" is invoked while the resource is currently stopped. Should the 
resource be started or remain stopped? I did not find anything about 
that in the documentation; in general there is not much information 
about "reload".


I just noticed that my resource remained up after testing it with 
ocf-tester, and this was because I implemented it in such a way that it 
would be started if reloaded in the stopped state, and ocf-tester invokes 
reload as the last action. So I wondered whether this was correct.
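
For context, this is roughly how such an agent gets exercised with ocf-tester 
(the agent path and parameter name below are placeholders, not my actual 
agent):

    ocf-tester -n test_res -o some_param=value \
        /usr/lib/ocf/resource.d/custom/myagent

ocf-tester drives the agent's advertised actions directly, outside of 
Pacemaker, which is why reload ends up being the last call here.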



Thank you all in advance,
regards, Felix

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-28 Thread Sihan Goi
Hi,

No, I did not do this. I followed the Pacemaker 1.1 - Clusters from scratch
edition 5 for Fedora 13, and in section 7.3.4 it instructed me to run the
following commands, which I did:
mkfs.ext4 /dev/drbd1
mount /dev/drbd1 /mnt
create index.html file in /mnt
umount /dev/drbd1

Subsequently, after unmounting, there were no further instructions to mount
any other directories.

So, how should I mount /dev/mapper/vg_node02-drbd--demo to /var/www/html?
Should I be mounting /dev/mapper/vg_node02-drbd--demo, or /dev/drbd1? Since
I've already created index.html on /dev/drbd1, should I be mounting that?
I'm a little confused here.

On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof  wrote:

>
> > On 27 Oct 2014, at 6:05 pm, Sihan Goi  wrote:
> >
> > Hi,
> >
> > That offending line is as follows:
> > DocumentRoot "/var/www/html"
> >
> > I'm guessing it needs to be updated to the DRBD block device, but I'm
> not sure how to do that, or even what the block device is.
> >
> > fdisk -l shows the following, which I'm guessing is the block device?
> > /dev/mapper/vg_node02-drbd--demo
> >
> > lvs shows the following:
> > drbd-demo vg_node02 -wi-ao  1.00g
> >
> > btw I'm running the commands on node02 (secondary) rather than node01
> (primary). It's just a matter of convenience due to the physical location
> of the machine. Does it matter?
>
> Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html
> with a FileSystem resource.
> Have you not done this?
>
> >
> > Thanks.
> >
> > On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof 
> wrote:
> > Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on
> line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> >
> >
> >
> > > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> > >
> > > Hi Andrew,
> > >
> > > Logs in /var/log/httpd/ are empty, but here's a snippet of
> /var/log/messages right after I start pacemaker and do a "crm status"
> > >
> > > http://pastebin.com/ivQdyV4u
> > >
> > > Seems like the Apache service doesn't come up. This only happens after
> I run the commands in the guide to configure DRBD.
> > >
> > > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof 
> wrote:
> > > logs?
> > >
> > > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > > >
> > > > Hi, can anyone help? Really stuck here...
> > > >
> > > > On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi 
> wrote:
> > > > Hi,
> > > >
> > > > I'm following the "Clusters from Scratch" guide for Fedora 13, and
> I've managed to get a 2 node cluster working with Apache. However, once I
> tried to add DRBD 8.4 to the mix, it stopped working.
> > > >
> > > > I've followed the DRBD steps in the guide all the way till "cib
> commit fs" in Section 7.4, right before "Testing Migration". However, when
> I do a crm_mon, I get the following "failed actions".
> > > >
> > > > Last updated: Thu Oct 16 17:28:34 2014
> > > > Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
> > > > Stack: cman
> > > > Current DC: node02 - partition with quorum
> > > > Version: 1.1.10-14.el6_5.3-368c726
> > > > 2 Nodes configured
> > > > 5 Resources configured
> > > >
> > > >
> > > > Online: [ node01 node02 ]
> > > >
> > > > ClusterIP(ocf::heartbeat:IPaddr2):Started node02
> > > >  Master/Slave Set: WebDataClone [WebData]
> > > >  Masters: [ node02 ]
> > > >  Slaves: [ node01 ]
> > > > WebFS   (ocf::heartbeat:Filesystem):Started node02
> > > >
> > > > Failed actions:
> > > > WebSite_start_0 on node02 'unknown error' (1): call=278,
> status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014',
> queued=2ms, exec=0ms
> > > > WebSite_start_0 on node01 'unknown error' (1): call=203,
> status=Timed
> > > > Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms,
> exec=0ms
> > > >
> > > > Seems like the apache Website resource isn't starting up. Apache was
> > > > working just fine before I configured DRBD. What did I do wrong?
> > > >
> > > > --
> > > > - Goi Sihan
> > > > gois...@gmail.com
> > > >
> > > >
> > > >
> > > > --
> > > > - Goi Sihan
> > > > gois...@gmail.com
> > > > ___
> > > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > >
> > > > Project Home: http://www.clusterlabs.org
> > > > Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > Bugs: http://bugs.clusterlabs.org
> > >
> > >
> > > ___
> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> > >
> > >
> > >
> > > --
> > > - Goi Sihan
> > > gois...@gmail.com
> > > ___
> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> 

Re: [Pacemaker] Master-slave master not promoted on Corosync restart

2014-10-28 Thread Sékine Coulibaly
Andrei, Andrew,

I'm afraid the machine is not available to me anymore, I'm sorry.

I reproduced the problem on my laptop. It seems that a colocated
resource which failed to start caused some of the other resources to be
demoted and stopped.
So this is most likely a non-issue.

I'll monitor this issue and let you know.

Thank you

2014-10-28 5:24 GMT+01:00 Andrew Beekhof :
>
>> On 24 Oct 2014, at 9:00 pm, Sékine Coulibaly  wrote:
>>
>> Hi Andrew,
>>
>> Yep, forgot the attachments. I did reproduce the issue, please find
>> the bz2 files attached. Please tell if you need hb_report being used.
>
> Yep, I need the log files to put these into context
>
>>
>> Thank you !
>>
>>
>> 2014-10-07 5:07 GMT+02:00 Andrew Beekhof :
>>> I think you forgot the attachments (and my eyes are going blind trying to 
>>> read the word-wrapped logs :-)
>>>
>>> On 26 Sep 2014, at 6:37 pm, Sékine Coulibaly  wrote:
>>>
 Hi everyone,

 I'm trying my  best to diagnose a strange behaviour of my cluster.

 My cluster is basically a Master-Slave PostgreSQL cluster, with a VIP.
 Two nodes (clustera and clusterb). I'm running RHEL 6.5, Corosync
 1.4.1-1 and Pacemaker 1.1.10.

 For the sake of simplicity in diagnosing this, I took the slave node offline.

 My problem is that the cluster properly promotes the POSTGRESQL
 resource once (I issue a resource cleanup MS_POSTGRESQL to reset
 failcount counter, and then all resources are mounted on clustera).
 After a Corosync restart, the POSTGRESQL resource is not promoted.

 I narrowed down to the point where I add a location constraint
 (without this location constraint, after a Corosync restart,
 POSTGRESQL resource is promoted):

 location VIP_MGT_needs_gw VIP_MGT rule -inf: not_defined pingd or pingd 
 lte 0

 The logs show that the pingd attribute value is 1000 (the ping IP is
 pingable, and pinged [used tcpdump]). This attribute is set by :
 primitive ping_eth1_mgt_gw ocf:pacemaker:ping params
 host_list=178.3.1.47 multiplier=1000 op monitor interval=10s meta
 migration-threshold=3

 From corosync.log I can see :
 Sep 26 09:49:36 [22188] clusterapengine:   notice: LogActions:
 Start   POSTGRESQL:0(clustera)
 Sep 26 09:49:36 [22188] clusterapengine: info: LogActions:
 Leave   POSTGRESQL:1(Stopped)
 [...]
 Sep 26 09:49:36 [22186] clustera   lrmd: info: log_execute:
 executing - rsc:POSTGRESQL action:start call_id:20
 [...]
 Sep 26 09:49:37 [22187] clustera  attrd:   notice:
 attrd_trigger_update:Sending flush op to all hosts for:
 master-POSTGRESQL (50)
 [...]
 Sep 26 09:49:37 [22189] clustera   crmd: info:
 match_graph_event:   Action POSTGRESQL_notify_0 (46) confirmed on
 clustera (rc=0)
 [...]
 Sep 26 09:49:38 [22186] clustera   lrmd: info: log_finished:
 finished - rsc:ping_eth1_mgt_gw action:start call_id:22 pid:22352
 exit-code:0 exec-time:2175ms queue-time:0ms
 [...]
 Sep 26 09:49:38 [22188] clusterapengine: info: clone_print:
 Master/Slave Set: MS_POSTGRESQL [POSTGRESQL]
 Sep 26 09:49:38 [22188] clusterapengine: info: short_print:
 Slaves: [ clustera ]
 Sep 26 09:49:38 [22188] clusterapengine: info: short_print:
 Stopped: [ clusterb ]
 Sep 26 09:49:38 [22188] clusterapengine: info: native_print:
 VIP_MGT (ocf::heartbeat:IPaddr2):   Stopped
 Sep 26 09:49:38 [22188] clusterapengine: info: clone_print:
 Clone Set: cloned_ping_eth1_mgt_gw [ping_eth1_mgt_gw]
 Sep 26 09:49:38 [22188] clusterapengine: info: short_print:
 Started: [ clustera ]
 Sep 26 09:49:38 [22188] clusterapengine: info: short_print:
 Stopped: [ clusterb ]
 Sep 26 09:49:38 [22188] clusterapengine: info:
 rsc_merge_weights:   VIP_MGT: Rolling back scores from
 MS_POSTGRESQL
 Sep 26 09:49:38 [22188] clusterapengine: info: native_color:
 Resource VIP_MGT cannot run anywhere
 Sep 26 09:49:38 [22188] clusterapengine: info: native_color:
 POSTGRESQL:1: Rolling back scores from VIP_MGT
 Sep 26 09:49:38 [22188] clusterapengine: info: native_color:
 Resource POSTGRESQL:1 cannot run anywhere
 Sep 26 09:49:38 [22188] clusterapengine: info: master_color:
 MS_POSTGRESQL: Promoted 0 instances of a possible 1 to master
 Sep 26 09:49:38 [22188] clusterapengine: info: native_color:
 Resource ping_eth1_mgt_gw:1 cannot run anywhere
 Sep 26 09:49:38 [22188] clusterapengine: info: RecurringOp:
 Start recurring monitor (60s) for POSTGRESQL:0 on clustera
 Sep 26 09:49:38 [22188] clusterapengine: info: RecurringOp:
 Start recurring monitor (60s) for POSTGRESQL:0 on clustera
 Sep 26 09:49:38 [22188] clusterapengine: info: RecurringOp:
 Sta

Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-28 Thread Andrew Beekhof

> On 28 Oct 2014, at 6:26 pm, Sihan Goi  wrote:
> 
> Hi,
> 
> No, I did not do this. I followed the Pacemaker 1.1 - Clusters from scratch 
> edition 5 for Fedora 13, and in section 7.3.4 it instructed me to run the 
> following commands, which I did:
> mkfs.ext4 /dev/drbd1
> mount /dev/drbd1 /mnt
> create index.html file in /mnt
> umount /dev/drbd1
> 
> Subsequently, after unmounting, there were no further instructions to mount 
> any other directories.
> 
> So, how should I mount /dev/mapper/vg_node02-drbd--demo to /var/www/html? 
> Should I be mounting /dev/mapper/vg_node02-drbd--demo, or /dev/drbd1. Since 
> I've already created index.html in /dev/drbd1, should I be mounting that? I'm 
> a little confused here.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configure_the_cluster_for_drbd.html

Look for "Now that DRBD is functioning we can configure a Filesystem resource 
to use it"

> 
> On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof  wrote:
> 
> > On 27 Oct 2014, at 6:05 pm, Sihan Goi  wrote:
> >
> > Hi,
> >
> > That offending line is as follows:
> > DocumentRoot "/var/www/html"
> >
> > I'm guessing it needs to be updated to the DRBD block device, but I'm not 
> > sure how to do that, or even what the block device is.
> >
> > fdisk -l shows the following, which I'm guessing is the block device?
> > /dev/mapper/vg_node02-drbd--demo
> >
> > lvs shows the following:
> > drbd-demo vg_node02 -wi-ao  1.00g
> >
> > btw I'm running the commands on node02 (secondary) rather than node01 
> > (primary). It's just a matter of convenience due to the physical location 
> > of the machine. Does it matter?
> 
> Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html with 
> a FileSystem resource.
> Have you not done this?
> 
> >
> > Thanks.
> >
> > On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof  wrote:
> > Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line 
> > 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> >
> >
> >
> > > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> > >
> > > Hi Andrew,
> > >
> > > Logs in /var/log/httpd/ are empty, but here's a snippet of 
> > > /var/log/messages right after I start pacemaker and do a "crm status"
> > >
> > > http://pastebin.com/ivQdyV4u
> > >
> > > Seems like the Apache service doesn't come up. This only happens after I 
> > > run the commands in the guide to configure DRBD.
> > >
> > > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof  
> > > wrote:
> > > logs?
> > >
> > > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > > >
> > > > Hi, can anyone help? Really stuck here...
> > > >
> > > > On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi  wrote:
> > > > Hi,
> > > >
> > > > I'm following the "Clusters from Scratch" guide for Fedora 13, and I've 
> > > > managed to get a 2 node cluster working with Apache. However, once I 
> > > > tried to add DRBD 8.4 to the mix, it stopped working.
> > > >
> > > > I've followed the DRBD steps in the guide all the way till "cib commit 
> > > > fs" in Section 7.4, right before "Testing Migration". However, when I 
> > > > do a crm_mon, I get the following "failed actions".
> > > >
> > > > Last updated: Thu Oct 16 17:28:34 2014
> > > > Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
> > > > Stack: cman
> > > > Current DC: node02 - partition with quorum
> > > > Version: 1.1.10-14.el6_5.3-368c726
> > > > 2 Nodes configured
> > > > 5 Resources configured
> > > >
> > > >
> > > > Online: [ node01 node02 ]
> > > >
> > > > ClusterIP(ocf::heartbeat:IPaddr2):Started node02
> > > >  Master/Slave Set: WebDataClone [WebData]
> > > >  Masters: [ node02 ]
> > > >  Slaves: [ node01 ]
> > > > WebFS   (ocf::heartbeat:Filesystem):Started node02
> > > >
> > > > Failed actions:
> > > > WebSite_start_0 on node02 'unknown error' (1): call=278, 
> > > > status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', 
> > > > queued=2ms, exec=0ms
> > > > WebSite_start_0 on node01 'unknown error' (1): call=203, 
> > > > status=Timed
> > > > Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms
> > > >
> > > > Seems like the apache Website resource isn't starting up. Apache was
> > > > working just fine before I configured DRBD. What did I do wrong?
> > > >
> > > > --
> > > > - Goi Sihan
> > > > gois...@gmail.com
> > > >
> > > >
> > > >
> > > > --
> > > > - Goi Sihan
> > > > gois...@gmail.com
> > > > ___
> > > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > >
> > > > Project Home: http://www.clusterlabs.org
> > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > Bugs: http://bugs.clusterlabs.org
> > >
> > >
> > > ___
> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/

[Pacemaker] fencing with multiple node cluster

2014-10-28 Thread philipp . achmueller
hi,

Any recommendation/documentation for a reliable fencing implementation on 
a multi-node cluster (4 or 6 nodes across 2 sites)? 
I am thinking of implementing multiple node-fencing devices for each host, to 
STONITH the remaining nodes on the other site?

thank you!
Philipp
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] meta failure-timeout: crashed resource is assumed to be Started?

2014-10-28 Thread Carsten Otto
FYI: I cannot reproduce this problem right now. I guess I made a mistake
analyzing the logs.
-- 
andrena objects ag
Frankfurt office
Clemensstr. 8
60487 Frankfurt

Tel: +49 (0) 69 977 860 38
Fax: +49 (0) 69 977 860 39
http://www.andrena.de

Board of directors: Hagen Buchwald, Matthias Grund, Dr. Dieter Kuhn
Chairman of the supervisory board: Rolf Hetzelberger

Registered office: Karlsruhe
Commercial register: Amtsgericht Mannheim, HRB 109694
VAT ID: DE174314824

Please also note our upcoming events:
http://www.andrena.de/events


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-28 Thread Sihan Goi
Hi,

I followed those steps previously. I just tried it again, but I'm still
getting the same error. My "crm configure show" shows the following:

node node01 \
attributes standby=off
node node02
primitive ClusterIP IPaddr2 \
params ip=192.168.1.110 cidr_netmask=24 \
op monitor interval=30s
primitive WebData ocf:linbit:drbd \
params drbd_resource=wwwdata \
op monitor interval=60s
primitive WebFS Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html"
fstype=ext4
primitive WebSite apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval=1min
ms WebDataClone WebData \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true
location prefer-node01 WebSite 50: node01
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip Mandatory: ClusterIP WebSite
property cib-bootstrap-options: \
dc-version=1.1.10-14.el6_5.3-368c726 \
cluster-infrastructure=cman \
stonith-enabled=false \
no-quorum-policy=ignore
rsc_defaults rsc_defaults-options: \
migration-threshold=1

What am I doing wrong?

On Tue, Oct 28, 2014 at 5:11 PM, Andrew Beekhof  wrote:

>
> > On 28 Oct 2014, at 6:26 pm, Sihan Goi  wrote:
> >
> > Hi,
> >
> > No, I did not do this. I followed the Pacemaker 1.1 - Clusters from
> scratch edition 5 for Fedora 13, and in section 7.3.4 it instructed me to
> run the following commands, which I did:
> > mkfs.ext4 /dev/drbd1
> > mount /dev/drbd1 /mnt
> > create index.html file in /mnt
> > umount /dev/drbd1
> >
> > Subsequently, after unmounting, there were no further instructions to
> mount any other directories.
> >
> > So, how should I mount /dev/mapper/vg_node02-drbd--demo to
> /var/www/html? Should I be mounting /dev/mapper/vg_node02-drbd--demo, or
> /dev/drbd1. Since I've already created index.html in /dev/drbd1, should I
> be mounting that? I'm a little confused here.
>
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configure_the_cluster_for_drbd.html
>
> Look for "Now that DRBD is functioning we can configure a Filesystem
> resource to use it"
>
> >
> > On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof 
> wrote:
> >
> > > On 27 Oct 2014, at 6:05 pm, Sihan Goi  wrote:
> > >
> > > Hi,
> > >
> > > That offending line is as follows:
> > > DocumentRoot "/var/www/html"
> > >
> > > I'm guessing it needs to be updated to the DRBD block device, but I'm
> not sure how to do that, or even what the block device is.
> > >
> > > fdisk -l shows the following, which I'm guessing is the block device?
> > > /dev/mapper/vg_node02-drbd--demo
> > >
> > > lvs shows the following:
> > > drbd-demo vg_node02 -wi-ao  1.00g
> > >
> > > btw I'm running the commands on node02 (secondary) rather than node01
> (primary). It's just a matter of convenience due to the physical location
> of the machine. Does it matter?
> >
> > Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html
> with a FileSystem resource.
> > Have you not done this?
> >
> > >
> > > Thanks.
> > >
> > > On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof 
> wrote:
> > > Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on
> line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> > >
> > >
> > >
> > > > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> > > >
> > > > Hi Andrew,
> > > >
> > > > Logs in /var/log/httpd/ are empty, but here's a snippet of
> /var/log/messages right after I start pacemaker and do a "crm status"
> > > >
> > > > http://pastebin.com/ivQdyV4u
> > > >
> > > > Seems like the Apache service doesn't come up. This only happens
> after I run the commands in the guide to configure DRBD.
> > > >
> > > > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof 
> wrote:
> > > > logs?
> > > >
> > > > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > > > >
> > > > > Hi, can anyone help? Really stuck here...
> > > > >
> > > > > On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi 
> wrote:
> > > > > Hi,
> > > > >
> > > > > I'm following the "Clusters from Scratch" guide for Fedora 13, and
> I've managed to get a 2 node cluster working with Apache. However, once I
> tried to add DRBD 8.4 to the mix, it stopped working.
> > > > >
> > > > > I've followed the DRBD steps in the guide all the way till "cib
> commit fs" in Section 7.4, right before "Testing Migration". However, when
> I do a crm_mon, I get the following "failed actions".
> > > > >
> > > > > Last updated: Thu Oct 16 17:28:34 2014
> > > > > Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
> > > > > Stack: cman
> > > > > Current DC: node02 - partition with quorum
> > > > > Version: 1.1.10-14.el6_5.3-368c726
> > > > > 2 Nodes con

Re: [Pacemaker] fencing with multiple node cluster

2014-10-28 Thread Digimer

On 28/10/14 05:59 AM, philipp.achmuel...@arz.at wrote:

hi,

any recommendation/documentation for a reliable fencing implementation
on a multi-node cluster (4 or 6 nodes on 2 site).
i think of implementing multiple node-fencing devices for each host to
stonith remaining nodes on other site?

thank you!
Philipp


Multi-site clustering is very hard to do well because of fencing issues. 
How do you distinguish a site failure from severed links? Given that a 
failed fence action can not be assumed to be a success, then the only 
safe option is to block until a human intervenes. This makes your 
cluster as reliable as your WAN between the sites, which is to say, not 
very reliable. In any case, the destruction of a site will require 
manual failover, which can be complicated if insufficient nodes remain 
to form quorum.


Generally, I'd recommend two different clusters, one per site, with 
manual/service-level failover in the case of a disaster.


In any case; A good fencing setup should have two fence methods. 
Personally, I always use IPMI as a primary fence method (routed through 
one switch) and a pair of switched PDUs as backup (via a backup switch). 
This way, when IPMI is available, a confirmed fence is 100% certain to 
be good. However, if the node is totally disabled/destroyed, IPMI will 
be lost and the cluster will switch to the switched PDUs, cutting the 
power outlets feeding the node.


I've got a block diagram of how I do this:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#A_Map.21

It's trivial to scale the idea up to multiple node clusters.
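
As a rough sketch in crm shell syntax, such a two-method setup maps to one 
fence primitive per method plus a fencing topology; the agent names, 
addresses and credentials below are placeholders and depend on your hardware:

    crm configure primitive fence_node1_ipmi stonith:fence_ipmilan \
        params ipaddr=10.20.0.1 login=admin passwd=secret pcmk_host_list=node1
    crm configure primitive fence_node1_pdu stonith:fence_apc_snmp \
        params ipaddr=10.20.0.3 port=1 pcmk_host_list=node1
    crm configure fencing_topology node1: fence_node1_ipmi fence_node1_pdu

With a topology like that, the cluster only falls back to the PDU method when 
the IPMI method fails, matching the IPMI-first, PDU-backup behaviour above.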

Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] 2 Node Clustering, when primary server goes down(shutdown) the secondary server restarts

2014-10-28 Thread Digimer

On 28/10/14 02:24 AM, kamal kishi wrote:

Hi,

  I know that having no fencing configured creates issues.
But is the current scenario due to fencing??


Maybe, maybe not. I can say that *not* having it will make solving the 
problem much more difficult. Please get it working, it's pretty easy and 
it will make your life a lot easier.



The syslog isn't revealing much about it.
I would love to configure fencing, but I currently need some solution to
overcome the current scenario; if you say fencing is the only solution,
then I might have to do it remotely.


It is critical, yes. Please add it, test it and then hook DRBD into it.


OS -> UBUNTU 12.04 (64 bits)
DRBD -> 8.3.11


That is quite old. Can you update to 8.3.16? Also, which versions of 
pacemaker and corosync are you running?



Thanks for the quick reply

On Tue, Oct 28, 2014 at 11:19 AM, Digimer mailto:li...@alteeve.ca>> wrote:

On 28/10/14 01:39 AM, kamal kishi wrote:

Hi all,

Facing a strange issue which I'm not able to resolve, as I'm not
sure what is going wrong; the logs are not giving away much, to my
knowledge.

Issue -
Have configured 2 Node Clustering, have attached the configuration
file(New CRM conf of BIC.txt).

If Server2, which is primary, is shut down (forcefully, by turning off
the switch), Server1 restarts within a few seconds and starts the
resources.
Even though Server1 restarts and starts the resources, the time taken
to recover is too long to convince the clients, and I feel the current
behaviour is erroneous.

Have attached the syslog with this mail.(syslog)

Please go through it and let me know a solution to resolve this, as
the setup is at the client's place.

--
Regards,
Kamal Kishore B V


You really need fencing, first and foremost. This will cause the
survivor to put the lost node into a known state and then safely
begin taking over lost services. Do your nodes have IPMI (or iRMC,
iLO, DRAC, etc)? If so, setting up stonith is easy.

Once it is setup, configure DRBD to use the fence-handler
'crm-fence-peer.sh' and change the fencing policy to
'resource-and-stonith'. Without this, you will get split-brains and
fail-over will be unpredictable.

Once stonith is configured and tested in pacemaker and you've hooked
DRBD's fencing into pacemaker, see if you problem remains. If it
does, on both nodes, run: 'tail -f -n 0 /var/log/messages', kill a
node and wait for things to settle down. Share the log output here.

Please also tell us your OS, pacemaker, drbd and corosync versions.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person
without access to education?

_
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org

http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Project Home: http://www.clusterlabs.org
Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Bugs: http://bugs.clusterlabs.org




--
Regards,
Kamal Kishore B V


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] fencing with multiple node cluster

2014-10-28 Thread Dejan Muhamedagic
Hi,

On Tue, Oct 28, 2014 at 09:51:02AM -0400, Digimer wrote:
> On 28/10/14 05:59 AM, philipp.achmuel...@arz.at wrote:
>> hi,
>>
>> any recommendation/documentation for a reliable fencing implementation
>> on a multi-node cluster (4 or 6 nodes on 2 site).
>> i think of implementing multiple node-fencing devices for each host to
>> stonith remaining nodes on other site?
>>
>> thank you!
>> Philipp
>
> Multi-site clustering is very hard to do well because of fencing issues.  
> How do you distinguish a site failure from severed links?

Indeed. There's a booth server managing the tickets in
pacemaker, which uses arbitrators to resolve ties. booth source
is available at github.com and packaged for several
distributions at OBS
(http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/)
It's also supported in the newly released SLE12.
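
For illustration, a minimal booth setup is one small config file shared by 
both sites and the arbitrator, plus a ticket constraint on the Pacemaker 
side; the addresses, ticket and resource names below are placeholders, and 
the exact syntax varies a little between booth versions:

    # /etc/booth/booth.conf (same on site A, site B and the arbitrator)
    transport = UDP
    port = 9929
    arbitrator = 192.168.3.100
    site = 192.168.1.100
    site = 192.168.2.100
    ticket = "ticket-A"

    # crm shell: tie the master role to the ticket, fence on ticket loss
    rsc_ticket ms_db-req-ticket-A ticket-A: ms_db:Master loss-policy=fence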

Thanks,

Dejan

> Given that a  
> failed fence action can not be assumed to be a success, then the only  
> safe option is to block until a human intervenes. This makes your  
> cluster as reliable as your WAN between the sites, which is too say, not  
> very reliable. In any case, the destruction of a site will require  
> manual failover, which can be complicated if insufficient nodes remain  
> to form quorum.
>
> Generally, I'd recommend to different clusters, one per site, with  
> manual/service-level failover in the case of a disaster.
>
> In any case; A good fencing setup should have two fence methods.  
> Personally, I always use IPMI as a primary fence method (routed through  
> one switch) and a pair of switched PDUs as backup (via a backup switch).  
> This way, when IPMI is available, a confirmed fence is 100% certain to  
> be good. However, if the node is totally disabled/destroyed, IPMI will  
> be lost and the cluster will switch to the switched PDUs, cutting the  
> power outlets feeding the node.
>
> I've got a block diagram of how I do this:
>
> https://alteeve.ca/w/AN!Cluster_Tutorial_2#A_Map.21
>
> It's trivial to scale the idea up to multiple node clusters.
>
> Cheers
>
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without  
> access to education?
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Antwort: Re: fencing with multiple node cluster

2014-10-28 Thread philipp . achmueller
hi,




From: Dejan Muhamedagic 
To: The Pacemaker cluster resource manager 

Date: 28.10.2014 16:45
Subject: Re: [Pacemaker] fencing with multiple node cluster

>
>
>Hi,
>
>On Tue, Oct 28, 2014 at 09:51:02AM -0400, Digimer wrote:
>>> On 28/10/14 05:59 AM, philipp.achmuel...@arz.at wrote:
>>> hi,
>>>
>>> any recommendation/documentation for a reliable fencing implementation
>>> on a multi-node cluster (4 or 6 nodes on 2 site).
>>> i think of implementing multiple node-fencing devices for each host to
>>> stonith remaining nodes on other site?
>>>
>>> thank you!
>>> Philipp
>>
>> Multi-site clustering is very hard to do well because of fencing 
issues. 
>> How do you distinguish a site failure from severed links?
>
>Indeed. There's a booth server managing the tickets in
>pacemaker, which uses arbitrators to resolve ties. booth source
>is available at github.com and packaged for several
>distributions at OBS
>(
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/)
>It's also supported in the newly released SLE12.
>
>Thanks,
>
>Dejan
>
hi,

@Digimer: thank you for the explanation, but manual failover between sites 
isn't what I'm looking for.

@Dejan: Yes, I already tried a cluster (SLES 11 SP3) with a booth setup; I used 
the documentation from SLE HA 11 SP3. 
But I'm afraid it is unclear to me how "fencing" with booth exactly works 
in case of some failures (loss-policy=fence). The documentation says something 
like: "...to speed up the recovery process, nodes get fenced...". Do I need 
classic node fencing (IPMI) when I configure a booth setup? Do you have some 
more information about that?

For a correct setup, the arbitrator needs an adequate third location. Site A 
and site B need a separate connection to site C, otherwise some scenarios 
will fail.
Are there any possibilities to get this running with just 2 sites?

thank you!


>> Given that a 
>> failed fence action can not be assumed to be a success, then the only 
>> safe option is to block until a human intervenes. This makes your 
>> cluster as reliable as your WAN between the sites, which is too say, 
not 
>> very reliable. In any case, the destruction of a site will require 
>> manual failover, which can be complicated if insufficient nodes remain 
>> to form quorum.
>>
>> Generally, I'd recommend to different clusters, one per site, with 
>> manual/service-level failover in the case of a disaster.
>>
>> In any case; A good fencing setup should have two fence methods. 
>> Personally, I always use IPMI as a primary fence method (routed through 
 
>> one switch) and a pair of switched PDUs as backup (via a backup 
switch). 
>> This way, when IPMI is available, a confirmed fence is 100% certain to 
>> be good. However, if the node is totally disabled/destroyed, IPMI will 
>> be lost and the cluster will switch to the switched PDUs, cutting the 
>> power outlets feeding the node.
>>
>> I've got a block diagram of how I do this:
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#A_Map.21
>>
>> It's trivial to scale the idea up to multiple node clusters.
>>
>> Cheers
>>
>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without 
>> access to education?
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>___
>Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>Project Home: http://www.clusterlabs.org
>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker restart bringing up resource in start mode even other node is running the resource

2014-10-28 Thread Lax
Andrew Beekhof  writes:

> >>> So on pacemaker restart, is there any way I can stop my LSB resource
coming
> >>> up in START mode when such resource is already running on a master?
> >> 
> >> Tell init/systemd not to start it when the node boots 
> > 
> > Thanks for the response Andrew.
> > 
> > But even if I do not reboot the node and simply do pacemaker service stop
> > and start too I run into this issue.
> 
> 'it' referred to the LSB resource, not pacemaker/corosync
> 
Thanks for your help as always, Andrew. Is there any sample that you can
point me to on how to configure this for the resource?

Thanks
Lax


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker restart bringing up resource in start mode even other node is running the resource

2014-10-28 Thread Andrew Beekhof

> On 29 Oct 2014, at 7:42 am, Lax  wrote:
> 
> Andrew Beekhof  writes:
> 
> So on pacemaker restart, is there any way I can stop my LSB resource
> coming
> up in START mode when such resource is already running on a master?
 
 Tell init/systemd not to start it when the node boots 
>>> 
>>> Thanks for the response Andrew.
>>> 
>>> But even if I do not reboot the node and simply do pacemaker service stop
>>> and start too I run into this issue.
>> 
>> 'it' referred to the LSB resource, not pacemaker/corosync
>> 
> Thanks for your help as always Andrew. Is there any sample that you can
> point me to how to configure it for the resource.  

I mean something like:

chkconfig {LSB-name} off
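
For example, on a RHEL/CentOS 6 style init (the service name is a 
placeholder for the actual LSB script):

    chkconfig my-lsb-service off      # don't start it at boot
    chkconfig --list my-lsb-service   # verify all runlevels show "off"
    service my-lsb-service stop       # ensure it isn't already running outside Pacemaker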


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] fencing with multiple node cluster

2014-10-28 Thread Andrew Beekhof

> On 28 Oct 2014, at 8:59 pm, philipp.achmuel...@arz.at wrote:
> 
> hi, 
> 
> any recommendation/documentation for a reliable fencing implementation on a 
> multi-node cluster (4 or 6 nodes on 2 site). 
> i think of implementing multiple node-fencing devices for each host to 
> stonith remaining nodes on other site? 

sbd might be a reasonable option.
On RHEL 7.1 you'll be able to combine it with no-quorum-policy=suicide to allow 
the site that retains quorum to continue, knowing that the other side will have 
fenced itself.
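
A rough sketch of a disk-based sbd setup in crm shell follows; the device 
path is a placeholder, and the exact wiring (disk-based vs. watchdog-only, 
and the stonith agent name) differs between distributions and versions:

    # one-time: initialise the shared sbd partition (run on one node)
    sbd -d /dev/disk/by-id/scsi-shared-disk-part1 create
    # point every node at it, e.g. in /etc/sysconfig/sbd:
    #   SBD_DEVICE="/dev/disk/by-id/scsi-shared-disk-part1"
    # then, in crm shell:
    primitive stonith-sbd stonith:external/sbd
    property stonith-enabled=true
    property no-quorum-policy=suicide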


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker restart bringing up resource in start mode even other node is running the resource

2014-10-28 Thread Lax
Andrew Beekhof  writes:
> > Thanks for your help as always Andrew. Is there any sample that you can
> > point me to how to configure it for the resource.  
> 
> I mean something like:
> 
> chkconfig {LSB-name} off
> 
I tried setting it this way, and the resource still gets started on both
nodes when pacemaker starts. Here is the sequence I followed:

1. Stop the pacemaker service on both peers, Node1 and Node2
2. Issue 'chkconfig my-LSB-name off' on Node1 and Node2
3. Start the pacemaker service first on Node1 and wait till it fully comes up
- run 'crm_mon -1' to ensure Node1 is online and Node2 is offline
4. Now start the pacemaker service on Node2
   - while this is happening I run 'crm_mon' on Node1 to watch the resource
state transitions. Here I still see my-lsb-resource being Started on Node1
and Node2 -> Stopped -> Started back on Node1

Thanks
Lax



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Announcing crmsh release 2.1.1

2014-10-28 Thread Kristoffer Grönlund

Today we are proud to announce the release of `crmsh` version 2.1.1!
This version primarily fixes all known issues found since the release
of `crmsh` 2.1 in June. We recommend that all users of crmsh upgrade
to this version, especially if using Pacemaker 1.1.12 or newer.

A massive thank you to everyone who has helped out with bug fixes,
comments and contributions for this release!

For a complete list of changes since the previous version, please
refer to the changelog:

* https://github.com/crmsh/crmsh/blob/2.1.1/ChangeLog

Packages for several popular Linux distributions can be downloaded
from the Stable repository at the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/crmsh/crmsh/archive/2.1.1.tar.gz
* https://github.com/crmsh/crmsh/archive/2.1.1.zip

Changes since the previous release:

 - cibconfig: Clean up output from crm_verify (bnc#893138)
 - high: constants: Add acl_target and acl_group to cib_cli_map (bnc#894041)
 - high: parse: split shortcuts into valid rules
 - medium: Handle broken CIB in find_objects
 - high: scripts: Handle corosync.conf without nodelist in add-node (bnc#862577)
 - medium: config: Assign default path in all cases
 - high: cibconfig: Generate valid CLI syntax for attribute lists (bnc#897462)
 - high: cibconfig: Add tag: to get all resources in tag
 - doc: Documentation for show tag:
 - low: report: Sort list of nodes
 - high: parse: Allow empty attribute values in nvpairs (bnc#898625)
 - high: cibconfig: Delay reinitialization after commit
 - low: cibconfig: Improve wording of commit prompt
 - low: cibconfig: Fix vim modeline
 - high: report: Find nodes for any log type (boo#900654)
 - high: hb_report: Collect logs from journald (boo#900654)
 - high: cibconfig: Don't crash if given an invalid pattern (bnc#901714)
 - high: xmlutil: Filter list of referenced resources (bnc#901714)
 - medium: ui_resource: Only act on resources (#64)
 - medium: ui_resource: Flatten, then filter (#64)
 - high: ui_resource: Use correct name for error function (bnc#901453)
 - high: ui_resource: resource trace failed if operation existed (bnc#901453)
 - Improved test suite

Thank you,

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker restart bringing up resource in start mode even other node is running the resource

2014-10-28 Thread Andrew Beekhof

> On 29 Oct 2014, at 9:30 am, Lax  wrote:
> 
> Andrew Beekhof  writes:
>>> Thanks for your help as always Andrew. Is there any sample that you can
>>> point me to how to configure it for the resource.  
>> 
>> I mean something like:
>> 
>>chkconfig {LSB-name} off
>> 
> I tried setting it this way and still resource gets started on both the
> nodes on pacemaker start. Here is the sequence I followed
> 
> 1. Stop pacemaker service on both the peers Node1 and Node2
> 2. Issue 'chkconfig my-LSB-name off' on Node1 and Node2
> 3. Start pacemaker service first on Node1, wait till it fully comes up
>- say 'crm_mon -1' to ensure Node1 is online and Node2 is Offline
> 4. Now Start pacemaker service on Node2
>   - while it is happening I run 'crm_mon' on Node1 to see the resource
> state transition. Here I still see my-lsb-resource being started on Node1
> and Node2  -> Stopped -> started back on Node1

crm_report please?

> 
> Thanks
> Lax
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Question about OCF RA reload behaviour

2014-10-28 Thread Andrew Beekhof

> On 28 Oct 2014, at 6:07 pm, Felix Zachlod  wrote:
> 
> Hello folks,
> 
> 
> I just have one question about how a resource agent should behave when "reload" 
> is invoked while the resource is currently stopped. Should the resource be started 
> or remain stopped?

I don't think that is defined. Certainly pacemaker won't trigger that case 
unless the resource fails between the reload command and the monitor operation 
immediately preceding it.

> I did not find anything about that in the documentation; in general there is 
> not much information about "reload".
> 
> I just noticed that my resource remained up after testing it with ocf-tester 
> and this was because I implemented it the way that it would be started if 
> reloaded in stopped state and ocf-tester invokes reload as last action. So I 
> wondered if this was correct?

It seems reasonable to me
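
For illustration, a reload handler along those lines in a shell RA could 
look like the sketch below; the function and variable names follow the usual 
OCF agent skeleton and are not taken from Felix's actual agent:

    # sourced near the top of the agent for OCF_SUCCESS and friends
    . ${OCF_ROOT:-/usr/lib/ocf}/lib/heartbeat/ocf-shellfuncs

    myagent_reload() {
        if ! myagent_monitor; then
            # not running: treat reload as a start (the case discussed here)
            myagent_start
            return $?
        fi
        # running: re-read the configuration without a full restart
        kill -HUP "$(cat "$PIDFILE")"   # hypothetical reload mechanism
        return $OCF_SUCCESS
    }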

> 
> 
> Thank you all in advance,
> regards, Felix
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-28 Thread Andrew Beekhof
Can you run crm_report so we can see the logs and PE files?
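
For example, with a time window covering the failure (the times and the 
destination name below are placeholders):

    crm_report -f "2014-10-16 17:00:00" -t "2014-10-16 18:00:00" /tmp/drbd-apache-failure

That gathers the logs, the CIB and the PE inputs from the cluster nodes into 
a single tarball that can be attached here.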

> On 28 Oct 2014, at 9:16 pm, Sihan Goi  wrote:
> 
> Hi,
> 
> I followed those steps previously. I just tried it again, but I'm still 
> getting the same error. My "crm configure show" shows the following:
> 
> node node01 \
> attributes standby=off
> node node02
> primitive ClusterIP IPaddr2 \
> params ip=192.168.1.110 cidr_netmask=24 \
> op monitor interval=30s
> primitive WebData ocf:linbit:drbd \
> params drbd_resource=wwwdata \
> op monitor interval=60s
> primitive WebFS Filesystem \
> params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" 
> fstype=ext4
> primitive WebSite apache \
> params configfile="/etc/httpd/conf/httpd.conf" \
> op monitor interval=1min
> ms WebDataClone WebData \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
> notify=true
> location prefer-node01 WebSite 50: node01
> colocation WebSite-with-WebFS inf: WebSite WebFS
> colocation fs_on_drbd inf: WebFS WebDataClone:Master
> colocation website-with-ip inf: WebSite ClusterIP
> order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
> order WebSite-after-WebFS inf: WebFS WebSite
> order apache-after-ip Mandatory: ClusterIP WebSite
> property cib-bootstrap-options: \
> dc-version=1.1.10-14.el6_5.3-368c726 \
> cluster-infrastructure=cman \
> stonith-enabled=false \
> no-quorum-policy=ignore
> rsc_defaults rsc_defaults-options: \
> migration-threshold=1
> 
> What am I doing wrong?
> 
> On Tue, Oct 28, 2014 at 5:11 PM, Andrew Beekhof  wrote:
> 
> > On 28 Oct 2014, at 6:26 pm, Sihan Goi  wrote:
> >
> > Hi,
> >
> > No, I did not do this. I followed the Pacemaker 1.1 - Clusters from scratch 
> > edition 5 for Fedora 13, and in section 7.3.4 it instructed me to run the 
> > following commands, which I did:
> > mkfs.ext4 /dev/drbd1
> > mount /dev/drbd1 /mnt
> > create index.html file in /mnt
> > umount /dev/drbd1
> >
> > Subsequently, after unmounting, there were no further instructions to mount 
> > any other directories.
> >
> > So, how should I mount /dev/mapper/vg_node02-drbd--demo to /var/www/html? 
> > Should I be mounting /dev/mapper/vg_node02-drbd--demo, or /dev/drbd1. Since 
> > I've already created index.html in /dev/drbd1, should I be mounting that? 
> > I'm a little confused here.
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configure_the_cluster_for_drbd.html
> 
> Look for "Now that DRBD is functioning we can configure a Filesystem resource 
> to use it"
> 
> >
> > On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof  wrote:
> >
> > > On 27 Oct 2014, at 6:05 pm, Sihan Goi  wrote:
> > >
> > > Hi,
> > >
> > > That offending line is as follows:
> > > DocumentRoot "/var/www/html"
> > >
> > > I'm guessing it needs to be updated to the DRBD block device, but I'm not 
> > > sure how to do that, or even what the block device is.
> > >
> > > fdisk -l shows the following, which I'm guessing is the block device?
> > > /dev/mapper/vg_node02-drbd--demo
> > >
> > > lvs shows the following:
> > > drbd-demo vg_node02 -wi-ao  1.00g
> > >
> > > btw I'm running the commands on node02 (secondary) rather than node01 
> > > (primary). It's just a matter of convenience due to the physical location 
> > > of the machine. Does it matter?
> >
> > Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html 
> > with a FileSystem resource.
> > Have you not done this?
> >
> > >
> > > Thanks.
> > >
> > > On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof  
> > > wrote:
> > > Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on 
> > > line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> > >
> > >
> > >
> > > > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> > > >
> > > > Hi Andrew,
> > > >
> > > > Logs in /var/log/httpd/ are empty, but here's a snippet of 
> > > > /var/log/messages right after I start pacemaker and do a "crm status"
> > > >
> > > > http://pastebin.com/ivQdyV4u
> > > >
> > > > Seems like the Apache service doesn't come up. This only happens after 
> > > > I run the commands in the guide to configure DRBD.
> > > >
> > > > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof  
> > > > wrote:
> > > > logs?
> > > >
> > > > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > > > >
> > > > > Hi, can anyone help? Really stuck here...
> > > > >
> > > > > On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi  wrote:
> > > > > Hi,
> > > > >
> > > > > I'm following the "Clusters from Scratch" guide for Fedora 13, and 
> > > > > I've managed to get a 2 node cluster working with Apache. However, 
> > > > > once I tried to add DRBD 8.4 to the mix, it stopped working.
> > > > >
> > > > > I've followed the DRBD steps in the guide all the way till "cib 
> > > > > commit fs" in Section 7.4, right before "Testing Migration". However, 
> > > > > when I do a crm_mon, I get the fol

Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-28 Thread Sihan Goi
Hi,

I've never used crm_report before. I just read the man page and generated a
tarball starting from 1-2 hours before I reconfigured all the DRBD-related
resources. I've put the tarball here -
https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0

Hope you can help figure out what I'm doing wrong. Thanks for the help!

On Wed, Oct 29, 2014 at 9:24 AM, Andrew Beekhof  wrote:

> Can you run crm_report so we can see the logs and PE files?
>
> > On 28 Oct 2014, at 9:16 pm, Sihan Goi  wrote:
> >
> > Hi,
> >
> > I followed those steps previously. I just tried it again, but I'm still
> getting the same error. My "crm configure show" shows the following:
> >
> > node node01 \
> > attributes standby=off
> > node node02
> > primitive ClusterIP IPaddr2 \
> > params ip=192.168.1.110 cidr_netmask=24 \
> > op monitor interval=30s
> > primitive WebData ocf:linbit:drbd \
> > params drbd_resource=wwwdata \
> > op monitor interval=60s
> > primitive WebFS Filesystem \
> > params device="/dev/drbd/by-res/wwwdata"
> directory="/var/www/html" fstype=ext4
> > primitive WebSite apache \
> > params configfile="/etc/httpd/conf/httpd.conf" \
> > op monitor interval=1min
> > ms WebDataClone WebData \
> > meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> > location prefer-node01 WebSite 50: node01
> > colocation WebSite-with-WebFS inf: WebSite WebFS
> > colocation fs_on_drbd inf: WebFS WebDataClone:Master
> > colocation website-with-ip inf: WebSite ClusterIP
> > order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
> > order WebSite-after-WebFS inf: WebFS WebSite
> > order apache-after-ip Mandatory: ClusterIP WebSite
> > property cib-bootstrap-options: \
> > dc-version=1.1.10-14.el6_5.3-368c726 \
> > cluster-infrastructure=cman \
> > stonith-enabled=false \
> > no-quorum-policy=ignore
> > rsc_defaults rsc_defaults-options: \
> > migration-threshold=1
> >
> > What am I doing wrong?
> >
> > On Tue, Oct 28, 2014 at 5:11 PM, Andrew Beekhof 
> wrote:
> >
> > > On 28 Oct 2014, at 6:26 pm, Sihan Goi  wrote:
> > >
> > > Hi,
> > >
> > > No, I did not do this. I followed the Pacemaker 1.1 - Clusters from
> scratch edition 5 for Fedora 13, and in section 7.3.4 it instructed me to
> run the following commands, which I did:
> > > mkfs.ext4 /dev/drbd1
> > > mount /dev/drbd1 /mnt
> > > create index.html file in /mnt
> > > umount /dev/drbd1
> > >
> > > Subsequently, after unmounting, there were no further instructions to
> mount any other directories.
> > >
> > > So, how should I mount /dev/mapper/vg_node02-drbd--demo to
> /var/www/html? Should I be mounting /dev/mapper/vg_node02-drbd--demo, or
> /dev/drbd1. Since I've already created index.html in /dev/drbd1, should I
> be mounting that? I'm a little confused here.
> >
> >
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configure_the_cluster_for_drbd.html
> >
> > Look for "Now that DRBD is functioning we can configure a Filesystem
> resource to use it"
> >
> > >
> > > On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof 
> wrote:
> > >
> > > > On 27 Oct 2014, at 6:05 pm, Sihan Goi  wrote:
> > > >
> > > > Hi,
> > > >
> > > > That offending line is as follows:
> > > > DocumentRoot "/var/www/html"
> > > >
> > > > I'm guessing it needs to be updated to the DRBD block device, but
> I'm not sure how to do that, or even what the block device is.
> > > >
> > > > fdisk -l shows the following, which I'm guessing is the block device?
> > > > /dev/mapper/vg_node02-drbd--demo
> > > >
> > > > lvs shows the following:
> > > > drbd-demo vg_node02 -wi-ao  1.00g
> > > >
> > > > btw I'm running the commands on node02 (secondary) rather than
> node01 (primary). It's just a matter of convenience due to the physical
> location of the machine. Does it matter?
> > >
> > > Um, you need to mount /dev/mapper/vg_node02-drbd--demo to
> /var/www/html with a FileSystem resource.
> > > Have you not done this?
> > >
> > > >
> > > > Thanks.
> > > >
> > > > On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof 
> wrote:
> > > > Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error
> on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> > > >
> > > >
> > > >
> > > > > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> > > > >
> > > > > Hi Andrew,
> > > > >
> > > > > Logs in /var/log/httpd/ are empty, but here's a snippet of
> /var/log/messages right after I start pacemaker and do a "crm status"
> > > > >
> > > > > http://pastebin.com/ivQdyV4u
> > > > >
> > > > > Seems like the Apache service doesn't come up. This only happens
> after I run the commands in the guide to configure DRBD.
> > > > >
> > > > > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof <
> and...@beekhof.net> wrote:
> > > > > logs?
> > > > >
> > > > > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > > > > >
> > > > > > Hi, can anyon

Re: [Pacemaker] Question about OCF RA reload behaviour

2014-10-28 Thread Felix Zachlod

Am 29.10.2014 01:39, schrieb Andrew Beekhof:

I just have one question about how a resource agent should behave when "reload" 
is invoked while the resource is currently stopped. Should the resource be started or remain 
stopped?


I don't think that is defined. Certainly pacemaker won't trigger that case 
unless the resource fails between the reload command and the monitor operation 
immediately preceding it.


I did not find anything about that in the documentation; in general there is not much 
information about "reload".

I just noticed that my resource remained up after testing it with ocf-tester 
and this was because I implemented it the way that it would be started if 
reloaded in stopped state and ocf-tester invokes reload as last action. So I 
wondered if this was correct?


It seems reasonable to me


Thanks, Andrew, for making this clear.

regards, Felix

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org