Re: [Pacemaker] Unable to stop Multi state resource

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 12:34 PM, Rakesh K  wrote:
> Rakesh K  writes:
>
>
> Hi Andrew
>
> FSR is a file system replication script that adheres to the OCF cluster
> framework. The script is similar to the MySQL OCF script: it is a
> multi-state resource where the master runs an SSH server and the slave
> runs rsync scripts that synchronize the data between master and slave.
>
> The rsync script holds the master FSR location, so that the rsync tool
> frequently replicates the data from the FSR master location.
>
> Here is the crm configure show output:

Thanks, but this doesn't really answer my question about whether the
cluster tried to stop it.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 12:40 PM, Rakesh K  wrote:
> Andrew Beekhof  writes:
>
>>
>> There is nothing in this config that requires tomcat2 to be stopped.
>>
>> Perhaps:
>>    colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
>> was intended to be:
>>    colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1
>>
>> The only other service active is httpd, which also has no constraints
>> indicating it should stop when mysql is down.
>>
>
> Thanks, Andrew, for the valuable feedback.
>
> As mentioned, I changed the colocation constraint but am still facing the
> same issue.
>
> As per the order given in the HA configuration, I am providing the output
> of my crm configure show command.

Not enough, sorry. I need the status section too:
   crm configure show xml
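For anyone following along, here is a minimal sketch of why the argument order in the colocation matters. It models only mandatory (inf) colocations in crm shell's `colocation <id> inf: <rsc> <with-rsc>` form, where the first resource is placed relative to the second. The node names are hypothetical, and everything else Pacemaker weighs (scores, stickiness, ordering) is left out.

```python
# Simplified model of mandatory (score "inf") colocation constraints only.
# A constraint is a (dependent, with_rsc) pair, mirroring
#   colocation <id> inf: <dependent> <with_rsc>
def allowed_nodes(rsc, colocations, running_on, nodes):
    """Return the set of nodes on which rsc may run.

    running_on maps a resource name to the node it is active on, or None
    if it is stopped. The node names used below are hypothetical.
    """
    allowed = set(nodes)
    for dependent, with_rsc in colocations:
        if dependent == rsc:
            where = running_on.get(with_rsc)
            # A dependent resource may only run where with_rsc is active.
            allowed &= {where} if where is not None else set()
    return allowed


nodes = ["node1", "node2"]
running_on = {"Tomcat1": None, "Tomcat2VIP": "node1"}  # Tomcat1 has stopped

# Constraint as configured: it constrains Tomcat1, not Tomcat2.
as_configured = [("Tomcat1", "Tomcat2VIP")]
# Constraint as probably intended: Tomcat2 follows Tomcat1.
as_intended = [("Tomcat2", "Tomcat1")]

print(sorted(allowed_nodes("Tomcat2", as_configured, running_on, nodes)))
print(sorted(allowed_nodes("Tomcat2", as_intended, running_on, nodes)))
```

With the constraint as configured, nothing restricts Tomcat2, so it keeps running; with the arguments swapped, a stopped Tomcat1 leaves Tomcat2 nowhere to run.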



Re: [Pacemaker] crm : unknown expected votes

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 3:37 PM,   wrote:
> Hi,
>
>     I created a 2-node cluster using Pacemaker on Fedora 14
> (2.6.35.6-45.fc14.x86_64).
>
>     I have two errors that I am not able to resolve.
>     Can someone help me resolve these errors?
>
>   1) It always shows "unknown expected votes" when I run 'crm status'.

Not an error. Heartbeat-based clusters do not use this.

>
>   2) The logfile shows the message below even though stonith is not
> enabled.
>
>     Error: te_connect_stonith: Attempting connection to fencing daemon…

Disabling stonith affects neither whether the daemon is started nor
whether we connect to it.

google is your friend:
   http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg16967.html


>
> Pasted below are the configuration and status:
>
> ===
> -bash-4.1# crm configure show
> node $id="2e9dd3fa-8083-4363-96b4-331aa9b93d1f" rabbithanode2
> node $id="3a56dae9-d8c7-46b0-8a86-f6bd3b9658f4" rabbithanode1
> primitive bunny ocf:rabbitmq:rabbitmq-server \
>     params mnesia_base="/cluster1"
> primitive drbd ocf:linbit:drbd \
>     params drbd_resource="wwwdata" \
>     op monitor interval="60s"
> primitive drbd_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd1" directory="/cluster1" fstype="ext4"
> ms drbd_ms drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation bunny_on_fs inf: bunny drbd_fs
> colocation fs_on_drbd inf: drbd_fs drbd_ms:Master
> order bunny_after_fs inf: drbd_fs bunny
> order fs_after_drbd inf: drbd_ms:promote drbd_fs:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     resource-stickiness="100" \
>     no-quorum-policy="ignore"
> ===
>
> -bash-4.1# crm status
>
> Last updated: Tue Apr 19 09:32:52 2011
> Stack: Heartbeat
> Current DC: rabbithanode2 (2e9dd3fa-8083-4363-96b4-331aa9b93d1f) - partition with quorum
> Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
> 2 Nodes configured, unknown expected votes
> 3 Resources configured.
>
> Online: [ rabbithanode1 rabbithanode2 ]
>
>  Master/Slave Set: drbd_ms [drbd]
>      Masters: [ rabbithanode1 ]
>      Slaves: [ rabbithanode2 ]
>  drbd_fs    (ocf::heartbeat:Filesystem):    Started rabbithanode1
>  bunny  (ocf::rabbitmq:rabbitmq-server):    Started rabbithanode1
> -bash-4.1#
> ===
>
> Thanks & Regds
> Hari Tatituri
>
> TACG-Cloud Factory Mobilization
> Desk  : +91-080-43154146
> Mobile: +91-9686022660
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise private information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the email by you is prohibited.
>



Re: [Pacemaker] Question of the syslog output in pacemaker-1.1

2011-04-19 Thread Yuusuke IIDA

Hi, Andrew

(2011/04/19 18:13), Andrew Beekhof wrote:
> On Tue, Apr 19, 2011 at 9:25 AM, Yuusuke IIDA wrote:
>> Hi, Andrew
>>
>> I use corosync-1.3.0 and Pacemaker-1.1.5.
>>
>> The logs are output via rsyslog.
>>
>> I changed syslog_facility in corosync.conf to local1, intending to have the
>> cluster's logs written to a dedicated file.
>>
>> However, the setting was not applied to the processes forked by
>> pacemakerd.
>
> Ah, I see the problem.
> The following patch will be in devel shortly:
>
> diff -r 7225f68ae6e9 mcp/pacemaker.c
> --- a/mcp/pacemaker.c   Mon Apr 18 16:52:22 2011 +0200
> +++ b/mcp/pacemaker.c   Tue Apr 19 11:12:09 2011 +0200
> @@ -692,7 +692,7 @@ main(int argc, char **argv)
>     crm_make_daemon(crm_system_name, TRUE, pid_file);
>
>     /* Only Re-init if we're running daemonized */
> -   crm_log_init_quiet(NULL, LOG_INFO, TRUE, FALSE, argc, argv);
> +   crm_log_init(NULL, LOG_INFO, TRUE, FALSE, 0, NULL);
>   }
>
>   crm_info("Starting Pacemaker %s (Build: %s): %s\n", VERSION, BUILD_VERSION, CRM_FEATURES);

I confirmed that this correction solves the problem.
http://hg.clusterlabs.org/pacemaker/devel/rev/b162c2b84e16

Thank you for the quick response.
Yuusuke

>> The facility of the processes forked by pacemakerd remained "daemon".
>>
>> The changed setting took effect only for corosync and pacemakerd.
>>
>> Why does the syslog_facility setting in corosync.conf not take effect for
>> the processes forked by pacemakerd?
>>
>> Please tell me how to change the facility of the processes forked by
>> pacemakerd.
>>
>> Best Regards,
>> Yuusuke IIDA
>> --
>>
>> METRO SYSTEMS CO., LTD
>>
>> Yuusuke Iida
>> Mail: iiday...@intellilink.co.jp


--

METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iiday...@intellilink.co.jp




Re: [Pacemaker] Resources won't start

2011-04-19 Thread Phil Hunt
>>Did it start?

No. Here is the output; all the resources kind of went away. That's what I've been
fighting all day.


Last updated: Tue Apr 19 13:52:18 2011
Stack: openais
Current DC: CentClus2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ CentClus1 CentClus2 ]




That's it!

Config:
node CentClus1
node CentClus2
primitive FS_disk ocf:heartbeat:Filesystem \
params device="/dev/VolGroup01/Shared1" directory="/data" fstype="gfs"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.1.90" cidr_netmask="32" \
op monitor interval="30s"
primitive ISCSI_disk ocf:heartbeat:iscsi \
params portal="192.168.1.79:3260" \
target="iqn.1991-05.com.microsoft:wss-w2xxxvr-wss-target3-target" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="120s" timeout="30s"
primitive VG_disk ocf:heartbeat:LVM \
params volgrpname="VolGroup01" exclusive="yes" \
op monitor interval="10" timeout="30" on-fail="restart" depth="0" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="30"
primitive IP_ping ocf:heartbeat:IPaddr2 \
params ip="192.168.1.90" cidr_netmask="32" \
op monitor interval="30s"
primitive PM_ping ocf:pacemaker:ping \
params name="p_ping" host_list="192.168.1.91 192.168.1.92 192.168.1.1 " \
op monitor interval="15s" timeout="30s"
property $id="cib-bootstrap-options" \
dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping IP_ping
location Loc_ping CL_group \
rule $id="loc_ping-rule" -inf: not_defined PM_ping or PM_ping lte 0
rsc_defaults $id="rsc-options" \
resource-stickiness="1000"












PHIL HUNT AMS Consultant 
phil.h...@orionhealth.com 
P: +1 857 488 4749 
M: +1 508 654 7371 
S: philhu0724 
www.orionhealth.com 

- Original Message -
From: "mark - pacemaker list" 
To: "The Pacemaker cluster resource manager" 
Sent: Tuesday, April 19, 2011 5:05:16 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Pacemaker] Resources won't start

Hi Phil,

On Tue, Apr 19, 2011 at 3:36 PM, Phil Hunt  wrote:
> Hi
> I have iscsid running, no iscsi.

Good. You don't want the system to auto-connect the iSCSI disks on
boot; Pacemaker will do that for you.

>
>
>
> Here is the crm status:
> 
> Last updated: Tue Apr 19 12:39:03 2011
> Stack: openais
> Current DC: CentClus2 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> 
>
> Online: [ CentClus1 CentClus2 ]
>
>  Resource Group: CL_group
>     ClusterIP  (ocf::heartbeat:IPaddr2):       Started CentClus2
>     FS_disk    (ocf::heartbeat:Filesystem):    Stopped
>     ISCSI_disk (ocf::heartbeat:iscsi): Stopped
>     VG_disk    (ocf::heartbeat:LVM):   Stopped
>     PM_ping    (ocf::pacemaker:ping):  Stopped
>     IP_ping    (ocf::heartbeat:IPaddr2):       Stopped
>
>

The resources are listed in top-down start order.  So you're starting
ClusterIP, but then try to start the filesystem when you still haven't
connected to the iSCSI disk or started the volume group.

>
> Here is the crm config:
> node CentClus1

> group CL_group ClusterIP FS_disk ISCSI_disk VG_disk PM_ping IP_ping

> order ISCSI_startup inf: ISCSI_disk VG_disk FS_disk

You have conflicting orders, there.  A resource group defines an
order, so the other order statement seems unnecessary.  If you remove
that order constraint, and change your group line so that things start
in the correct order, does it come up?

group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping

I left off IP_ping, because it's exactly the same as ClusterIP.  Was
it meant to be something else?

Regards,
Mark



Re: [Pacemaker] Resources won't start

2011-04-19 Thread mark - pacemaker list
Hi Phil,

On Tue, Apr 19, 2011 at 3:36 PM, Phil Hunt  wrote:
> Hi
> I have iscsid running, no iscsi.

Good. You don't want the system to auto-connect the iSCSI disks on
boot; Pacemaker will do that for you.

>
>
>
> Here is the crm status:
> 
> Last updated: Tue Apr 19 12:39:03 2011
> Stack: openais
> Current DC: CentClus2 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> 
>
> Online: [ CentClus1 CentClus2 ]
>
>  Resource Group: CL_group
>     ClusterIP  (ocf::heartbeat:IPaddr2):       Started CentClus2
>     FS_disk    (ocf::heartbeat:Filesystem):    Stopped
>     ISCSI_disk (ocf::heartbeat:iscsi): Stopped
>     VG_disk    (ocf::heartbeat:LVM):   Stopped
>     PM_ping    (ocf::pacemaker:ping):  Stopped
>     IP_ping    (ocf::heartbeat:IPaddr2):       Stopped
>
>

The resources are listed in top-down start order.  So you're starting
ClusterIP, but then try to start the filesystem when you still haven't
connected to the iSCSI disk or started the volume group.

>
> Here is the crm config:
> node CentClus1

> group CL_group ClusterIP FS_disk ISCSI_disk VG_disk PM_ping IP_ping

> order ISCSI_startup inf: ISCSI_disk VG_disk FS_disk

You have conflicting orders, there.  A resource group defines an
order, so the other order statement seems unnecessary.  If you remove
that order constraint, and change your group line so that things start
in the correct order, does it come up?

group CL_group ClusterIP ISCSI_disk VG_disk FS_disk PM_ping
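The "conflicting orders" can be checked mechanically: treat the group and the order constraint as "starts before" edges and look for a cycle. The sketch below is only a model. It assumes a group means nothing more than "each member starts after the previous one", and it uses Python's graphlib rather than anything in Pacemaker. With the original group line it finds the FS_disk, ISCSI_disk, VG_disk, FS_disk loop, while the suggested group line yields a clean top-down start order.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def chain_edges(resources):
    """(before, after) pairs for resources that must start in the listed order.

    Used both for a group (each member starts after the one before it) and
    for an 'order <id> inf: A B C' chain.
    """
    return list(zip(resources, resources[1:]))


def start_sequence(edges):
    """Return a start order satisfying every edge, or raise CycleError."""
    ts = TopologicalSorter()
    for before, after in edges:
        ts.add(after, before)  # 'after' cannot start until 'before' has
    return list(ts.static_order())


original_group = ["ClusterIP", "FS_disk", "ISCSI_disk", "VG_disk", "PM_ping", "IP_ping"]
iscsi_order = ["ISCSI_disk", "VG_disk", "FS_disk"]
suggested_group = ["ClusterIP", "ISCSI_disk", "VG_disk", "FS_disk", "PM_ping"]

try:
    start_sequence(chain_edges(original_group) + chain_edges(iscsi_order))
except CycleError as err:
    print("conflicting orders, cycle:", err.args[1])

print(start_sequence(chain_edges(suggested_group)))
```

Adding the old order constraint to the suggested group does not create a cycle, since both then agree on ISCSI_disk, VG_disk, FS_disk; it is simply redundant.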

I left off IP_ping, because it's exactly the same as ClusterIP.  Was
it meant to be something else?

Regards,
Mark



[Pacemaker] Resources won't start

2011-04-19 Thread Phil Hunt
Hi

I've been having a lot of trouble figuring out a problem. In the enclosed
config for a 2-node cluster, which lets 2 RHEL5 boxes work as a cluster with a
shared iSCSI disk stored on a Windows Storage Server box, the resources will
not start.

I have iscsid running, but not iscsi. I was modifying the config because iSCSI
would connect to the target and LVM would work, but the disk would not mount.
If I mounted it manually and did a resource cleanup on the FS resource, it
said it was fine. Something I did really messed it all up. I am very new at
this, so any help with the problems in my config would be appreciated. The
goal is to have the 2 systems share an iSCSI disk, with all resources running
in one group with a VIP address and the disk moving to the other server on
failure.

  

Here is the crm status:

Last updated: Tue Apr 19 12:39:03 2011
Stack: openais
Current DC: CentClus2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ CentClus1 CentClus2 ]

 Resource Group: CL_group
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started CentClus2
     FS_disk    (ocf::heartbeat:Filesystem):    Stopped
     ISCSI_disk (ocf::heartbeat:iscsi): Stopped
     VG_disk    (ocf::heartbeat:LVM):   Stopped
     PM_ping    (ocf::pacemaker:ping):  Stopped
     IP_ping    (ocf::heartbeat:IPaddr2):       Stopped



Here is the crm config:
node CentClus1
node CentClus2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.1.90" cidr_netmask="32" \
op monitor interval="30s"
primitive FS_disk ocf:heartbeat:Filesystem \
params device="/dev/VolGroup01/Shared1" directory="/data" fstype="gfs"
primitive IP_ping ocf:heartbeat:IPaddr2 \
params ip="192.168.1.90" cidr_netmask="32" \
op monitor interval="30s"
primitive ISCSI_disk ocf:heartbeat:iscsi \
params portal="192.168.1.79:3260" \
target="iqn.1991-05.com.microsoft:wss-w2xxsvr-wss-target3-target" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="120s" timeout="30s"
primitive PM_ping ocf:pacemaker:ping \
params name="p_ping" host_list="192.168.1.91 192.168.1.92 192.168.1.1 " \
op monitor interval="15s" timeout="30s"
primitive VG_disk ocf:heartbeat:LVM \
params volgrpname="VolGroup01" exclusive="yes" \
op monitor interval="10" timeout="30" on-fail="restart" depth="0" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="30"
group CL_group ClusterIP FS_disk ISCSI_disk VG_disk PM_ping IP_ping
location Loc_ping CL_group \
rule $id="loc_ping-rule" -inf: not_defined PM_ping or PM_ping lte 0
order ISCSI_startup inf: ISCSI_disk VG_disk FS_disk
property $id="cib-bootstrap-options" \
dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="1000"



PHIL HUNT AMS Consultant 
phil.h...@orionhealth.com 




Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]
On 04/19/2011 10:38 AM, Marek Marczykowski wrote:
>> in your opinion, is it possible to fix this via the ocf ra or does it
>> have to be a separate cronjob?
> 
> I have no idea how to do it in the RA. There is no easy way to see which
> binlogs are on the other node. Maybe some trick storing that info in the
> monitor action, but this is ugly and makes the RA depend on the monitor
> action being enabled...
> The easiest solutions are the best :)

what about submitting a "show master logs" query to the to-be master,
checking the available logs and refusing to start if the log-file
disappeared?
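A minimal sketch of the proposed check, assuming the agent already has the rows returned by SHOW MASTER LOGS (Log_name, File_size) and knows which binlog the new slave needs. The function name and sample rows are hypothetical, and running the query itself is not shown:

```python
def binlog_still_available(master_logs, required_log):
    """Return True if required_log is still listed by the to-be master.

    master_logs: rows as returned by SHOW MASTER LOGS, i.e. (Log_name, File_size).
    required_log: the binlog file the slave would resume from; where the agent
    would obtain this name is left open here (hypothetical input).
    """
    return any(log_name == required_log for log_name, _size in master_logs)


# Hypothetical SHOW MASTER LOGS output after old binlogs were purged.
rows = [("mysql-bin.000041", 1073741824), ("mysql-bin.000042", 52340)]

for needed in ("mysql-bin.000042", "mysql-bin.000039"):
    if binlog_still_available(rows, needed):
        print(needed, "present: OK to start")
    else:
        print(needed, "missing: refuse to start")
```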

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




[Pacemaker] Resource Agents 1.0.4: HA LVM Patch

2011-04-19 Thread Ulf
Hi,

I attached a patch that enhances the LVM agent with the ability to set a tag on
the VG (set_hosttag = true). In conjunction with a volume_list filter, this can
prevent a VG from being activated on multiple hosts. Unfortunately, active VGs
will stay active in case of an unclean operation.
The tag is always the hostname.
Some configuration hints can be found here: 
http://sources.redhat.com/cluster/wiki/LVMFailover
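The idea behind combining the host tag with a volume_list filter can be sketched as follows. This is a simplified model of the activation check in lvm.conf, assuming entries are either a plain VG name or "@tag"; real LVM also accepts "vg/lv" entries and "@*", which are not modelled. The node names, the local VG name VolGroup00 and the shared VG name VolGroup01 are hypothetical.

```python
def may_activate(vg_name, vg_tags, volume_list):
    """Simplified model of lvm.conf's activation/volume_list check.

    A VG may be activated only if some entry matches its name, or is
    '@tag' for a tag the VG carries.
    """
    for entry in volume_list:
        if entry.startswith("@"):
            if entry[1:] in vg_tags:
                return True
        elif entry == vg_name:
            return True
    return False


# Each node lists its local root VG plus its own hostname tag (hypothetical names).
volume_lists = {
    "node1": ["VolGroup00", "@node1"],
    "node2": ["VolGroup00", "@node2"],
}
# With set_hosttag, the shared VG carries the hostname of the node that tagged it.
shared_vg_tags = {"node1"}

for node, vlist in volume_lists.items():
    print(node, may_activate("VolGroup01", shared_vg_tags, vlist))
```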

Cheers,
Ulf


LVM.patch
Description: Binary data


[Pacemaker] crm : unknown expected votes

2011-04-19 Thread hari.n.tatituri
Hi,

I created a 2-node cluster using Pacemaker on Fedora 14
(2.6.35.6-45.fc14.x86_64).
I have two errors that I am not able to resolve.
Can someone help me resolve these errors?

  1) It always shows "unknown expected votes" when I run 'crm status'.
  2) The logfile shows the message below even though stonith is not
enabled.
Error: te_connect_stonith: Attempting connection to fencing daemon...


 Pasted below are the configuration and status:
===
-bash-4.1# crm configure show
node $id="2e9dd3fa-8083-4363-96b4-331aa9b93d1f" rabbithanode2
node $id="3a56dae9-d8c7-46b0-8a86-f6bd3b9658f4" rabbithanode1
primitive bunny ocf:rabbitmq:rabbitmq-server \
params mnesia_base="/cluster1"
primitive drbd ocf:linbit:drbd \
params drbd_resource="wwwdata" \
op monitor interval="60s"
primitive drbd_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/cluster1" fstype="ext4"
ms drbd_ms drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation bunny_on_fs inf: bunny drbd_fs
colocation fs_on_drbd inf: drbd_fs drbd_ms:Master
order bunny_after_fs inf: drbd_fs bunny
order fs_after_drbd inf: drbd_ms:promote drbd_fs:start
property $id="cib-bootstrap-options" \
dc-version="1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
resource-stickiness="100" \
no-quorum-policy="ignore"
===

-bash-4.1# crm status

Last updated: Tue Apr 19 09:32:52 2011
Stack: Heartbeat
Current DC: rabbithanode2 (2e9dd3fa-8083-4363-96b4-331aa9b93d1f) - partition with quorum
Version: 1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065
2 Nodes configured, unknown expected votes
3 Resources configured.


Online: [ rabbithanode1 rabbithanode2 ]

 Master/Slave Set: drbd_ms [drbd]
     Masters: [ rabbithanode1 ]
     Slaves: [ rabbithanode2 ]
 drbd_fs    (ocf::heartbeat:Filesystem):    Started rabbithanode1
 bunny  (ocf::rabbitmq:rabbitmq-server):    Started rabbithanode1
-bash-4.1#
===

Thanks & Regds
Hari Tatituri

TACG-Cloud Factory Mobilization
Desk  : +91-080-43154146
Mobile: +91-9686022660





Re: [Pacemaker] how to get pacemaker:ping recheck before promoting drbd resources on a node

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 11:54 AM, Jelle de Jong
 wrote:
> On 19-04-11 11:31, Andrew Beekhof wrote:
>> If the underlying messaging/membership layer goes into spasms -
>> there's not much ping can do to help you. What version of corosync
>> have you got?  Some versions have been better than others.
>
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> /etc/debian_version 6.0.1 (stable)
>
>> Correct, it's checked periodically.
>
> Can I change the config that a ping check is done before promoting drbd?

No. As I said, you'd need to add this to the agent itself.
We just make sure things are in a certain state before
starting/promoting other resources - we don't call specific actions.

>
> I tried adding a separate ping0: http://pastebin.com/raw.php?i=2WD1HKnC
> I thought it worked, but ping0 starts and drbd is still promoted, probably
> because ping0 returns a successful start and does not return an error
> when the actual ping fails. So I tried adding additional location
> rules for ping0, but then the resource is not started at all anymore:
> http://pastebin.com/raw.php?i=DXqRzMNs
>
>> That is something that would need to be added to the drbd
>> agent. Alternatively, configure the ping resource to update more
>> frequently.
>
> How can this be done? crm ra info ocf:ping doesn't show much info. I
> tried using attempts="1" dampen="1" timeout="1" and monitor
> interval="1". An example of how to do a frequent, fast ping would be welcome.

A monitor with interval=1, timeout=1 and dampen=0 should give the
closest behavior to what you're after.
Make sure interval is not set as a resource parameter, though.

>
> If I can make the ping check fast enough to detect network failures
> before corosync tells pacemaker the other node disappeared/failed, this may
> provide a workaround solution.
>
>> But you did lose the node. The cluster can't see into the future to
>> know that it will come back in a bit. What token timeouts are you
>> using?
>
> True, but the node should see that its own network is down, realize it is
> the one that failed, wait until its network is back, and check its
> situation again before doing things with its resources.

The cluster does not understand the network topology in the way you do

> My corosync.conf with token 3000: http://pastebin.com/Y5Lkf4Ch

Increasing that will tell the cluster to wait a bit longer before
declaring a node dead.
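Whether a faster ping can win the race against the token timeout is back-of-envelope arithmetic. The sketch below is only a rough upper bound under assumptions not taken from the agent's source: the link fails just after a monitor run, the hosts in host_list are pinged one after another with every attempt waiting the full timeout, and the attribute update is then held back by dampen. The token value is the 3000 ms from the corosync.conf above; the single ping host is an example.

```python
def worst_case_ping_latency(interval, timeout, attempts, hosts, dampen):
    """Rough upper bound (seconds) before a lost link shows up in the ping attribute.

    Assumes: failure right after a monitor run (wait up to one interval), each
    host pinged in turn with every attempt taking the full timeout, then the
    attribute update delayed by dampen.
    """
    return interval + hosts * attempts * timeout + dampen


TOKEN_TIMEOUT_S = 3000 / 1000  # the corosync token is configured in milliseconds

latency = worst_case_ping_latency(interval=1, timeout=1, attempts=1, hosts=1, dampen=0)
print(latency, latency < TOKEN_TIMEOUT_S)
```

Under the same assumptions, three hosts in host_list already give a 4 s bound, above the 3 s token.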

>
> Thanks in advance,
>
> Any help is much appreciated,
>
> Kind regards,
>
> Jelle de Jong
>



Re: [Pacemaker] A question and demand to a resource placement strategy function

2011-04-19 Thread Yan Gao
On 04/18/11 18:17, Yuusuke IIDA wrote:
> * When the resources are not spread out well
> When I caused failures in the resources in the following order, the placement
> became skewed and the resources were all placed on one node.
> 
> main_rsc3 ->  main_rsc2 ->  main_rsc1
> 
> Online: [srv-b1 srv-b2 srv-a1]
> Full list of resources:
> main_rsc1 (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc2 (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc3 (ocf::pacemaker:Dummy): Started srv-b1
> 
> # crm configure ptest utilization
> Utilization information:
> Original: srv-b2 capacity: capacity=3
> Original: srv-b1 capacity: capacity=3
> Original: srv-a1 capacity: capacity=3
> calculate_utilization: main_rsc1 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc2 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc3 utilization on srv-b1: capacity=1
> Remaining: srv-b2 capacity: capacity=3
> Remaining: srv-b1 capacity: capacity=0
> Remaining: srv-a1 capacity: capacity=3
> 
> I think this problem is caused by the difference in the order in which the
> resources are handled.
Exactly. Given the allocation scores are as the following at this time:

native_color: main_rsc1 allocation score on srv-a1: -INFINITY
native_color: main_rsc1 allocation score on srv-b1: 100
native_color: main_rsc1 allocation score on srv-b2: 100
native_color: main_rsc2 allocation score on srv-a1: -INFINITY
native_color: main_rsc2 allocation score on srv-b1: INFINITY
native_color: main_rsc2 allocation score on srv-b2: 100
native_color: main_rsc3 allocation score on srv-a1: -INFINITY
native_color: main_rsc3 allocation score on srv-b1: INFINITY
native_color: main_rsc3 allocation score on srv-b2: 100

And the resources would get assigned from top to bottom.

Actually I've been optimizing the placement-strategy lately. It will
sort the resource processing order according to the priorities and
scores of resources. That should result in ideal placement. Stay tuned.
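To see why the processing order matters, here is a toy greedy allocator fed with the scores above. It is a simplified model, not Pacemaker's placement code. Its assumptions: each node has capacity 3 and each resource needs 1 (as in the utilization output quoted above), the highest score wins, ties go to the node with more remaining capacity and then to the earlier node in the Online list, and INFINITY is Pacemaker's 1000000. The "by score" order is a simple sort by each resource's highest score, only one possible reading of the optimization described.

```python
INFINITY = 1_000_000  # Pacemaker's value for INFINITY

# Allocation scores from the native_color output above.
SCORES = {
    "main_rsc1": {"srv-a1": -INFINITY, "srv-b1": 100, "srv-b2": 100},
    "main_rsc2": {"srv-a1": -INFINITY, "srv-b1": INFINITY, "srv-b2": 100},
    "main_rsc3": {"srv-a1": -INFINITY, "srv-b1": INFINITY, "srv-b2": 100},
}
NODES = ["srv-b1", "srv-b2", "srv-a1"]  # order of the Online: list, used to break ties


def allocate(order, capacity=3, need=1):
    """Greedily place resources in the given order (toy model)."""
    remaining = {node: capacity for node in NODES}
    placement = {}
    for rsc in order:
        candidates = [n for n in NODES
                      if SCORES[rsc][n] > -INFINITY and remaining[n] >= need]
        if not candidates:
            continue  # nowhere to run: the resource stays stopped
        # Highest score first, then most remaining capacity; max() returns the
        # first of several equal candidates, i.e. the earlier node in NODES.
        best = max(candidates, key=lambda n: (SCORES[rsc][n], remaining[n]))
        remaining[best] -= need
        placement[rsc] = best
    return placement


top_to_bottom = ["main_rsc1", "main_rsc2", "main_rsc3"]
by_score = sorted(top_to_bottom, key=lambda r: max(SCORES[r].values()), reverse=True)

print(allocate(top_to_bottom))
print(allocate(by_score))
```

Processed top to bottom, all three resources land on srv-b1, as in the report; sorted by score, main_rsc1 is handled last and goes to srv-b2, which has more capacity left.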

Regards,
  Yan
-- 
Yan Gao 
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.



Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-04-19 Thread Adam Reiss
I'll get a chance to work on it today.  I'll let you know what happens.
:)

Thanks!!





-Original Message-
From: Raoul Bhatia [IPAX] [mailto:r.bha...@ipax.at] 
Sent: Tuesday, April 19, 2011 5:15 AM
To: The Pacemaker cluster resource manager
Cc: Adam Reiss
Subject: Re: [Pacemaker] Pacemaker / Postfix startup problem...

adam, any news on this?
if this is not working for you, i've got another idea.
but please report the current status first...

thanks,
raoul

On 04/14/2011 08:33 PM, Raoul Bhatia [IPAX] wrote:
> hi adam,
> 
> On 14.04.2011 18:10, Adam Reiss wrote:
>> Hi Raoul,
>>
>> We're trying to set up an HA SMTP relay, with pacemaker stopping/starting
>> the services as it passes the work over to the other machine, should
>> Postfix fail...  Is there a better way to provide an HA SMTP relay?
> 
> when we're setting up a clustered postfix, we do not mess with the
> default /etc/postfix/ config but use a different location on a
> drbd-backed device instead.
>
> e.g. /data/mail/
>
> this way, local mail delivery (cron output!) works without any issue,
> even if the clustered postfix is down (e.g. for maintenance) or simply
> migrated to a different host.
>
>> It's running under VMWare, having two different guests, on two different
>> hosts...
>>
>> I've attached the output you've requested. :)
>>
>> There is no syslog file in /var/log .
> 
> mhm - your hb_report is incomplete too. i don't know centos - where does
> centos' syslog write its logfiles?
> 
> anyways, i've updated the postfix ocf ra to handle some configuration
> cases and errors:
> 
> 
> https://github.com/raoulbhatia/resource-agents/tree/master/heartbeat/postfix
> 
> 
> depending on your system, you might need to apply the following patch:
> 
> -: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> -. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
> +: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
> +. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
> 
> 
> could you please give it a shot and report what's happening?
> 
> if it is still *not* working for you, i would need your current
> configuration, a new hb_report and the system's logfiles.
> 
> thanks,
> raoul
> 


-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




[Pacemaker] Announce: Hawk (HA Web Konsole) 0.4.0

2011-04-19 Thread Tim Serong
Greetings All,

This is to announce version 0.4.0 of Hawk, a web-based GUI for
managing and monitoring Pacemaker High-Availability clusters.

You can use Hawk 0.4.0 to:

  - Monitor your cluster, with much the same functionality as
    crm_mon (displays node and resource status, failed ops).

  - Perform basic operator tasks:
    - Node: standby, online, fence
    - Resource: start, stop, migrate, unmigrate, clean up.

  - Create, edit and delete primitives, groups, clones, m/s
    resources.

  - Edit crm_config properties.

Hawk is intended to run on each node in your cluster, and is
accessible via HTTPS on port 7630.  You can then access it by
pointing your web browser at the IP address of any cluster node,
or the address of any IPaddr(2) resource you may have configured.

You will need to configure a user account to log in as.  The
same rules apply as for the python GUI; you need to log in as
a user in the "haclient" group. 

Packages for various SUSE-based distros can be obtained from the
network:ha-clustering and network:ha-clustering:Factory repos
on OBS, or you can just search for Hawk on software.opensuse.org:

  http://software.opensuse.org/search?baseproject=ALL&q=Hawk

I don't have Fedora/Red Hat packages yet, but building an RPM
from source is easy:

  # hg clone http://hg.clusterlabs.org/pacemaker/hawk
  # cd hawk
  # hg update tip
  # make rpm

My apologies to non-RPM-based distro users (packaging assistance
gladly accepted!)

Further information is available at:

  http://www.clusterlabs.org/wiki/Hawk

Please direct comments, feedback, questions, etc. to myself
and/or (preferably) the Pacemaker mailing list.

Happy clustering,

Tim


-- 
Tim Serong 
Senior Clustering Engineer, OPS Engineering, Novell Inc.






Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread Rakesh K
Andrew Beekhof  writes:

> 
> There is nothing in this config that requires tomcat2 to be stopped.
> 
> Perhaps:
>    colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
> was intended to be:
>    colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1
> 
> The only other service active is httpd, which also has no constraints
> indicating it should stop when mysql is down.
> 

Thanks Andrew for the valuable feedback.

As suggested, I changed the colocation constraint, but I am still facing the
same issue.

Following the order given in the HA configuration, here is the output of my
crm configure show command:

node $id="6317f856-e57b-4a03-acf1-ca81af4f19ce" cisco-demomsf
node $id="87b8b88e-3ded-4e34-8708-46f7afe62935" mysql3
primitive Httpd ocf:heartbeat:apache \
    params configfile="/etc/httpd/conf/httpd.conf" httpd="/usr/sbin/httpd" client="curl" statusurl="http://localhost/img/test.html" testregex="*" \
    op start interval="0" timeout="60s" \
    op monitor interval="50s" timeout="50s" \
    meta target-role="Started"
primitive HttpdVIP ocf:heartbeat:IPaddr3 \
    params ip="172.21.52.149" eth_num="eth0:4" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op monitor interval="30s" \
    meta target-role="Started"
primitive Mysql ocf:heartbeat:mysql \
    params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" test_passwd="slavepass" test_table="msfha.conn" test_user="repl" replication_user="repl" replication_passwd="slavepass" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="10s" role="Master" timeout="8s" \
    op monitor interval="12s" timeout="8s"
primitive MysqlVIP ocf:heartbeat:IPaddr3 \
    params ip="172.21.52.150" eth_num="eth0:3" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op monitor interval="30s" \
    meta target-role="Started"
primitive Tomcat1 ocf:msf:tomcat \
    params tomcat_name="tomcat" statusurl="http://localhost:8080/dbtest/testtomcat.html" java_home="/" catalina_home="/home/msf/runtime/tomcat/apache-tomcat-6.0.18" client="curl" testregex="*" \
    op start interval="0" timeout="60s" \
    op monitor interval="50s" timeout="50s" \
    op stop interval="0" \
    meta target-role="Started"
primitive Tomcat1VIP ocf:heartbeat:IPaddr3 \
    params ip="172.21.52.140" eth_num="eth0:2" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op monitor interval="30s" \
    meta target-role="Started"
primitive Tomcat2 ocf:msf:tomcat \
    params tomcat_name="tomcat" statusurl="http://localhost:8081/" java_home="/" catalina_home="/home/msf/runtime/tomcat2/apache-tomcat-6.0.18" client="curl" testregex="*" \
    op start interval="0" timeout="60s" \
    op monitor interval="50s" timeout="50s" \
    op stop interval="0" \
    meta target-role="Started"
primitive Tomcat2VIP ocf:heartbeat:IPaddr3 \
    params ip="172.21.52.139" eth_num="eth0:4" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op monitor interval="30s" \
    meta target-role="Started"
ms MS_Mysql Mysql \
    meta notify="true" target-role="Stopped"
location L_Master MS_Mysql \
    rule $id="L_Master-rule" $role="Master" 100: #uname eq cisco-demomsf \
    rule $id="L_Master-rule1" $role="Master" 100: #uname eq mysql3
colocation Httpd-with-ip inf: HttpdVIP Httpd
colocation Mysql-with-ip inf: MysqlVIP MS_Mysql:Master
colocation Tomcat1-with-ip inf: Tomcat1VIP Tomcat1
colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1
colocation tomcat2-with-ip inf: Tomcat2VIP Tomcat2
order Httpd-after-Tomcat2 inf: Tomcat2 Httpd
order Httpd-after-op inf: HttpdVIP Httpd
order Mysql-after-ip inf: MysqlVIP MS_Mysql
order Tomcat1-after-MYSQL inf: MS_Mysql Tomcat1VIP
order Tomcat1-after-ip inf: Tomcat1VIP Tomcat1
order Tomcat2-after-ip inf: Tomcat2VIP Tomcat2
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1300787402"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
  



Re: [Pacemaker] Unable to stop Multi state resource

2011-04-19 Thread Rakesh K
Rakesh K  writes:


Hi Andrew

FSR is a file system replication script that adheres to the OCF cluster
framework. The script is similar to the MySQL OCF script: it is a multi-state
resource where the master runs an SSH server and the slave runs rsync scripts
that synchronize the data between master and slave.

The rsync script is configured with the FSR master location, so that the rsync
tool frequently replicates the data from the FSR master location.

Here is the crm configure show output:

node $id="82a5281a-a069-49c1-9f57-d4a8f6eb3d72" prodmsf2
node $id="d8b6c2e7-d1c3-4a15-9411-ed4d710c8672" prodmsf
primitive FSR ocf:msf:fsr \
    params client_script="/home/msf/ha/scripts/ocf/rsyncClient" source_dir="/home/msf/services/persistence/" dest_dir="/home/msf/services/persistence/" user="root" pid="/var/run/fsr.pid" rsync_binary="/usr/bin/rsync" rsync_options="-az" rsync_interval="1" config_file="/home/msf/ha/config/ocf/fsr.config" status_dump="/home/msf/ha/status/rsync_client_dump" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="10s" role="Master" timeout="8s" \
    op monitor interval="12s" timeout="8s"
primitive Httpd ocf:heartbeat:apache \
    params configfile="/etc/httpd/conf/httpd.conf" httpd="/usr/sbin/httpd" client="curl" statusurl="http://localhost/img/test.html" testregex="*" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="50s" timeout="50s"
primitive HttpdVIP ocf:heartbeat:IPaddr3 \
    params ip="10.10.30.103" eth_num="eth0:1" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="30s" \
    meta target-role="Started"
primitive Mysql ocf:heartbeat:mysql \
    params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" test_passwd="slavepass" test_table="test.conn" test_user="repl" replication_user="repl" replication_passwd="slavepass" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="10s" role="Master" timeout="8s" \
    op monitor interval="12s" timeout="8s"
primitive MysqlVIP ocf:heartbeat:IPaddr3 \
    params ip="10.10.30.105" eth_num="eth0:3" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="60s" \
    op monitor interval="30s" \
    meta target-role="Started"
primitive Tomcat1 ocf:msf:tomcat1 \
    params tomcat_name="tomcat" statusurl="http://localhost:8080/" java_home="/" catalina_home="/home/msf/runtime/tomcat/apache-tomcat-6.0.18" client="curl" testregex="*" \
    op start interval="0" timeout="120s" \
    op monitor interval="50s" timeout="50s" \
    op stop interval="0" timeout="120s" \
    meta target-role="Started"
primitive Tomcat1VIP ocf:heartbeat:IPaddr3 \
    params ip="10.10.30.104" eth_num="eth0:2" vip_cleanup_file="/var/run/bigha.pid" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="30s" \
    meta target-role="Started"
ms MS_FSR FSR \
    meta notify="true" target-role="Started"
ms MS_Mysql Mysql \
    meta notify="true" target-role="Started"
colocation FSR-with-Tomcat inf: Tomcat1 MS_FSR:Master
colocation Httpd-with-ip inf: HttpdVIP Httpd
colocation Mysql-with-ip inf: MysqlVIP MS_Mysql:Master
colocation Tomcat1-with-ip inf: Tomcat1VIP Tomcat1
order FSR-after-tomcat inf: Tomcat1 MS_FSR
order Httpd-after-ip inf: HttpdVIP Httpd
order Httpd-after-tomcat inf: Tomcat1 HttpdVIP
order Mysql-after-ip inf: MysqlVIP MS_Mysql
order Tomcat1-after-MYSQL inf: MS_Mysql Tomcat1VIP
order Tomcat1-after-ip inf: Tomcat1VIP Tomcat1
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"

Regards
Rakesh 





Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread Andrew Beekhof
There is nothing in this config that requires tomcat2 to be stopped.

Perhaps:
   colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
was intended to be:
   colocation Tomcat2-with-Tomcat inf: Tomcat2 Tomcat1

The only other service active is httpd, which also has no constraints
indicating it should stop when mysql is down.
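To make the constraint reasoning above concrete, here is a minimal Python sketch of how mandatory (inf) constraints can force a chain of resources down. It is a simplified model under stated assumptions, not Pacemaker's policy engine: it treats "colocation NAME inf: A B" as "A cannot run while B is stopped" and "order NAME inf: FIRST THEN" as "THEN cannot run while FIRST is stopped", and it ignores roles (MS_Mysql:Master is reduced to MS_Mysql), finite scores and node placement. COLOCATIONS and ORDERS are transcribed from the crm configuration Rakesh posted; ORIGINAL_COLOCATIONS is a hypothetical variant that differs only in the Tomcat2-with-Tomcat line.

```python
"""Simplified model of how mandatory constraints propagate a stop.

Assumptions (not Pacemaker's actual algorithm):
  colocation NAME inf: RSC WITH_RSC  -> RSC cannot run while WITH_RSC is stopped
  order      NAME inf: FIRST THEN    -> THEN cannot run while FIRST is stopped
Roles, finite scores and node placement are ignored.
"""

# (rsc, with_rsc) pairs from the posted configuration;
# MS_Mysql:Master is reduced to MS_Mysql.
COLOCATIONS = [
    ("HttpdVIP", "Httpd"),
    ("MysqlVIP", "MS_Mysql"),
    ("Tomcat1VIP", "Tomcat1"),
    ("Tomcat2", "Tomcat1"),
    ("Tomcat2VIP", "Tomcat2"),
]

# (first, then) pairs from the same configuration.
ORDERS = [
    ("Tomcat2", "Httpd"),
    ("HttpdVIP", "Httpd"),
    ("MysqlVIP", "MS_Mysql"),
    ("MS_Mysql", "Tomcat1VIP"),
    ("Tomcat1VIP", "Tomcat1"),
    ("Tomcat2VIP", "Tomcat2"),
]

# Hypothetical earlier variant:
#   colocation Tomcat2-with-Tomcat inf: Tomcat1 Tomcat2VIP
ORIGINAL_COLOCATIONS = [
    ("Tomcat1", "Tomcat2VIP") if pair == ("Tomcat2", "Tomcat1") else pair
    for pair in COLOCATIONS
]


def forced_stops(colocations, orders, stopped_by_hand):
    """Return every resource that cannot run once stopped_by_hand are stopped."""
    # dependents[x] = resources that cannot run while x is stopped
    dependents = {}
    for rsc, with_rsc in colocations:
        dependents.setdefault(with_rsc, set()).add(rsc)
    for first, then in orders:
        dependents.setdefault(first, set()).add(then)

    stopped = set(stopped_by_hand)
    pending = list(stopped_by_hand)
    while pending:  # traversal order does not change the resulting closure
        current = pending.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in stopped:
                stopped.add(dependent)
                pending.append(dependent)
    return stopped


if __name__ == "__main__":
    print(sorted(forced_stops(COLOCATIONS, ORDERS, {"MS_Mysql"})))
    print(sorted(forced_stops(ORIGINAL_COLOCATIONS, ORDERS, {"MS_Mysql"})))
```

Under this model, stopping MS_Mysql with the corrected colocation forces all eight resources down, while the earlier Tomcat1/Tomcat2VIP variant leaves Tomcat2, Tomcat2VIP, Httpd and HttpdVIP running, which matches Andrew's reading of the original configuration.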



On Tue, Apr 19, 2011 at 12:04 PM, rakesh k  wrote:
> Hi Andrew
>
> As requested, please find the output of cibadmin -Ql in this post.
>
>  dc-uuid="87b8b88e-3ded-4e34-8708-46f7afe62935" admin_epoch="0" epoch="1105"
> num_updates="18">
>   
>     
>   
>      value="1.0.9-89bd754939df5150de7cd76835f98fe90851b677"/>
>      name="cluster-infrastructure" value="Heartbeat"/>
>      name="stonith-enabled" value="false"/>
>      name="no-quorum-policy" value="ignore"/>
>      name="last-lrm-refresh" value="1300787402"/>
>   
>     
>     
>    uname="mysql3"/>
>    uname="cisco-demomsf"/>
>     
>     
>    type="IPaddr3">
>     
>    value="172.21.52.149"/>
>    value="eth0:4"/>
>    name="vip_cleanup_file" value="/var/run/bigha.pid"/>
>     
>     
>    timeout="120s"/>
>   
>     
>     
>    name="target-role" value="Started"/>
>     
>   
>   
>     
>    name="configfile" value="/etc/httpd/conf/httpd.conf"/>
>    value="/usr/sbin/httpd"/>
>    value="curl"/>
>    value="http://localhost/img/test.html"/>
>    value="*"/>
>     
>     
>   
>    timeout="50s"/>
>     
>     
>    value="Started"/>
>     
>   
>   
>     
>    name="tomcat_name" value="tomcat"/>
>    name="statusurl" value="http://localhost:8080/dbtest/testtomcat.html"/>
>    name="java_home" value="/"/>
>    name="catalina_home" value="/home/msf/runtime/tomcat/apache-tomcat-6.0.18"/>
>    value="curl"/>
>    name="testregex" value="*"/>
>     
>     
>   
>    timeout="50s"/>
>   
>     
>     
>    id="Tomcat1-meta_attributes-target-role" value="Started"/>
>     
>   
>    type="IPaddr3">
>     
>    value="172.21.52.140"/>
>    value="eth0:2"/>
>    name="vip_cleanup_file" value="/var/run/bigha.pid"/>
>     
>     
>    timeout="120s"/>
>   
>     
>     
>    name="target-role" value="Started"/>
>     
>   
>    id="MysqlVIP">
>     
>    value="172.21.52.150"/>
>    value="eth0:3"/>
>    name="vip_cleanup_file" value="/var/run/bigha.pid"/>
>     
>     
>    timeout="120s"/>
>   
>     
>     
>    id="MysqlVIP-meta_attributes-target-role" value="Started"/>
>     
>   
>   
>     
>    value="true"/>
>    id="MS_Mysql-meta_attributes-target-role" value="Stopped"/>
>     
>     
>   
>      value="/usr/bin/mysqld_safe"/>
>      value="/etc/my.cnf"/>
>      value="/var/lib/mysql"/>
>      value="mysql"/>
>      value="/var/lib/mysql/mysql.pid"/>
>      value="/var/lib/mysql/mysql.sock"/>
>      name="test_passwd" value="slavepass"/>
>      name="test_table" value="msfha.conn"/>
>      name="test_user" value="repl"/>
>      name="replication_user" value="repl"/>
>      name="replication_passwd" value="slavepass"/>
>   
>   
>      timeout="120s"/>
>     
>      role="Master" timeout="8s"/>
>      timeout="8s"/>
>   
>     
>   
>   
>     
>    name="tomcat_name" value="tomcat"/>
>    name="statusurl" value="http://localhost:8081/"/>
>    name="java_home" value="/"/>
>    name="catalina_home"
> value="/home/msf/runtime/tomcat2/apache-tomcat-6.0.18"/>
>    value="curl"/>
>    name="testregex" value="*"/>
>     
>     
>   
>    timeout="50s"/>
>   
>     
>     
>    name="target-role" value="Started"/>
>     
>   
>    type="IPaddr3">
>     
>    value="172.21.52.139"/>
>    value="eth0:4"/>
>    name="vip_cleanup_file" value="/var/run/bigha.pid"/>
>     
>     
>    timeout="120s"/>
>   
>     
>     
>    name="target-role" value="Started"/>
>     
>   
>     
>     
>    with-rsc="Httpd"/>
>    with-rsc="Tomcat1"/>
>    with-rsc="MS_Mysql" with-rsc-role="Master"/>
>   
>     
>    operation="eq" value="cisco-demomsf"/>
>     
>     
>    operation="eq" value="mysql3"/>
>     
>   
>    then="Httpd"/>
>    then="MS_Mysql"/>
>    then="Tomcat1"/>
>    then="Tomcat1VIP"/>
> 

[Pacemaker] SBD kills both nodes in a two node cluster.

2011-04-19 Thread Ulf
I have two nodes with shared storage and multipathing, but the SBD device
doesn't work as expected.
My idea was that in case of a split brain, one node kills the other node and
survives.
But in my case I get a double kill: both nodes are killed at the same time.
I simulated the split brain with "ip link set down eth0" on one node, and I
tested it several times.

The sbd daemon is running on both nodes.
My configuration:
primitive stonith_sbd stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/scsi-36..."
clone stonith_sbd-clone stonith_sbd

/var/log/messages:
Node A:
Apr 19 10:37:09 nodeA crmd: [7690]: info: te_fence_node: Executing reboot fencing operation (17) on nodeB (timeout=18)
Apr 19 10:37:09 nodeA stonith-ng: [7685]: info: initiate_remote_stonith_op: Initiating remote operation reboot for nodeB: d4226746-fef1-4d29-bc85-2d33e9bf7f94
Apr 19 10:37:09 nodeA stonith-ng: [7685]: info: stonith_queryQuery 



Node B:
Apr 19 10:37:09 nodeB crmd: [7851]: info: te_fence_node: Executing reboot fencing operation (17) on nodeA (timeout=18)
Apr 19 10:37:09 nodeB stonith-ng: [7846]: info: initiate_remote_stonith_op: Initiating remote operation reboot for nodeA: e361b3b6-2890-474d-8671-b73eea62d1ab
Apr 19 10:37:09 nodeB stonith-ng: [7846]: info: stonith_queryQuery 



On both nodes I ran "sbd -d /dev/disk/by-id/scsi-36... list" in an endless
loop; these are the last SBD messages I got.
As you can see, both nodes request a reset at the same time and both
succeed => double kill.
Node A:
0   nodeB clear
1   nodeA clear
0   nodeB clear
1   nodeA reset   nodeB
0   nodeB reset   nodeA
1   nodeA reset   nodeB

Node B:
0   nodeB clear
1   nodeA reset   nodeB
0   nodeB clear
1   nodeA reset   nodeB
0   nodeB clear
1   nodeA reset   nodeB
0   nodeB reset   nodeA
1   nodeA reset   nodeB
0   nodeB reset   nodeA
1   nodeA reset   nodeB


Cheers,
Ulf
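The double kill can be read directly off the repeated "sbd ... list" output above. The following hypothetical Python helper (not part of sbd) flags it by keeping the last message seen for each node's slot and reporting pairs of nodes that have each been sent a reset by the other. It assumes the columns are slot, node name, message and, when present, the node that wrote the message; check that layout against your sbd version.

```python
"""Flag mutual fencing ("double kill") in repeated `sbd -d DEVICE list` output.

Assumed column layout: slot, node name, message, and optionally the sender.
"""


def last_messages(listing):
    """Map node -> (message, sender) using the last line seen for each node."""
    slots = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        node, message = fields[1], fields[2]
        sender = fields[3] if len(fields) > 3 else None
        slots[node] = (message, sender)
    return slots


def mutual_resets(slots):
    """Return sorted node pairs in which each node was sent a reset by the other."""
    pairs = set()
    for node, (message, sender) in slots.items():
        if message != "reset" or sender not in slots:
            continue
        other_message, other_sender = slots[sender]
        if other_message == "reset" and other_sender == node:
            pairs.add(tuple(sorted((node, sender))))
    return sorted(pairs)


# Node A's listing from the report above.
NODE_A_LISTING = """\
0   nodeB clear
1   nodeA clear
0   nodeB clear
1   nodeA reset   nodeB
0   nodeB reset   nodeA
1   nodeA reset   nodeB
"""

if __name__ == "__main__":
    print(mutual_resets(last_messages(NODE_A_LISTING)))
```

On Node A's listing it reports [('nodeA', 'nodeB')]: each node's slot ends up holding a reset written by the other node.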



Re: [Pacemaker] how to get pacemaker:ping recheck before promoting drbd resources on a node

2011-04-19 Thread Jelle de Jong
On 19-04-11 11:31, Andrew Beekhof wrote:
> If the underlying messaging/membership layer goes into spasms,
> there's not much ping can do to help you. What version of corosync
> have you got?  Some versions have been better than others.

corosync 1.2.1-4
pacemaker 1.0.9.1+hg15626-1
/etc/debian_version 6.0.1 (stable)

> Correct, it's checked periodically.

Can I change the config so that a ping check is done before drbd is promoted?

I tried adding a separate ping0: http://pastebin.com/raw.php?i=2WD1HKnC
I thought it worked, but ping0 starts and drbd is still promoted, probably
because ping0 reports a successful start and does not return an error when
the actual ping fails. So I tried adding additional location rules for
ping0, but then the resources are not started at all anymore:
http://pastebin.com/raw.php?i=DXqRzMNs

> That is something that would need to be added to the drbd
> agent. Alternatively, configure the ping resource to update more
> frequently.

How can this be done? crm ra info ocf:ping doesn't show much info. I
tried using attempts="1" dampen="1" timeout="1" and monitor
interval="1". An example of how to do a frequent, fast ping would be welcome.

If I can make the ping check fast enough to detect network failures
before corosync tells pacemaker that the other node has disappeared/failed,
this may provide a workaround.
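As a rough worked estimate (an assumption about the agent's timing, not something measured in this thread): if the link drops just after a monitor completed, the ping attribute cannot change until the next monitor runs, its ping attempts time out, and the dampen delay expires. The Python sketch below computes that bound for a single ping host with the values Jelle tried; the function name and the linear model are illustrative only.

```python
def worst_case_ping_latency(monitor_interval, attempts, timeout, dampen):
    """Rough upper bound, in seconds, before a lost ping reaches the CIB.

    Assumed model: one ping host; the failure happens right after a monitor
    finished, so we wait a full monitor interval, then every attempt times
    out, then the attribute update is held back by `dampen`.
    """
    return monitor_interval + attempts * timeout + dampen


if __name__ == "__main__":
    # attempts="1" dampen="1" timeout="1" and monitor interval="1"
    latency = worst_case_ping_latency(1, 1, 1, 1)
    token_timeout = 3.0  # "token 3000" (milliseconds) in corosync.conf
    print(f"ping: up to ~{latency}s, corosync token: {token_timeout}s")
```

Under this model even these aggressive settings land in the same range as the 3 s token timeout, so the ping attribute cannot be relied on to change before membership is lost.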

> But you did lose the node. The cluster can't see into the future to
> know that it will come back in a bit. What token timeouts are you
> using?

True, but the node should see that its own network is down, recognize that
it is the one that failed, wait until its network is back, and check its
situation again before doing anything with its resources.

My corosync.conf with token 3000: http://pastebin.com/Y5Lkf4Ch

Thanks in advance,

Any help is much appreciated,

Kind regards,

Jelle de Jong



Re: [Pacemaker] how to get pacemaker:ping recheck before promoting drbd resources on a node

2011-04-19 Thread Andrew Beekhof
On Mon, Apr 18, 2011 at 8:57 PM, Jelle de Jong
 wrote:
> Hello everybody,
>
> I need to be able to bring down my network interface (network failure
> test) and a few seconds later bring it up again, without my drbd cluster
> going nuts and creating split brains.
>
> I was advised to use ocf:pacemaker:ping, so I started to integrate this
> in my configuration: http://pastebin.com/raw.php?i=iyp3URkP

If the underlying messaging/membership layer goes into spasms,
there's not much ping can do to help you.
What version of corosync have you got?  Some versions have been better
than others.

> Now the problem is that it kind of works, but not the way I need it to be.
>
> The ping status is not rechecked right _before_ it tries to promote the
> drbd resources.

Correct, it's checked periodically.

> It should do a fast ping check and continue if
> successful, but _don’t_ promote any drbd resources when it stalls or fails.

That is something that would need to be added to the drbd agent.
Alternatively, configure the ping resource to update more frequently.

>
> The problem is that the ping has been returning good values up until
> the network failure, and when the failure occurs it still thinks the
> ping status is good and promotes the disk; a few seconds later the ping
> status changes to indicate the network failure, but by then all the
> damage has already been done...
>
> I must be doing something _terribly_ wrong, since I can't believe a
> pacemaker/corosync cluster isn't able to survive a network glitch
> (short network failure) without all kinds of split brains and losing the
> node.

But you did lose the node.
The cluster can't see into the future to know that it will come back in a bit.

What token timeouts are you using?



Re: [Pacemaker] A question and demand to a resource placement strategy function

2011-04-19 Thread Andrew Beekhof
Yan is our utilization expert, lets see if he can provide some
direction here :-)

-- Andrew

2011/4/18 Yuusuke IIDA :
> Hi, Andrew
>
> I want to use the resource placement strategy function of Pacemaker-1.1
> to spread out the failover targets of resources in an N-to-N environment.
>
> When I tested the function with the following settings, I found a pattern
> in which the resources were not spread out.
> * 1ACTIVE: 2PASSIVE
> * placement-strategy=balanced
> * capacity of 2PASSIVE is a tie score
>
> * Initial state
> Online: [srv-b1 srv-b2 srv-a1]
> Full list of resources:
> main_rsc1 (ocf::pacemaker:Dummy): Started srv-a1
> main_rsc2 (ocf::pacemaker:Dummy): Started srv-a1
> main_rsc3 (ocf::pacemaker:Dummy): Started srv-a1
>
> # crm configure ptest utilization
> Utilization information:
> Original: srv-b2 capacity: capacity=3
> Original: srv-b1 capacity: capacity=3
> Original: srv-a1 capacity: capacity=3
> calculate_utilization: main_rsc1 utilization on srv-a1: capacity=1
> calculate_utilization: main_rsc2 utilization on srv-a1: capacity=1
> calculate_utilization: main_rsc3 utilization on srv-a1: capacity=1
> Remaining: srv-b2 capacity: capacity=3
> Remaining: srv-b1 capacity: capacity=3
> Remaining: srv-a1 capacity: capacity=0
>
> * When the resources are spread out as expected
> When I made the resources fail in the following order, they were spread out
> and placed on the nodes with the largest remaining capacity.
>
> main_rsc1 -> main_rsc2 -> main_rsc3
>
> Online: [srv-b1 srv-b2 srv-a1]
> Full list of resources:
> main_rsc1 (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc2 (ocf::pacemaker:Dummy): Started srv-b2
> main_rsc3 (ocf::pacemaker:Dummy): Started srv-b1
>
> # crm configure ptest utilization
> Utilization information:
> Original: srv-b2 capacity: capacity=3
> Original: srv-b1 capacity: capacity=3
> Original: srv-a1 capacity: capacity=3
> calculate_utilization: main_rsc1 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc2 utilization on srv-b2: capacity=1
> calculate_utilization: main_rsc3 utilization on srv-b1: capacity=1
> Remaining: srv-b2 capacity: capacity=2
> Remaining: srv-b1 capacity: capacity=1
> Remaining: srv-a1 capacity: capacity=3
>
> * When the resources are not spread out
> When I made the resources fail in the following order, the placement was
> lopsided and all resources were placed on one node.
>
> main_rsc3 -> main_rsc2 -> main_rsc1
>
> Online: [srv-b1 srv-b2 srv-a1]
> Full list of resources:
> main_rsc1 (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc2 (ocf::pacemaker:Dummy): Started srv-b1
> main_rsc3 (ocf::pacemaker:Dummy): Started srv-b1
>
> # crm configure ptest utilization
> Utilization information:
> Original: srv-b2 capacity: capacity=3
> Original: srv-b1 capacity: capacity=3
> Original: srv-a1 capacity: capacity=3
> calculate_utilization: main_rsc1 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc2 utilization on srv-b1: capacity=1
> calculate_utilization: main_rsc3 utilization on srv-b1: capacity=1
> Remaining: srv-b2 capacity: capacity=3
> Remaining: srv-b1 capacity: capacity=0
> Remaining: srv-a1 capacity: capacity=3
>
> I think this problem is caused by a difference in the order in which the
> resources are handled.
> I have attached an hb_report from when the problem occurred.
> Is this behavior a bug?
> Or is there a mistake in my settings?
>
> Best Regards,
> Yuusuke IIDA
> --
> 
> METRO SYSTEMS CO., LTD
>
> Yuusuke Iida
> Mail: iiday...@intellilink.co.jp
> 
>
>
>
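For comparison with the ptest output above, here is a deliberately simplified Python model of "placement-strategy=balanced" (an assumption, not Pacemaker's allocator): each resource, taken in the given order, goes to the node with the most remaining capacity, with ties broken by node name. srv-a1 is left out on the assumption that each failed resource is banned from it. Under this model both failure orders end with two resources on one standby node and one on the other, so the model alone does not reproduce the all-on-srv-b1 result.

```python
def place_balanced(resources, capacity):
    """Greedy "most free capacity first" placement (simplified model).

    resources: list of (name, required capacity) in processing order.
    capacity:  dict of node -> available capacity.
    Ties are broken by node name, which is an assumption of this sketch.
    """
    remaining = dict(capacity)
    placement = {}
    for name, need in resources:
        candidates = [node for node, free in remaining.items() if free >= need]
        if not candidates:
            raise ValueError(f"no node has capacity left for {name}")
        # Most remaining capacity wins; node name breaks ties.
        node = min(candidates, key=lambda n: (-remaining[n], n))
        remaining[node] -= need
        placement[name] = node
    return placement, remaining


if __name__ == "__main__":
    standby = {"srv-b1": 3, "srv-b2": 3}
    for order in (["main_rsc1", "main_rsc2", "main_rsc3"],
                  ["main_rsc3", "main_rsc2", "main_rsc1"]):
        placement, remaining = place_balanced([(r, 1) for r in order], standby)
        print(order, placement, remaining)
```

Both orders leave srv-b1 with 1 and srv-b2 with 2 units of remaining capacity, i.e. two resources on srv-b1 and one on srv-b2.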



Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-04-19 Thread Raoul Bhatia [IPAX]
adam, any news on this?
if this is not working for you, i've got another idea.
but please report the current status first...

thanks,
raoul

On 04/14/2011 08:33 PM, Raoul Bhatia [IPAX] wrote:
> hi adam,
> 
> On 14.04.2011 18:10, Adam Reiss wrote:
>> Hi Raoul,
>>
>> We're trying to set up an HA SMTP relay, so we want pacemaker to stop/start
>> the services and pass the work over to the other machine should
>> Postfix fail...  Is there a better way to build an HA SMTP relay?
> 
> when we're setting up a clustered postfix, we do not mess with the
> default /etc/postfix/ config but use a different location on a drbd
> backed device instead.
> 
> e.g. /data/mail/
> 
> this way, local mail delivery (cron output!) works without any issue,
> even if the clustered postfix is down (e.g. for maintenance) or simply
> migrated to a different host.
> 
>> It's running under VMWare, having two different guests, on two different
>> hosts...
>>
>> I've attached the output you've requested. :)
>>
>> There is no syslog file in /var/log .
> 
> mhm, your hb_report is incomplete too. i don't know centos; where does
> centos' syslog write its logfiles?
> 
> anyways, i've updated the postfix ocf ra to handle some configuration
> cases and errors:
> 
> 
> https://github.com/raoulbhatia/resource-agents/tree/master/heartbeat/postfix
> 
> 
> depending on your system, you might need to apply the following patch:
> 
> -: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> -. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
> +: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
> +. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
> 
> 
> could you please give it a shot and report what's happening?
> 
> if it is still *not* working for you, i would need your current
> configuration, a new hb_report and the system's logfiles.
> 
> thanks,
> raoul
> 


-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




Re: [Pacemaker] Question of the syslog output in pacemaker-1.1

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 9:25 AM, Yuusuke IIDA
 wrote:
> Hi, Andrew
>
> I use corosync-1.3.0 and Pacemaker-1.1.5.
>
> Logging goes through rsyslog.
>
> I changed syslog_facility in corosync.conf to local1 so that the cluster
> log would be written to a dedicated file.
>
> However, the setting was not applied to the processes forked by
> pacemakerd.

Ah, I see the problem.
The following patch will be in devel shortly

diff -r 7225f68ae6e9 mcp/pacemaker.c
--- a/mcp/pacemaker.c   Mon Apr 18 16:52:22 2011 +0200
+++ b/mcp/pacemaker.c   Tue Apr 19 11:12:09 2011 +0200
@@ -692,7 +692,7 @@ main(int argc, char **argv)
    crm_make_daemon(crm_system_name, TRUE, pid_file);
 
    /* Only Re-init if we're running daemonized */
-   crm_log_init_quiet(NULL, LOG_INFO, TRUE, FALSE, argc, argv);
+   crm_log_init(NULL, LOG_INFO, TRUE, FALSE, 0, NULL);
 }
 
 crm_info("Starting Pacemaker %s (Build: %s): %s\n", VERSION, BUILD_VERSION, CRM_FEATURES);


>
> The facility of the processes forked by pacemakerd remained daemon.
>
> The changed setting only took effect for corosync and pacemakerd.
>
> Why does the syslog_facility setting in corosync.conf have no effect on the
> processes forked by pacemakerd?
>
> Please tell me how to change the facility of the processes forked by
> pacemakerd.
>
> Best Regards,
> Yuusuke IIDA
> --
> 
> METRO SYSTEMS CO., LTD
>
> Yuusuke Iida
> Mail: iiday...@intellilink.co.jp
> 
>
>



Re: [Pacemaker] Heartbeat over Disk or non IP possible?

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 9:31 AM, Ulf  wrote:
> Hi,
>
> So it seems not to be possible to do a heartbeat over disk.
> Is it planned to introduce such a feature?
>

It would be a feature of the underlying communications layer.
So you'd have to ask the heartbeat or corosync maintainers - but in
both cases I suspect the answer is no.



Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread Andrew Beekhof
On Tue, Apr 19, 2011 at 10:35 AM, Rakesh K  wrote:
> Andrew Beekhof  writes:
>
>
> Hi Andrew, thanks for the reply.
>
> The version of Pacemaker I am using is pacemaker-1.0.9.1.

Ok. Could be a bug.
Can you attach the output of cibadmin -Ql when the cluster is in the
state you describe?



Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]
On 04/19/2011 10:38 AM, Marek Marczykowski wrote:
> On 04/19/11 10:29, Raoul Bhatia [IPAX] wrote:
>> On 04/19/2011 10:20 AM, Marek Marczykowski wrote:
>>> On 04/19/11 10:01, Raoul Bhatia [IPAX] wrote:
>>>> what i can currently think of:
>>>>
>>>> 1. run a cronjob which periodically analyzes the binlogs and will update
>>>> the node's log-file and log-pos attributes if there are empty binlogs;
>>>> (that's the best method, i think)
>>>>
>>>> 2. restore the purged binlogs from a backup, hack the mysql-bin.index
>>>> file to re-include them, and hope that they will not be purged upon
>>>> replication restart?
>>>>
>>>>
>>>> any input on this?
>>>
>>> I have a similar problem... I think the first solution is better: after 7
>>> days, log-file and log-pos can be cleared so that "FIRST" is used as the start point.
>>
>> in your opinion, is it possible to fix this via the ocf ra, or does it
>> have to be a separate cronjob?
> 
> I have no idea how to do it in the ra. There is no easy way to see which
> binlogs exist on the other node. Maybe some trick that stores that info
> during the monitor action, but that is ugly and makes the ra depend on the
> monitor action being enabled...
> The easiest solutions are the best :)

hi marek,

so i'm trying to modify the cib so that wdb01 does a "change master" to
replicate from mysql-bin.31:0:

1. i set standby for wdb01
2. i modify the cib via crm:
> node wdb01 \
>         attributes service="wdb" \
>         attributes standby="on" wdb02-log-file-wdb-mysql="mysql-bin.31" wdb02-log-pos-wdb-mysql="0"

3. i set online for wdb01

however, this does not work.

upon set_master, the ra checks the current slave information
and finds the (incorrect) mysql-bin.15:24386.

thus, set_master "keeps" this configuration and returns
(mysql ra line 529ff):

> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Changing MySQL configuration to replicate from wdb02.
> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Kept master pos for wdb02 : mysql-bin.15:24386
> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Changing MySQL configuration to replicate from wdb02.
> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Kept master pos for wdb02 : mysql-bin.15:24386


how would you try to restart the replication?
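One manual way to restart replication at a chosen position, bypassing the kept master position, is to repoint the slave yourself. This uses the standard MySQL statements STOP SLAVE, CHANGE MASTER TO ... and START SLAVE.

The minimal Python sketch below only builds those statements. Several parts are assumptions for illustration:
- the helper name;
- the master host "wdb02";
- the start position 4, which is the first event after a binlog's 4-byte header.

This thread does not establish whether the mysql ra would later overwrite a position set by hand.

```python
def change_master_statements(host, log_file, log_pos):
    """Build the SQL needed to repoint a MySQL slave at host:log_file:log_pos.

    Hypothetical helper: it only returns strings; run them on the slave
    yourself (for example with the mysql client).
    """
    if isinstance(log_pos, bool) or not isinstance(log_pos, int) or log_pos < 0:
        raise ValueError("log_pos must be a non-negative integer")

    def quote(value):
        # Escape backslashes and single quotes for a MySQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return [
        "STOP SLAVE;",
        "CHANGE MASTER TO MASTER_HOST=%s, MASTER_LOG_FILE=%s, MASTER_LOG_POS=%d;"
        % (quote(host), quote(log_file), log_pos),
        "START SLAVE;",
    ]


# Example: restart replication on wdb01 from the first binlog still present on wdb02.
for statement in change_master_statements("wdb02", "mysql-bin.31", 4):
    print(statement)
```

Building the statements instead of executing them keeps the sketch free of any MySQL driver dependency. It also lets an admin review the position before applying it.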

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread Rakesh K
Andrew Beekhof  writes:


Hi Andrew, thanks for the reply.

The version of Pacemaker I am using is pacemaker-1.0.9.1.


Regards
Rakesh 




Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]
On 04/19/2011 10:20 AM, Marek Marczykowski wrote:
> On 04/19/11 10:01, Raoul Bhatia [IPAX] wrote:
>> what i can currently think of:
>>
>> 1. run a cronjob which periodically analyzes the binlogs and will update
>> the node's log-file and log-pos attributes if there are empty binlogs;
>> (that's the best method, i think)
>>
>> 2. restore the purged binlogs from a backup, hack the mysql-bin.index
>> file to re-include them, and hope that they will not be purged upon
>> replication restart?
>>
>>
>> any input on this?
> 
> I have a similar problem... I think the first solution is better: after 7
> days, log-file and log-pos can be cleared so that "FIRST" is used as the start point.

in your opinion, is it possible to fix this via the ocf ra, or does it
have to be a separate cronjob?

thanks,
raoul


Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]
On 04/19/2011 10:01 AM, Raoul Bhatia [IPAX] wrote:
> the failover worked and wdb02 is up and running.
> upon rejoin, wdb01 wanted to start syncing from mysql-bin.15,
> position 24386 (as saved in the cib).
> 
> this fails with error "Last_IO_Errno: 1236" and the message:
>> Last_IO_Error: Got fatal error 1236 from master when reading data
>> from binary log: 'Could not find first log file name in binary log
>> index file'

one additional note:

the ra does not detect this as a failure either. pacemaker reports
that both instances are up and running.

there is a "WARNING: MySQL Slave IO threads currently not running."
in the logs, but the ra checks for "Last_Errno:" only.

thus, it does not catch *any* other error, e.g.
> # mysql -h wdb01c -e "show slave status\G"|grep -i err
>                    Last_Errno: 0
>                    Last_Error:
>                 Last_IO_Errno: 1236
>                 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
>                Last_SQL_Errno: 0
>                Last_SQL_Error:
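A monitor that looks at more than "Last_Errno:" would have flagged this state. The Python sketch below is not the mysql ra's own code, and its function names are hypothetical. It parses the vertical output of "show slave status\G" and reports a problem in either of two cases:
- Slave_IO_Running or Slave_SQL_Running is not "Yes";
- any of Last_Errno, Last_IO_Errno or Last_SQL_Errno is non-zero.

```python
def parse_slave_status(text):
    """Turn vertical (one field per line) SHOW SLAVE STATUS output into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only: error messages may contain more colons.
        # The "*** 1. row ***" separator has no colon and is skipped.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key:
            status[key] = value.strip()
    return status


def replication_problems(status):
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for field in ("Slave_IO_Running", "Slave_SQL_Running"):
        if field in status and status[field] != "Yes":
            problems.append("%s=%s" % (field, status[field]))
    for errno_field in ("Last_Errno", "Last_IO_Errno", "Last_SQL_Errno"):
        errno = status.get(errno_field, "0")
        if errno != "0":
            # Last_IO_Errno pairs with Last_IO_Error, and so on.
            message = status.get(errno_field.replace("Errno", "Error"), "")
            problems.append("%s=%s: %s" % (errno_field, errno, message))
    return problems
```

Fed the grep output above, replication_problems returns a single entry for Last_IO_Errno=1236. A monitor action could then map that entry to a failure instead of reporting the slave as running.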

thanks,
raoul


[Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]
hi,

i'm starting a new thread to address a specific
"Could not find first log file name in binary log index file" error
upon failover.


background:
i currently have a two-node mysql m/s setup.
expire_logs_days (was) set to 7 days.
the last failover happened more than 7 days ago (therefore, binlogs have been purged).


today i tested another failover.
config before failover:
> node wdb01 \
>         attributes service="wdb" \
>         attributes standby="off" wdb02-log-file-wdb-mysql="mysql-bin.15" wdb02-log-pos-wdb-mysql="24386"
> node wdb02 \
>         attributes service="wdb" \
>         attributes standby="off"


as stated, some binlogs, including mysql-bin.15, have already been
purged from wdb02. current binlogs:

> -rw-rw 1 mysql adm  149 Apr 11 06:25 mysql-bin.31
> -rw-rw 1 mysql adm  125 Apr 11 10:44 mysql-bin.32
> -rw-rw 1 mysql adm  125 Apr 11 22:50 mysql-bin.33
> -rw-rw 1 mysql adm  149 Apr 12 06:25 mysql-bin.34
> -rw-rw 1 mysql adm  149 Apr 13 06:25 mysql-bin.35
> -rw-rw 1 mysql adm  149 Apr 14 06:25 mysql-bin.36
> -rw-rw 1 mysql adm  125 Apr 14 17:01 mysql-bin.37
> -rw-rw 1 mysql adm  149 Apr 15 06:25 mysql-bin.38
> -rw-rw 1 mysql adm  149 Apr 16 06:25 mysql-bin.39
> -rw-rw 1 mysql adm  149 Apr 17 06:25 mysql-bin.40
> -rw-rw 1 mysql adm  149 Apr 18 06:25 mysql-bin.41
> -rw-rw 1 mysql adm  125 Apr 18 15:45 mysql-bin.42
> -rw-rw 1 mysql adm  149 Apr 19 06:25 mysql-bin.43
> -rw-rw 1 mysql adm  125 Apr 19 09:03 mysql-bin.44
> -rw-rw 1 mysql adm  5366995 Apr 19 09:46 mysql-bin.45
> -rw-rw 1 mysql adm  540 Apr 19 09:04 mysql-bin.index


the failover worked and wdb02 is up and running.
upon rejoin, wdb01 wanted to start syncing from mysql-bin.15,
position 24386 (as saved in the cib).

this fails with error "Last_IO_Errno: 1236" and the message:
> Last_IO_Error: Got fatal error 1236 from master when reading data
> from binary log: 'Could not find first log file name in binary log
> index file'

given that the binlogs have been purged, this is somewhat expected.

i wonder, though, whether there is a way to troubleshoot this issue
automagically, since (as you can see above) all binlogs up to
mysql-bin.45 are empty.


what i can currently think of:

1. run a cronjob which periodically analyzes the binlogs and will update
the node's log-file and log-pos attributes if there are empty binlogs;
(that's the best method, i think)

2. restore the purged binlogs from a backup, hack the mysql-bin.index
file to re-include them, and hope that they will not be purged upon
replication restart?
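Option 1 could look roughly like the minimal Python sketch below. It rests on two assumptions:
- the cronjob can list the master's binlogs in index order with their sizes, for example from the directory listing above;
- a binlog of at most a few hundred bytes holds no real events. This matches the 125 and 149 byte files shown, but it is only a heuristic.

While the recorded binlog still exists, the sketch moves the stored start point forward past empty binlogs. Writing the result into the node's log-file and log-pos attributes is left out. The function name and thresholds are hypothetical.

```python
def advance_start_point(binlogs, log_file, log_pos, empty_max=200, first_event_pos=4):
    """Suggest a new (log-file, log-pos) pair to store for the slave.

    binlogs: the master's binlogs in index order, as (name, size_in_bytes) pairs.
    Returns None when log_file has already been purged: by then it is too late
    to tell whether the purged binlogs contained data.
    """
    names = [name for name, _size in binlogs]
    if log_file not in names:
        return None
    idx = names.index(log_file)
    unread = binlogs[idx][1] - log_pos
    if unread > empty_max or idx == len(binlogs) - 1:
        # Data may follow the recorded position, or this is already the newest binlog.
        return (log_file, log_pos)
    for name, size in binlogs[idx + 1:]:
        if size > empty_max:
            # First later binlog that (by the size heuristic) holds real events.
            return (name, first_event_pos)
    # Every later binlog looks empty: start at the beginning of the newest one.
    return (binlogs[-1][0], first_event_pos)
```

Two examples with the listing above:
- A stored position of mysql-bin.31:4 moves to mysql-bin.45:4, the first binlog larger than the threshold.
- For the stored mysql-bin.15:24386, the sketch returns None, because that binlog is already gone. This is why the job would have to run more often than expire_logs_days purges.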


any input on this?

thanks,
raoul


Re: [Pacemaker] Heartbeat over Disk or non IP possible?

2011-04-19 Thread Ulf
Hi,

So it seems that a heartbeat over disk is not possible.
Are there any plans to introduce such a feature?

Cheers,
Ulf

>On Sat, Apr 16, 2011 at 12:23 PM, Ulf  wrote:
>
>Hi,
>
>is there a way to implement a heartbeat over disk? Or over any other non-IP
>medium?
>
>I think the SFEX agent does something like this, but I guess it will not
>work well if you plan to use MD (software RAID) or OCFS2/GFS2.
>
>Cheers,
>Ulf


[Pacemaker] Question of the syslog output in pacemaker-1.1

2011-04-19 Thread Yuusuke IIDA
Hi, Andrew

I use corosync-1.3.0 and Pacemaker-1.1.5.

Logs are output via rsyslog.

I changed syslog_facility in corosync.conf to local1 in order to have the
cluster log written to a dedicated file.

However, the setting was not applied to the processes forked by pacemakerd.

The facility of those processes remained daemon.

The changed setting took effect only for corosync and pacemakerd themselves.

Why is the syslog_facility setting in corosync.conf ineffective for the
processes forked by pacemakerd?

Please tell me how to change the facility of the processes forked by
pacemakerd.
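One possible explanation, not confirmed in this thread: a process chooses its syslog facility when it calls openlog(). A program started with fork() and exec() does not keep its parent's logging state. Unless pacemakerd passes the configured value on, each child therefore opens its log with its own default (here, apparently daemon).

The Python sketch below illustrates the general pattern of a parent handing the facility to a child through the environment. The variable name CLUSTER_SYSLOG_FACILITY and the helper names are hypothetical; this is not Pacemaker 1.1.5's actual mechanism.

```python
import os
import syslog


def facility_from_name(name):
    """Map a facility name such as 'local1' or 'daemon' to its syslog constant."""
    # Raises AttributeError for names the syslog module does not define.
    return getattr(syslog, "LOG_" + name.upper())


def open_child_log(ident, environ=None, env_var="CLUSTER_SYSLOG_FACILITY", default="daemon"):
    """Open syslog in a freshly exec'd child, using the facility its parent passed down.

    Hypothetical convention: the parent exports the facility name in env_var
    before fork()/exec(); without it, the child falls back to `default`.
    """
    environ = os.environ if environ is None else environ
    name = environ.get(env_var, default)
    syslog.openlog(ident, syslog.LOG_PID, facility_from_name(name))
    return name
```

The design point is simply that the setting has to travel with the exec. Whatever the parent read from its own config file is lost otherwise.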

Best Regards,
Yuusuke IIDA
-- 

METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iiday...@intellilink.co.jp




Re: [Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread Andrew Beekhof
What version of pacemaker?

On Tue, Apr 19, 2011 at 9:10 AM, rakesh k  wrote:
> Hi All
>
> I have configured Heartbeat and Pacemaker on my two VMs.
>
> The cluster has two nodes, both running CentOS as the operating system.
>
> The cluster is configured with 8 resources and a defined order, using
> Pacemaker Explained as a reference.
>
> Please find the order and colocation constraints below, taken from cib.xml:
>
>  with-rsc="Httpd"/>
>  with-rsc="Tomcat1"/>
>  with-rsc="MS_Mysql" with-rsc-role="Master"/>
>  then="Httpd"/>
>  then="MS_Mysql"/>
>  then="Tomcat1"/>
>  then="Tomcat1VIP"/>
>  with-rsc="Tomcat2VIP"/>
>  then="Tomcat2"/>
>  then="Httpd"/>
>  with-rsc="Tomcat2"/>
>
> So when Heartbeat starts on both nodes, the resources start in this order:
> MysqlVIP-->MSMysql-->tomcat1VIP-->Tomcat-->Tomcat2VIP-->tomcat2-->HttpdVIP-->Httpd
>
> My question: since there is an order constraint, I expect all the resources
> to stop when I stop the MySQL process. However, crm_mon still shows the
> tomcat2, tomcat2vip, http and httpdVIP resources running in the cluster.
> Can you please tell me whether there is a flaw in my order or colocation
> constraints?
>
>
>
> Regards
> Rakesh



[Pacemaker] Ordering set of resources, problem in ordering chain of resources

2011-04-19 Thread rakesh k
Hi All

I have configured Heartbeat and Pacemaker on my two VMs.

The cluster has two nodes, both running CentOS as the operating system.

The cluster is configured with 8 resources and a defined order, using
Pacemaker Explained as a reference.

Please find the order and colocation constraints below, taken from cib.xml:













So when Heartbeat starts on both nodes, the resources start in this order:
MysqlVIP-->MSMysql-->tomcat1VIP-->Tomcat-->Tomcat2VIP-->tomcat2-->HttpdVIP-->Httpd

My question: since there is an order constraint, I expect all the resources
to stop when I stop the MySQL process. However, crm_mon still shows the
tomcat2, tomcat2vip, http and httpdVIP resources running in the cluster.
Can you please tell me whether there is a flaw in my order or colocation
constraints?
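One way to look for the kind of flaw asked about here is to treat the constraints as a dependency graph. Walk it from the resource being stopped: anything that is not reachable is not required to stop.

A minimal Python sketch, under these assumptions:
- every constraint is mandatory (score INFINITY);
- a mandatory order (first then) and a mandatory colocation (rsc with with-rsc) both make the dependent stop when its prerequisite stops;
- roles, such as the Master role of MS_Mysql, are ignored.

The function name and the edge lists in the example are hypothetical. They are not the constraints from this cib.xml.

```python
from collections import defaultdict, deque


def resources_forced_down(stopped, orders, colocations):
    """Return the set of resources that must stop when `stopped` stops.

    orders:      (first, then) pairs, assumed mandatory.
    colocations: (rsc, with_rsc) pairs, assumed mandatory.
    """
    dependents = defaultdict(set)
    for first, then in orders:
        dependents[first].add(then)      # `then` may only run after `first`
    for rsc, with_rsc in colocations:
        dependents[with_rsc].add(rsc)    # `rsc` may only run where `with_rsc` runs

    # Breadth-first walk from the stopped resource over the dependency edges.
    seen = {stopped}
    queue = deque([stopped])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(stopped)
    return seen
```

For example, take these hypothetical orders:
- MysqlVIP then MS_Mysql
- MS_Mysql then Tomcat1
- Tomcat2VIP then Tomcat2
- Tomcat2 then Httpd

Add a colocation of Tomcat1 with MS_Mysql. The walk from MS_Mysql reaches only Tomcat1, so nothing forces Tomcat2, Tomcat2VIP or Httpd down. An order such as Tomcat1 then Tomcat2VIP would link the second chain to the first.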



Regards
Rakesh