from:"Raoul Bhatia \[IPAX\]"

Re: [Pacemaker] Does "stonith_admin --confirm" work?

2013-05-22 Thread Raoul Bhatia [IPAX]


Hello Andrew!

On 2013-05-20 06:43, Andrew Beekhof wrote:
[...]

Well, that's not nothing, but it certainly doesn't look right either.
I will investigate.  Which version is this?


I'm running Debian GNU/Linux 6.0 Squeeze 64bit latest patch level with
the current backports packages:

pacemaker 1.1.7-1~bpo60+1 aka 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff

corosync 1.4.2-1~bpo60+1

So in theory, Wheezy is affected too.

Cheers,
Raoul
--

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: [Pacemaker] Does "stonith_admin --confirm" work?

2013-05-19 Thread Raoul Bhatia [IPAX]


On 17.05.2013 10:22, Староверов Никита Александрович wrote:

Nothing happened after stonith_admin -C.
fenced is still trying fence_pcmk, and I see lots of "Timer expired" from
stonith-ng, and failed fence_ipmilan operations.

Yes, I can do fence_ack_manual on the cman master node and then clean up the
node state with cibadmin, but that is a very slow way.
If I lose many servers in the cluster, for example when one rack with two or
more servers loses power, I need a way to get services running again on the
remaining nodes as quickly as possible.

I think manual fencing acknowledgement must be fast and simple, and I suppose
that stonith_admin --confirm is meant to do that.


I would also like to know a solution to this problem.
My current situation: I am using IPMI as a stonith device.

However, if there is a problem with the (redundant) power supply
and the IPMI device is therefore not working, I have a hard time
troubleshooting my 2 node cluster.

Cheers,
Raoul


Re: [Pacemaker] slapd RA does not start OpenLDAP server after reboot

2013-01-23 Thread Raoul Bhatia [IPAX]


On 2013-01-23 14:11, Dejan Muhamedagic wrote:

I fixed slapd startup problem by changing resource agent script:
--- slapd.orig  2013-01-22 17:23:42.266314000 +0400
+++ slapd   2013-01-22 17:23:12.094422000 +0400
@@ -299,6 +299,7 @@ slapd_start()
   local reason
   local result
   local state
+  local pid_dir

   slapd_status `slapd_pid`; state=$?

@@ -324,6 +325,12 @@ slapd_start()
 options="$options $parameters"
   fi

+  pid_dir="/var/run/slapd"
+  if [ ! -d $pid_dir ]; then
+ mkdir -p $pid_dir
+ chown ldap:ldap $pid_dir
+  fi
+
   if [ -n "$services" ]; then
 $slapd -h "$services" $options 2>&1; result=$?
   else


Similar code is already executed in slapd_validate_all. [...]


woops, i missed that ;)

Cheers,
Raoul

Re: [Pacemaker] slapd RA does not start OpenLDAP server after reboot

2013-01-23 Thread Raoul Bhatia [IPAX]


On 2013-01-23 13:02, Igor Zinovik wrote:

Hello.

I have a pacemaker cluster with 2 nodes (opensuse 12.1 and 12.2).  On
both nodes slapd does not get started by pacemaker, but it can be
started by launching `systemctl start ldap.service'.

The problem is that something unlinks the `/var/run/slapd' directory after
reboot or shutdown; that is why slapd cannot save its pid file and
cannot start.

[...]


The reason might be that /var/run/ is a tmpfs mount [1].
(please verify using "mount")


I fixed slapd startup problem by changing resource agent script:
--- slapd.orig  2013-01-22 17:23:42.266314000 +0400
+++ slapd   2013-01-22 17:23:12.094422000 +0400
@@ -299,6 +299,7 @@ slapd_start()
   local reason
   local result
   local state
+  local pid_dir

   slapd_status `slapd_pid`; state=$?

@@ -324,6 +325,12 @@ slapd_start()
 options="$options $parameters"
   fi

+  pid_dir="/var/run/slapd"
+  if [ ! -d $pid_dir ]; then
+ mkdir -p $pid_dir
+ chown ldap:ldap $pid_dir
+  fi
+
[...]


Thanks for the input. Unfortunately, this will only work for *your*
setup. The pidfile could reside in another directory, for example.

Also, other distributions (e.g. Debian Squeeze) use the "openldap"
user and group.


It would be more sane to handle it with something like


local pid_dir
# derive the directory from the configured pidfile instead of hardcoding it
pid_dir=`dirname "$OCF_RESKEY_pidfile"`
if [ ! -d "$pid_dir" ] ; then
    ocf_log info "Creating PID dir: $pid_dir"
    mkdir -p "$pid_dir"
    chown "$OCF_RESKEY_user:$OCF_RESKEY_group" "$pid_dir"
fi


(untested)
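
For trying the idea outside of the resource agent, here is a minimal standalone sketch. The helper name ensure_pid_dir and its optional owner argument are my assumptions and are not part of the slapd RA; ocf_log is left out so that it runs in a plain shell.

```shell
# Sketch: make sure the directory that holds a PID file exists.
# ensure_pid_dir is a hypothetical helper, not part of the slapd RA.
# Usage: ensure_pid_dir PIDFILE [USER:GROUP]
ensure_pid_dir() {
    pidfile=$1
    owner=${2:-}                 # empty means: leave ownership alone
    pid_dir=$(dirname "$pidfile")

    if [ ! -d "$pid_dir" ]; then
        mkdir -p "$pid_dir" || return 1
        if [ -n "$owner" ]; then
            chown "$owner" "$pid_dir" || return 1
        fi
    fi
    return 0
}
```

Inside the RA this would be called with something like ensure_pid_dir "$OCF_RESKEY_pidfile" "$OCF_RESKEY_user:$OCF_RESKEY_group", so neither the path nor the account name is hardcoded.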

Cheers,
Raoul

[1] https://features.opensuse.org/303793

Re: [Pacemaker] Fencing of KVM virtual machines

2012-10-02 Thread Raoul Bhatia [IPAX]

On 2012-10-02 07:56, Masopust, Christian wrote:

Hi all,
I'm running several pacemaker clusters in KVM virtual machines
(everything based on Debian 6) and now it's time to configure fencing...
I've found that I have to use "fence-virt" for that task
(http://www.clusterlabs.org/wiki/Guest_Fencing)
but it seems that it will only work if my VMs are on a single host
system.
Is anybody using fence-virt when the VMs are on different hosts?
And can anybody explain how to compile and install fence-virt on a
Debian 6 system?
Thanks a lot!

We use:

> primitive stonith-virtNodeY stonith:external/libvirt \
>   meta failure-timeout="1min" \
>   op monitor interval="60" \
>   params hostlist="virtNodeY" hypervisor_uri="qemu://xxx.ipax.in/system"

You might need some ssh keys installed to allow passwordless
login, but i don't know for certain anymore.
Maybe, it directly talks to the hypervisor via a dedicated port.

Cheers,
Raoul

Re: [Pacemaker] migration only between nodes with identical hardware

2012-10-01 Thread Raoul Bhatia [IPAX]


On 2012-09-28 16:24, James Harper wrote:

I have two nodes running identical hardware which run Xen VM's, and want to add 
a third node to the cluster which can access the same clvm and iscsi resources, 
but it will not be identical hardware. The non-identical hardware means that to 
move a VM to this third node it must be stopped then started; a migration 
will not work.

This situation may not really come up as in most cases I'll use location 
resources to restrict VM's to only the first two nodes (third node will mostly 
be for a different purpose), but just in case I want to do that for one or two 
VMs, is it possible to come up with some sort of rule like:

A->B = migration allowed
B->A = migration allowed
A->C = no migration allowed
B->C = no migration allowed


create an asymmetrical cluster.
add a node property, e.g.

> node xxx attributes service="web"

create corresponding location rules, e.g.


location loc-web-fs-www web-fs-www \
rule $id="loc-web-webfs-www-rule" 100: service eq web
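
To see the three pieces together (opt-in cluster, node attribute, location rule), here is a small sketch that only prints crm shell configuration text for review. The function name emit_opt_in_config and the node names nodeA/nodeB are hypothetical; it does not talk to a cluster, and it only restricts where the resource may run.

```shell
# Sketch: print crm configuration for an opt-in (asymmetric) cluster.
# emit_opt_in_config is a hypothetical helper; nothing is applied to a cluster.
# Usage: emit_opt_in_config RESOURCE ATTRIBUTE VALUE NODE...
emit_opt_in_config() {
    rsc=$1; attr=$2; value=$3
    shift 3

    # resources run nowhere unless a location constraint allows it
    printf 'property symmetric-cluster="false"\n'

    # tag every node that is allowed to host the resource
    for node in "$@"; do
        printf 'node %s attributes %s="%s"\n' "$node" "$attr" "$value"
    done

    # positive score only on nodes carrying the attribute
    printf 'location loc-%s %s \\\n' "$rsc" "$rsc"
    printf '        rule 100: %s eq %s\n' "$attr" "$value"
}

emit_opt_in_config web-fs-www service web nodeA nodeB
```

A node without the attribute (the third node in the question) gets no positive score, so with symmetric-cluster="false" the resource should not be placed there.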


Cheers,
Raoul

[Pacemaker] Current Release Schedule (was: Re: ordering such that at least one resource started)

2012-09-15 Thread Raoul Bhatia [IPAX]

On 2012-09-11 09:57, Andrew Beekhof wrote:

The feature will be released for the first time in 1.1.8
The shell might not support it yet, you might need to use the xml until then

Andrew, when will the release of 1.1.8 approximately be taking place?

http://www.clusterlabs.org/wiki/ReleaseCalendar states:
> 1.1.8  2012, Jul   
> 1.1.9  2012, Oct

(I've seen that you've already updated the changelog on Sept. 6th)

I'm not actually affected by the current bugfixes, at least i think so
;), but it would be nice to know the current schedule.

Of course, a "thank you" for the work you and the others do is long
overdue from my side so: Thank you to everyone who has helped
in making such great software!

Cheers,
Raoul

Re: [Pacemaker] Percona MySQL RA on MySQL 5.5 problem

2012-09-15 Thread Raoul Bhatia [IPAX]


On 2012-09-14 20:20, Nathan Bird wrote:

I tried searching a bit and it seems like this one hasn't been reported
yet, my apologies if it has.

The RA currently issues a "RESET SLAVE" command, but the meaning of this
changed in 5.5. https://dev.mysql.com/doc/refman/5.5/en/reset-slave.html

It now needs to do a "RESET SLAVE ALL".


There has been a patch from bmildren [1] and a corresponding discussion.
I, however, am not using MySQL 5.5 and did not find a chance to test
it properly.

Feel free to submit a patch or pull request for this issue
and/or comment on the commit [1] so that we can include it in
the next resource agents release.

Thanks,
Raoul

[1] https://github.com/bmildren/resource-agents/commit/72031e5a6a644ce6d9a1ed2ec31d4dfb9bae294b



Re: [Pacemaker] VirtualDomain shutdown problem

2012-09-12 Thread Raoul Bhatia [IPAX]

On 2012-09-12 16:38, Luca Meron wrote:

What do you mean by "a VirtualDomain KVM machine which doesn't
support shutdown" ?

the VM doesn't have acpid.

What version of the resource/cluster agents are you using?

3.9.2

Thanks,

ok, i think i cannot help you then.
But you could try to manually set rsyslog to shut down *after*
corosync/pacemaker and capture the logs, and/or enable logging
into separate files in /etc/corosync/corosync.conf,

something like:
> to_stderr: yes
> to_logfile: yes
> to_syslog: yes

for the exact configuration, you might need to read the manual.
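
As a rough sketch of how the three directives sit inside the logging section, the helper below prints a candidate fragment to stdout. The function name print_corosync_logging, the default log path and the added logfile/timestamp lines are my assumptions; please check them against corosync.conf(5) for your version before merging anything into the real file.

```shell
# Sketch: print a candidate "logging" section for corosync.conf.
# print_corosync_logging and the default path are assumptions, not a
# tested configuration. Usage: print_corosync_logging [LOGFILE]
print_corosync_logging() {
    logfile=${1:-/var/log/corosync/corosync.log}
    cat <<EOF
logging {
        to_stderr: yes
        to_logfile: yes
        logfile: $logfile
        to_syslog: yes
        timestamp: on
}
EOF
}

print_corosync_logging
```

Writing to a dedicated logfile in addition to syslog means the last messages survive even if rsyslog is stopped before corosync/pacemaker.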

Cheers,
Raoul

Re: [Pacemaker] VirtualDomain shutdown problem

2012-09-12 Thread Raoul Bhatia [IPAX]


On 2012-09-11 12:52, Luca Meron wrote:

I've an installation with a VirtualDomain KVM machine which doesn't support 
shutdown, so it has to be destroyed.
 From what I see VirtualDomain should first issue a shutdown and after the 
timeout destroy the vm, but I'm using Pacemaker 1.1.6 and I found this thread
http://www.gossamer-threads.com/lists/linuxha/pacemaker/78969
which says there's a bug in this version.


What do you mean by "a VirtualDomain KVM machine which doesn't support
shutdown" ?

What version of the resource/cluster agents are you using?

Cheers,
Raoul

Re: [Pacemaker] node offline after fencing (pacemakerd hangs)

2012-07-19 Thread Raoul Bhatia [IPAX]

On 2012-07-19 16:05, Jake Smith wrote:
> Another solution is something like (will vary a little in RHEL I believe):
> 
> Disable corosync autostart
> $sudo update-rc.d -f corosync disable S 
> 
> add 'post-up /etc/init.d/corosync start' to bonding (or in your case bridged) 
> interface in 
> /etc/network/interfaces.

Just for the record:
If you're using pacemakerd with a separate init file,
you will have to consider this one too.

Cheers,
Raoul

Re: [Pacemaker] node offline after fencing (pacemakerd hangs)

2012-07-18 Thread Raoul Bhatia [IPAX]

On 2012-07-18 15:57, Ulrich Leodolter wrote:
> hi,
> 
> after adding a second ring to corosync.conf
> the problem seems to be gone.
> 
> after killing corosync the node is fenced by
> the other node.  after reboot the cluster is
> fully operational.
> 
> is this essential to have at least 2 rings?
> 
> maybe there is a network timing problem (but can't see
> error messages)
> the interface on ring 0 (192.168.20.171) is a bridge.
> the interface on ring 1 (10.10.10.171) is normal ethernet interface.

I've seen such things with bonding devices under debian 6.0

try something like:
> auto bond0
> iface bond0 inet static
> ...
> bond-mode active-backup
> bond-miimon 100
> bridge_fd 0
> bridge_maxwait 0

Another workaround is a "sleep 10" or similar at the beginning
of the pacemaker script to let bond0 come up.

We always go with 2 rings, even when using NIC bonding.
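
Instead of a fixed "sleep 10", the wait could poll the interface state. This is only a sketch: the function name wait_for_iface is hypothetical, and it assumes the kernel exposes the link state in /sys/class/net/<iface>/operstate (the third argument exists only so the sysfs directory can be swapped for testing).

```shell
# Sketch: wait until a network interface reports operstate "up".
# wait_for_iface is a hypothetical helper, not part of any init script.
# Usage: wait_for_iface IFACE [TIMEOUT_SECONDS] [SYSFS_NET_DIR]
wait_for_iface() {
    iface=$1
    timeout=${2:-30}
    sysfs=${3:-/sys/class/net}
    waited=0

    while [ "$waited" -lt "$timeout" ]; do
        # a missing file simply counts as "not up yet"
        state=$(cat "$sysfs/$iface/operstate" 2>/dev/null) || state=unknown
        if [ "$state" = "up" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1        # gave up after TIMEOUT_SECONDS
}
```

At the beginning of the init script this would look like: wait_for_iface bond0 30 || echo "bond0 still down, starting anyway". It returns as soon as the bond is up instead of always waiting the full time.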

Cheers,
Raoul

Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-20 Thread Raoul Bhatia [IPAX]


On 2012-05-20 12:05, Christoph Bartoschek wrote:

Hi,

we have a two node setup with drbd below LVM and an Ext4 filesystem that is
shared via NFS. The system shows low performance and lots of timeouts
resulting in unnecessary failovers from pacemaker.

The connection between both nodes is capable of 1 GByte/s as shown by iperf.
The network between the clients and the nodes is capable of 110 MByte/s. The
RAID can be filled with 450 MByte/s.

Thus I would expect to have a write performance of about 100 MByte/s. But dd
gives me only 20 MByte/s.

dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s


to give you some numbers to compare:

I've got a small XFS file system, which i'm currently testing with.
Using a single thread and NFS4 only:

my configuration:
nfsserver:
# exportfs -v
/data/export 
192.168.100.0/24(rw,wdelay,no_root_squash,no_subtree_check,fsid=1000)



nfsclient mount
192.168.100.200:/data/export on /mnt type nfs 
(rw,nosuid,nodev,nodiratime,relatime,vers=4,addr=192.168.100.200,clientaddr=192.168.100.107)


via network (1gbit connection for both drbd sync and nfs)
  # dd if=/dev/zero of=bigfile.10G bs=6192  count=1310720
  1310720+0 records in
  1310720+0 records out
  8115978240 bytes (8.1 GB) copied, 140.279 s, 57.9 MB/s

on the same machine so that 1gbit is for drbd only:
  # dd if=/dev/zero of=bigfile.10G bs=6192  count=1310720
  1310720+0 records in
  1310720+0 records out
  8115978240 bytes (8.1 GB) copied, 70.9297 s, 114 MB/s

Maybe these numbers and this configuration help?
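
For comparing the runs, the MB/s figures can be recomputed from the byte counts and durations that dd printed above. A small sketch (the helper name throughput_mb_s is mine; MB means 10^6 bytes, as dd reports it):

```shell
# Sketch: recompute dd throughput in MB/s (1 MB = 1000000 bytes).
# throughput_mb_s is a hypothetical helper name.
# Usage: throughput_mb_s BYTES SECONDS
throughput_mb_s() {
    LC_ALL=C awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

throughput_mb_s 10737418240 498.26    # ext4 over NFS (Christoph): 21.5
throughput_mb_s 8115978240 140.279    # XFS, NFS and DRBD share 1gbit: 57.9
throughput_mb_s 8115978240 70.9297    # XFS, 1gbit for DRBD only: 114.4
```

Note that the XFS runs used bs=6192, so they wrote about 8.1 GB rather than the 10.7 GB of the original test; the rates are still comparable.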

Cheers,
Raoul


While the slow dd runs there are timeouts on the server resulting in a
restart of some resources. In the logfile I also see:

[329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
[329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[329014.593273] nfsdD 0007 0  2252  2
0x

...

Has anyone an idea what could cause such problems? I have no idea for
further analysis.


i haven't seen such an issue during my current tests.


Is ext4 unsuitable for such a setup? Or is the linux nfs3 implementation
broken? Are buffers too large such that one has to wait too long for a
flush?


Maybe I'll have the time to switch from xfs to ext4 and retest
during the next couple of days. But I cannot guarantee anything.

Maybe you could try switching to XFS instead?

Cheers,
Raoul

Re: [Pacemaker] RES: [Openais] Help on mysql-proxy resource

2012-05-15 Thread Raoul Bhatia [IPAX]


On 2012-05-10 13:07, Raoul Bhatia [IPAX] wrote:

On 2012-05-02 00:30, Carlos xavier wrote:

most probably, this (the plural s) is the configuration error.
if you're experiencing further issues, please give me a shout
(off-list).


Unfortunately that wasn't the solution. I fixed the typo and the trouble
remains, the parameter isn't accepted by the proxy daemon.


Please provide me with:
* the exact version of the resource agent (maybe even md5 hash?),
* mysql-proxy --version
* mysql-proxy --help-all
* the errors of the mysql-proxy resource from the syslog file.


Please test using the latest ra from
https://github.com/raoulbhatia/resource-agents/blob/e373012291f659660cff77e44d00daf1930c0465/heartbeat/mysql-proxy

and please let me know if it works or not.

if not, please provide the mentioned information / logs.

thanks,
raoul

Re: [Pacemaker] RES: [Openais] Help on mysql-proxy resource

2012-05-10 Thread Raoul Bhatia [IPAX]


On 2012-05-02 00:30, Carlos xavier wrote:

most probably, this (the plural s) is the configuration error.
if you're experiencing further issues, please give me a shout (off-list).


Unfortunately that wasn't the solution. I fixed the typo and the trouble
remains, the parameter isn't accepted by the proxy daemon.


Please provide me with:
* the exact version of the resource agent (maybe even md5 hash?),
* mysql-proxy --version
* mysql-proxy --help-all
* the errors of the mysql-proxy resource from the syslog file.

Cheers,
Raoul

Re: [Pacemaker] [Openais] Help on mysql-proxy resource

2012-04-09 Thread Raoul Bhatia [IPAX]


On 2012-03-30 09:43, Tim Serong wrote:

On 03/30/2012 12:52 PM, Carlos xavier wrote:

Hi.

I have mysql-proxy running on my system and I want to add it to the
cluster configuration.
When it is started by the system I got this as result of ps auwwwx:

root 29644  0.0  0.0  22844   844 ?S22:37   0:00
/usr/sbin/mysql-proxy --pid-file /var/run/mysql-proxy.pid --daemon
--proxy-lua-script


Note this is --proxy-lua-script (singular)


/usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua
--proxy-backend-addresses=10.10.10.5:3306 --proxy-address=172.31.0.192:3306

So I created the following configuration at the CRM:

primitive mysql-proxy ocf:heartbeat:mysql-proxy \
 params binary="/usr/sbin/mysql-proxy"
pidfile="/var/run/mysql-proxy.pid" proxy_backend_addresses="10.10.10.5:3306"
proxy_address="172.31.0.191:3306" parameters="--proxy-lua-scripts
/usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua" \


This is --proxy-lua-scripts (plural).  I'm guessing maybe that's the
problem.


most probably, this (the plural s) is the configuration error.
if you're experiencing further issues, please give me a shout
(off-list).
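
A quick way to catch this kind of typo is to check the option name against the daemon's own help text before putting it into "parameters". The sketch below assumes that "mysql-proxy --help-all" lists each long option literally; the helper name help_lists_option is hypothetical.

```shell
# Sketch: check whether a long option appears as a whole word in help text.
# help_lists_option is a hypothetical helper name.
# Usage: help_lists_option HELP_TEXT OPTION
help_lists_option() {
    # -F: fixed string, -w: whole word, so "--proxy-lua-script" does not
    # match a line that only contains "--proxy-lua-scripts" (and vice versa)
    printf '%s\n' "$1" | grep -qwF -e "$2"
}
```

Usage would be along the lines of: help_lists_option "$(mysql-proxy --help-all)" --proxy-lua-scripts || echo "option not known to this mysql-proxy".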

cheers,
raoul

Re: [Pacemaker] Modify resource agent.

2011-11-18 Thread Raoul Bhatia [IPAX]


On 2011-11-18 01:23, Mark Gardner wrote:

I'm using a commercial cluster aware filesystem appliance called
Panasas.  The OCF:Heartbeat:Filesystem resource agent politely declined
to mount the filesystem claiming that it was not a cluster aware
filesystem.

For the time being I simply made a copy of the Filesystem resource agent
and added the file system type (panfs) alongside gfs, nfs, ocfs2.

Would it be appropriate to submit a patch/enhancement request to have
this added to the resource agent?  How would I go about submitting it?

I'd like to keep getting any future updates to the Resource Agent and
have the panfs filesystem available for use.


i haven't heard about panfs but i think that the patch should be rather
trivial, right?

so feel free to post it to the list.

cheers,
raoul


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Re: [Pacemaker] Syntax highlighting in vim for crm configure edit

2011-11-15 Thread Raoul Bhatia [IPAX]


hi!

On 2011-08-19 16:28, Dan Frincu wrote:

Hi,

On Thu, Aug 18, 2011 at 5:53 PM, Digimer wrote:

On 08/18/2011 10:39 AM, Trevor Hemsley wrote:

Hi all

I have attached a first stab at a vim syntax highlighting file for 'crm
configure edit'

To activate this, I have added 'filetype plugin on' to my /root/.vimrc
then created /root/.vim/{ftdetect,ftplugin}/pcmk.vim

In /root/.vim/ftdetect/pcmk.vim I have the following content

au BufNewFile,BufRead /tmp/tmp* set filetype=pcmk

but there may be a better way to make this happen. /root/.vim/pcmk.vim
is the attached file.

Comments (not too nasty please!) welcome.


I've added a couple of extra keywords to the file, to cover a couple
more use cases. Other than that, great job.


will this addition make it into some package(s)?
would it be right to ship this vim syntax file with crm?

thanks,
raoul

Re: [Pacemaker] Master/Master Setup

2011-11-14 Thread Raoul Bhatia [IPAX]


On 2011-10-22 20:33, Michael Marrotte wrote:

I have a 2-node MySQL master/slave to master/slave setup and am looking
to add a MySQL "health" check that can take a node out of the cluster if
its slave is unhealthy, e.g. out of sync.  The master/slave master/slave
config is working fine, hence writes in either node are replicated to
the other node.  To add Pacemaker into the mix, I configure ha.cf
appropriately, shut down MySQL and configure Pacemaker
resources as follows:


hi!

i did not find the time to reply to your questions.
did you manage to configure your mysql ha cluster?

cheers,
raoul

Re: [Pacemaker] Mysqlping resource for Mysql Master-Master replication

2011-11-14 Thread Raoul Bhatia [IPAX]

On 2011-11-14 16:40, Viacheslav Biriukov wrote:

cloned resource separately. For example, restart the cloned MySQL server on
the first node, but don't touch it on the second? It's not clear to me
from the documentation.

yes you can:
> crm resource stop clone:0 or
> crm resource cleanup clone:0

moreover, you could create constraints for that too:
> crm resource move clone:0

there is an issue though: by default, you don't know which clone
instance (read :0, :1, etc.) is running on which node.

however, it should be possible to add rules so that :0 will prefer
node01 and :1 will prefer node02

cheers,
raoul

Re: [Pacemaker] Replication MYSQL does not work in pacemaker

2011-11-14 Thread Raoul Bhatia [IPAX]


On 2011-11-07 21:42, Joe wrote:

Hi
I am trying to configure a simple two-node (Master/Slave) MYSQL/DRBD/PACEMAKER
setup with MySQL replication, but it does not work. Every time I start the
mysql (M/S) RA, it fails to start with the errors below. Please help if you
see that I am doing anything wrong (crm configure show attached). Thank you.

( Failed actions:
p_mysql:0_monitor_0 (node=mysqldrbd02, call=501, rc=5, status=complete):
not installed
p_mysql:1_monitor_0 (node=mysqldrbd01, call=251, rc=5, status=complete):
not installed

Reconnecting...[root@mysqldrbd01 etc]# tail -f /var/log/messages


use "grep mysql: /var/log/messages" (or similar) to search for the
error. the NOT_INSTALLED error will output an error message to syslog.
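
For reference, the rc=5 in the failed actions above is the OCF return code OCF_ERR_INSTALLED, which crm displays as "not installed". A small lookup sketch for the common codes (the function name ocf_rc_name is hypothetical):

```shell
# Sketch: translate a numeric OCF return code into its symbolic name.
# ocf_rc_name is a hypothetical helper name; codes 0-7 as defined by OCF.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

ocf_rc_name 5    # prints OCF_ERR_INSTALLED
```

OCF_ERR_INSTALLED usually means the agent could not find something it needs on that node (a binary, a config file, a directory), which is why the syslog message from the RA is the place to look.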

cheers,
raoul

Re: [Pacemaker] Pure-FTPd resource script is wrong (at least on ubuntu and probably debian)

2011-11-13 Thread Raoul Bhatia [IPAX]


hi racke!
hi nicola!

On 12.11.2011 14:07, Mailing List SVR wrote:

So the proposed changes for Debian package are:

* move directory creation into wrapper script
* make location of PID file configurable

Please comment if I got that right.


yes that would be another good solution,


imho, this is correct.

thanks,
raoul



Re: [Pacemaker] Pure-FTPd resource script is wrong (at least on ubuntu and probably debian)

2011-11-11 Thread Raoul Bhatia [IPAX]

hi!

as i'm quite familiar with the pure-ftpd script, i'm stepping into
this conversation.

On 2011-11-10 18:40, Mailing List SVR wrote:

1) I have this primitive:

primitive ftp_service ocf:heartbeat:Pure-FTPd \
params script="/usr/sbin/pure-ftpd-wrapper" daemon_type="postgresql"
pidfile="/var/run/pure-ftpd/pure-ftpd.pid" \
op monitor interval="40s" timeout="20s" \
op start interval="0" timeout="20s" \
op stop interval="0" timeout="20s"

looks good.

please note that I'm passing pidfile parameter to the RA script and that
on debian/ubuntu pidfile is hardcoded to
"/var/run/pure-ftpd/pure-ftpd.pid" so this is the only available choice

what do you mean by that? where is the pidfile hardcoded?

you mean in /usr/sbin/pure-ftpd-wrapper line 172?

well, then this is, imho, a problem of debian/ubuntu's wrapper
script not creating the correct directory.

> # force PID file to /var/run/pure-ftpd/pure-ftpd.pid
> push(@options, '-g', '/var/run/pure-ftpd/pure-ftpd.pid');

/var/run/pure-ftpd/ is created by the init script "on the fly".
this patch comes from http://bugs.debian.org/506077 and might as well
be replaced/complemented by additional checks in the, debian specific,
wrapper script.
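
a minimal sketch of such an additional check, written in shell for illustration only (the debian wrapper itself is a perl script, so a real patch would look different). the function name ensure_pid_dir is hypothetical:

```shell
# hypothetical helper: make sure the directory that will hold the pid file
# exists before the daemon is started. /var/run is often wiped on reboot,
# so this has to run on every start.
ensure_pid_dir() {
    pidfile="$1"
    piddir=$(dirname "$pidfile")
    # mkdir -p is a no-op if the directory is already there
    [ -d "$piddir" ] || mkdir -p "$piddir"
}
```

usage: "ensure_pid_dir /var/run/pure-ftpd/pure-ftpd.pid" right before pure-ftpd is executed.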

you had better send a corresponding bug report - or even better, a patch -
to Stefan Hornburg (Racke) and/or the debian/ubuntu
bugtracker.

there is "nothing" that the resource agent can be blamed for.

the directory /var/run/pure-ftpd is deleted every time I reboot the
server and I don't like to create it in rc.local

btw, is this a supported/common configuration? i've never used this
approach

cheers,
raoul

ps. stefan, this discussion is about debian bug 506077 and the hard
coded pidfile in /usr/sbin/pure-ftpd-wrapper and resulting implications
for the "resource agents" in a linux-ha environment.
feel free to directly contact me offlist if you need any more
information on this.

this discussion's thread can be found at
http://www.gossamer-threads.com/lists/linuxha/pacemaker/76107

Re: [Pacemaker] Master/Master Setup

2011-10-22 Thread Raoul Bhatia [IPAX]


hi!

On 22.10.2011 20:33, Michael Marrotte wrote:

primitive p_mysql ocf:heartbeat:mysql \
 params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \
 replication_user="slaveuser" replication_passwd="slavepw" \
 test_passwd="root" pid="/var/run/mysqld/mysqld.pid" \
 socket="/var/run/mysqld/mysqld.sock" \
 params additional_parameters="--skip-slave-start" \
 op start interval="0" timeout="120" \
 op stop interval="0" timeout="120" \
 op promote interval="0" timeout="120" \
 op demote interval="0" timeout="120" \
 op monitor interval="30" timeout="30" OCF_CHECK_LEVEL="1"
ms ms_mysql p_mysql \
 meta notify="true" master-max="2" clone-max="2" target-role="Started"
property $id="cib-bootstrap-options" \
 dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
 cluster-infrastructure="Heartbeat" \
 stonith-enabled="false" \
 no-quorum-policy="ignore" \
 last-lrm-refresh="1319301867"


master-max should be set to 1 if you're using a master-slave mysql
setup.

you can leave most replication related configuration out of your
mysql config files. except for the server-id i think.
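
a small sketch for spotting the master-max setting in an existing configuration, assuming the output of "crm configure show" is piped in. the function name master_max_values is made up for illustration:

```shell
# hypothetical helper: print every master-max value found on stdin.
# a value greater than 1 on a master/slave mysql resource is worth a second look.
master_max_values() {
    sed -n 's/.*master-max="\([0-9][0-9]*\)".*/\1/p'
}
```

usage: "crm configure show | master_max_values". for the configuration quoted above this would print 2.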


When I start ms_sql, I'm getting the following:

Failed actions:
 p_mysql:0_start_0 (node=vsaas-test-sql-1, call=73, rc=1,
status=complete): unknown error
 p_mysql:1_start_0 (node=vsaas-test-sql-2, call=63, rc=1,
status=complete): unknown error


do all of the above files/paths exist?
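
a quick way to check this on each node, as a sketch. the function name missing_paths is hypothetical. the pid and socket files only exist while mysqld is running, so it makes more sense to test their directory:

```shell
# hypothetical helper: report which of the given paths do not exist.
# returns 0 if everything is there, 1 otherwise.
missing_paths() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            printf 'missing: %s\n' "$p"
            rc=1
        fi
    done
    return $rc
}
```

usage: "missing_paths /usr/sbin/mysqld /etc/mysql/my.cnf /var/run/mysqld"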
my configuration looks like this:

primitive wdb-mysql ocf:heartbeat:mysql \
op monitor interval="30" timeout="30" \
op monitor interval="300" timeout="30" OCF_CHECK_LEVEL="10" \
op monitor interval="301" role="Master" timeout="30" OCF_CHECK_LEVEL="10" \
op monitor interval="31" role="Slave" timeout="30" OCF_CHECK_LEVEL="10" \
op monitor interval="15" role="Slave" timeout="30" \
op monitor interval="10" role="Master" timeout="30" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
params config="/etc/mysql/my.cnf" \
datadir="/data/db/mysql/data/" socket="/var/run/mysqld/mysqld.sock" \
binary="/usr/sbin/mysqld" \
additional_parameters="--basedir=/usr --skip-external-locking --log-bin=/data/db/mysql/log/mysql-bin.log --relay-log=mysqld-relay-bin" \
pid="/var/run/mysqld/mysqld.pid" \
test_table="nagiostest.test_table" test_user="nagios" test_passwd="" \
replication_user="ruser" replication_passwd="rpass"


which version of the mysql ra/which resource agent release do you use?

and please provide the log files! otherwise, it's hard to correctly
diagnose the problem.

cheers,
raoul


Re: [Pacemaker] Upgrading the MySQL RA

2011-10-22 Thread Raoul Bhatia [IPAX]


On 22.10.2011 12:35, Michael Marrotte wrote:

What's the correct way to upgrade an RA script, in particular, the MySQL RA?

I've tried replacing /usr/lib/ocf/resource.d/heartbeat/mysql (and chmod
+x) with the latest version found at github, but then get the following:

lrmadmin[1470]: 2011/10/22_03:08:37 ERROR:
lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
message of rmetadata with function get_ret_from_msg.
ERROR: ocf:heartbeat:mysql: could not parse meta-data:
ERROR: ocf:heartbeat:mysql: no such resource agent


maybe it is related to the first "includes" of ocf-shellfuncs and the
like. there has been a change in the filenames/include paths which
you might need to revert.
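
a sketch for finding out which include path your installation provides. older resource-agents releases shipped the shell functions as ".ocf-shellfuncs" under resource.d/heartbeat, newer ones as "ocf-shellfuncs" under lib/heartbeat. the function name find_ocf_shellfuncs is made up, and /usr/lib/ocf is only the usual default for OCF_ROOT:

```shell
# hypothetical helper: print the ocf-shellfuncs file that exists below the
# given OCF root (new location first, old location second).
find_ocf_shellfuncs() {
    root="${1:-${OCF_ROOT:-/usr/lib/ocf}}"
    for candidate in \
        "$root/lib/heartbeat/ocf-shellfuncs" \
        "$root/resource.d/heartbeat/.ocf-shellfuncs"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

if this prints the resource.d path, the include lines at the top of a recent ra have to be changed back to that location (or the new ocf-shellfuncs has to be deployed as well).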

cheers,
raoul



Re: [Pacemaker] Mysqlping resource for Mysql Master-Master replication

2011-10-19 Thread Raoul Bhatia [IPAX]


On 2011-06-12 23:10, Viacheslav Biriukov wrote:

We have a situation where our mysql server hangs and then begins to
restart, which takes a while. So we need a fast switch to the other
mysql master node.


hi!

do i correctly understand that this is only beneficial in a master-
master setup? and/or why is the regular monitoring and dependency
handling not enough?

thanks,
raoul

Re: [Pacemaker] Issue with clusterlab mysql ocf script

2011-10-19 Thread Raoul Bhatia [IPAX]


we might need marek on this one because i did not implement
the master/slave logic and am not using it for multiple slaves.

marek, can you please comment on this?

thanks,
raoul

On 2011-08-29 23:08, Michael Szilagyi wrote:

Did some more testing and figured I would add that even Slave resources
rejoin the cluster as a Master role briefly before switching back to
Slave.  Of course, since the mysql RA uses event notification this still
has the effect of unsetting all masters whenever a new node joins.
  Since a master role is possibly configured already, the pre-promote
notification event doesn't get fired again and replication remains
broken.  It seems likely that I must be doing something wrong since this
would be a pretty normal use case and completely breaks the mysql
replication cluster.

Thoughts anyone?


On Fri, Aug 26, 2011 at 10:19 AM, Michael Szilagyi <mszila...@gmail.com> wrote:

I'm having a problem with master/slave promotion using the most
recent version of the mysql ocf script hosted off the
clusterLabs/resource-agents github repo.

The script works well failing over to a slave if a master loses
connection with the cluster.  However, when the master rejoins the
cluster the script is doing some undesirable things.  Basically, if
the master loses connection (say I pull the network cable) then a
new slave is promoted and the old master is just orphaned (which is
fine, I don't have STONITH setup yet or anything).  If I plug that
machine's cable back in then the node rejoins the cluster and
initially there are now two masters (the old, orphaned one and the
newly promoted one).  Pacemaker properly sees this and demotes the
old master to a slave.

After some time debugging the ocf I think what is happening is that
the script sees the old master join and fires off a post-demote
notification event for the returning master which causes a
unset_master command to be executed.  This causes all the slaves to
remove their master connection info.  However, since the other
master server has already been promoted and is (to its mind) already
replicating to the other slaves in the cluster, a new pre-promote is
never fired which means that the slaves do not get a new CHANGE
MASTER TO issued so I wind up with a broken replication setup.
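
The sequence described above can be modelled as a dispatch on notification type and operation. This is only a sketch of the behaviour reported in this mail, not the code of the mysql RA: the function name notify_plan and the action labels are made up. OCF_RESKEY_CRM_meta_notify_type and OCF_RESKEY_CRM_meta_notify_operation are the variables Pacemaker passes to a notify action.

```shell
# hypothetical model of the notify handling described in this mail.
# $1 = notify type (pre|post), $2 = notify operation (promote|demote|...)
notify_plan() {
    type_op="$1-$2"
    case "$type_op" in
        pre-promote)
            # a new master is about to appear: slaves get pointed at it
            echo "slaves: CHANGE MASTER TO new master" ;;
        post-demote)
            # a master was demoted: slaves drop their master connection info
            echo "slaves: unset_master" ;;
        *)
            echo "no action" ;;
    esac
}
```

In an RA this would be called as notify_plan "$OCF_RESKEY_CRM_meta_notify_type" "$OCF_RESKEY_CRM_meta_notify_operation". When the old master rejoins and is demoted, only the post-demote branch runs. Since no promotion follows, the pre-promote branch never does.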

I'm not sure if I'm missing something in how this is supposed to be
working or if this is a limitation of the script.  It seems like
there must be either a bug or something I've got setup wrong,
however, since it's not all that unlikely that such a scenario could
occur.  If anyone has any ideas or suggestions on how the script is
supposed to work (or what I may be doing wrong) I'd appreciate some
ideas.

I'll include the output of my crm configure show in case it'll be
useful:

node $id="a1a3266d-24e2-4d1b-bfd7-de3bac929661" seven \
attributes 172.17.0.130-log-file-p_mysql="mysql-bin.05"
172.17.0.130-log-pos-p_mysql="865"
172.17.0.131-log-file-p_mysql="mysql-bin.38"
172.17.0.131-log-pos-p_mysql="607"
four-log-file-p_mysql="mysql-bin.40" four-log-pos-p_mysql="2150"
node $id="cc0227a2-a7bc-4a0d-ba1b-f6ecb7e7d845" four \
attributes 172.17.0.130-log-file-p_mysql="mysql-bin.05"
172.17.0.130-log-pos-p_mysql="865"
three-log-file-p_mysql="mysql-bin.22" three-log-pos-p_mysql="106"
node $id="d9d3c6cb-bf60-4468-926f-d9716e56fb0f" three \
attributes 172.17.0.131-log-file-p_mysql="mysql-bin.38"
172.17.0.131-log-pos-p_mysql="607" three-log-pos-p_mysql="4"
primitive p_mysql ocf:heartbeat:mysql \
params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \
params pid="/var/lib/mysql/mySQL.pid"
socket="/var/run/mysqld/mysqld.sock" \
params replication_user="sqlSlave" replication_passwd="slave" \
params additional_parameters="--skip-slave-start" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
op promote interval="0" timeout="120" \
op demote interval="0" timeout="120" \
op monitor interval="5" role="Master" timeout="30" \
op monitor interval="10" role="Slave" timeout="30"
ms ms_mysql p_mysql \
meta master-max="1" clone-max="3" target-role="Started"
is-managed="true" notify="true" \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
last-lrm-refresh="1314307995"

Thanks!




Re: [Pacemaker] Mysql Replication Problem with HA processes

2011-10-19 Thread Raoul Bhatia [IPAX]


On 2011-10-19 15:37, Raoul Bhatia [IPAX] wrote:

Please try the latest revision of this ra and get back to me if the
problem still exists.

https://github.com/raoulbhatia/resource-agents/raw/master/heartbeat/mysql


in case you're using an old version of the ocf-functions,
you might need to either

1. update the ra to use the correct paths (line 44f)
2. deploy the new ocf-shellfuncs file.

cheers,
raoul

Re: [Pacemaker] Mysql Replication Problem with HA processes

2011-10-19 Thread Raoul Bhatia [IPAX]


On 2011-10-07 10:34, raki wrote:

Hi Andrew


We have installed Heartbeat/Pacemaker on two Unix machines (Centos 5.0.2).

The RPMs we used for installing Heartbeat and Pacemaker are

heartbeat-3.0.3-2.3.el5.i386.rpm
pacemaker-1.0.9.1-1.15.el5.i386.rpm

Please find below the crm (cluster resource manager) order and colocation
constraints we used.

colocation Httpd-with-Mysql inf: HttpdVIP MS_Mysql:Master
colocation Httpd-with-ip inf: HttpdVIP Httpd
colocation Mysql-with-Tomcat inf: Tomcat1 MS_Mysql:Master
colocation Tomcat-with-HttpdVIP inf: Tomcat1 HttpdVIP
order Httdp-after-HttpdVIP inf: HttpdVIP Httpd
order Httdp-after-tomcat1 inf: Httpd Tomcat1
order MYSQL-after-HttpdVIP inf: MS_Mysql HttpdVIP

We have two nodes running the HA processes (a two-node cluster).

In one of our test scenarios we restarted the node where the
MySQL master was running. With the above configuration, the HA
processes restart and the MySQL instance on the other node takes over the
master role.

We then saw that MySQL replication stopped working, and in the slave status
we found a duplicate entry error for a key value.

Please help us regarding this.
  Error Description 'Duplicate entry '2083' for key 1' on query. Default
database: 'MSF_DB'.


Please try the latest revision of this ra and get back to me if the
problem still exists.

https://github.com/raoulbhatia/resource-agents/raw/master/heartbeat/mysql
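
to compare the replication state before and after switching to the newer ra, the relevant fields of "SHOW SLAVE STATUS" can be filtered like this. a sketch only: the function name slave_status_summary is made up, and it assumes the vertical "\G" output format of the mysql client is piped in:

```shell
# hypothetical helper: reduce "SHOW SLAVE STATUS\G" output on stdin to the
# thread states and the last error number, one "field=value" per line.
slave_status_summary() {
    awk -F': ' '/^ *(Slave_IO_Running|Slave_SQL_Running|Last_Errno):/ {
        sub(/^ +/, "", $1)      # strip the leading indentation of the field name
        print $1 "=" $2
    }'
}
```

usage: "mysql -e 'SHOW SLAVE STATUS\G' | slave_status_summary" on the slave node.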

thanks,
raoul

Re: [Pacemaker] SNMP monitoring

2011-07-06 Thread Raoul Bhatia [IPAX]

On 07/06/2011 04:17 PM, Proskurin Kirill wrote:
> On 07/05/2011 12:05 PM, Raoul Bhatia [IPAX] wrote:
>> Proskurin, if you get snmp working, would you kindly post your
>> configuration to the mailinglist?
>>
>> the snmp-topic has popped up several times and it would be nice if
>> we got a working config in the mailinglist archive - or better: in the
>> wiki - as a reference.
> 
> Ok I get it.
> 
> You need:
> snmptrapd
> pacemaker with snmp support

*snip*

thank you for your reply! i'll try that when i find some time ...
(only god knows when this will be though ...)

thanks!
raoul

Re: [Pacemaker] SNMP monitoring

2011-07-05 Thread Raoul Bhatia [IPAX]

On 07/05/2011 03:42 AM, Andrew Beekhof wrote:
> On Tue, Jul 5, 2011 at 2:04 AM, Proskurin Kirill wrote:
>> Hello all.
>>
>> I'm trying to figure out how to monitor the cluster via SNMP.
>> I understand that I need to use crm_mon -S snmpdtrap-ip, but I'm kind of new to
>> SNMP and still can't understand how to get it to work.
>>
>> Could someone write a simple example? Like a snmptrapd config.
>> Or maybe a more detailed one to put into the pacemaker docs (the SNMP chapter is
>> empty there)?
> 
> I had it working once, but it was a long time ago and I no longer have
> the relevant configs.
> I'd suggest getting basic snmp functionality going before involving pacemaker.

Proskurin, if you get snmp working, would you kindly post your
configuration to the mailinglist?

the snmp-topic has popped up several times and it would be nice if
we got a working config in the mailinglist archive - or better: in the
wiki - as a reference.

thanks,
raoul

Re: [Pacemaker] active cluster and load-balancer

2011-06-28 Thread Raoul Bhatia [IPAX]


On 29.06.2011 05:28, Arizka Ashar wrote:

Hello All,

in active-active, each node has the clone vip..

node1 = 1.2.3.4
node2 = 5.6.7.8
node3 = 9.10.11.12
clone vip = 13.14.15.16


honestly, that depends on your network setup and
on the lvs forwarding type.
(e.g. are the node's real ips routable?)

what i would do:
1. configure the clone vip using a tun device on the real servers
   (pay attention to the arp configuration!)
   (13.14.15.16 = tunl0)

2. add the vip to the load balancers.
   make this setup redundant using pacemaker.
   (floating vip, ldirectord, etc.)
   (lb ip = x.x.x.x, y.y.y.y; 13.14.15.16 = floating vip, not cloned)

3. point the lvs setup to the node's ips
   (1.2.3.4, 5.6.7.8, 9.10.11.12)

a similar setup is running at my company
(except for the tunl devices which are not managed by pacemaker)
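
a rough sketch of step 1 on a real server. it only prints the commands so they can be reviewed before running them as root. the function name realserver_tun_cmds is made up, and the sysctl values are the ones commonly used for lvs tunnel setups, i.e. an assumption and not taken from our production config:

```shell
# hypothetical helper: print the commands that would put the vip on tunl0
# and keep the real server from answering arp requests for it.
realserver_tun_cmds() {
    vip="$1"
    cat <<EOF
modprobe ipip
ip addr add $vip/32 dev tunl0
ip link set tunl0 up
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.tunl0.arp_ignore=1
sysctl -w net.ipv4.conf.tunl0.arp_announce=2
sysctl -w net.ipv4.conf.tunl0.rp_filter=0
EOF
}
```

usage: "realserver_tun_cmds 13.14.15.16" prints the list. pipe it to "sh" once you are happy with it.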

cheers,
raoul


Re: [Pacemaker] how to find on which node the resource is located

2011-06-15 Thread Raoul Bhatia [IPAX]

On 06/15/2011 10:22 AM, Janec, Jozef wrote:
> Hello All,
> 
> I'm trying to configure a solution where one resource is able to know where
> the second one is located.
> I have configured one simple resource
> 
> 
> 
> ...

just out of curiosity, why are you not using the crm shell?
thanks to dejan et al, it makes life much easier!

can you send the output from "crm configure show" too?

cheers,
raoul

Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-06-03 Thread Raoul Bhatia [IPAX]

On 06/03/2011 12:16 PM, Raoul Bhatia [IPAX] wrote:
> On 04/19/2011 01:45 PM, Adam Reiss wrote:
>> I'll get a chance to work on it today.  I'll let you know what happens.
>> :)
> 
> adam, could you please test
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/postfix

ok - nevermind. i see that you're using postfix 2.3.3 which means
that you're also hit by the "postfix status" issue from recent
discussions.

cheers,
raoul

Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-06-03 Thread Raoul Bhatia [IPAX]

On 04/19/2011 01:45 PM, Adam Reiss wrote:
> I'll get a chance to work on it today.  I'll let you know what happens.
> :)

adam, could you please test
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/postfix

thanks,
raoul

Re: [Pacemaker] mysql RA fixes merged

2011-05-30 Thread Raoul Bhatia [IPAX]

On 05/30/2011 04:18 PM, Florian Haas wrote:
> Hello,
> 
> I've merged and pushed a number of fixes to master/slave replication in
> the mysql RA, contributed by Marek Marczykowski. I've deliberately left
> out Raoul Bhatia's retab patch though, those "janitor" patches
> usually make debugging harder if we do run into issues. We can always
> merge that patch later.
> 
> I've also fixed the commit messages to be prefixed with "mysql". Marek,
> could you please rebase your github repo against current upstream.
> Thanks for the contribution!

thank you for your work!
raoul

Re: [Pacemaker] postfix resource agent using unsupported "status" command

2011-05-20 Thread Raoul Bhatia [IPAX]

On 05/20/2011 12:23 AM, Noah Rømer wrote:
> The pacemaker resource agent (downloaded from
> https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix)
> has a function called "running" that attempts to determine if the
> requested postfix instance is running by doing:
> 
> "$binary $OPTION_CONFIG_DIR status >/dev/null 2>&1"
> 
> This causes the resource agent to fail on starting the service. The
> postfix command says:
> 
> postfix/postfix-script: fatal: usage: postfix start (or stop, reload,
> abort, flush, check, set-permissions, upgrade-configuration)
> 
> It doesn't appear to support the status command, the way the init
> script (/etc/init.d/postfix) does.

"postfix status" is provided by /etc/postfix/postfix-script
i noticed today that it was added in postfix 2.5:

> status)
> 
> $daemon_directory/master -t 2>/dev/null && {
> $INFO the Postfix mail system is not running
> exit 1
> }
> $INFO the Postfix mail system is running: PID: `sed 1q pid/master.pid`
> exit 0
> ;;

it should be possible to implement something like this in the ra as a
fallback mechanism (i prefer relying on postfix-script's checks if the
environment is correctly set up)
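
a minimal sketch of what such a fallback could look like, assuming the pid of the master process is stored on the first line of a pid file (as the "sed 1q pid/master.pid" above suggests). the function name postfix_running is made up. note that it only checks that some process with that pid exists, which is weaker than the "master -t" lock test used by postfix-script:

```shell
# hypothetical fallback for postfix versions without "postfix status".
# $1 = path to master.pid (relative to the queue directory it is pid/master.pid)
# returns 0 if a process with the recorded pid exists, non-zero otherwise.
# needs to run as root or as the owner of that process, otherwise kill -0 fails.
postfix_running() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    # the pid may be padded with blanks, so strip them
    pid=$(sed 1q "$pidfile" | tr -d ' ')
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}
```

the ra would call this only if "postfix status" is not supported, and keep relying on postfix-script otherwise.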


out of curiosity: which version of postfix do you run and why?

thanks,
raoul

Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]

On 04/19/2011 10:38 AM, Marek Marczykowski wrote:
>> in your opinion, is it possible to fix this via the ocf ra or does it
>> have to be a separate cronjob?
> 
> I have no idea how to do it in the ra. There is no easy way to look at what
> binlogs are on the other node. Maybe some tricks storing that info on
> monitor action, but this is ugly and makes ra depending on monitor
> action enabled...
> The easiest solutions are the best :)

what about submitting a "show master logs" query to the to-be master,
checking the available logs and refusing to start if the log-file
disappeared?
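
a sketch of that check, assuming the list of binlogs from the to-be master is piped in with one "name size" pair per line (which is what "mysql -N -B -e 'SHOW MASTER LOGS'" prints). the function name binlog_listed and the file name in the usage line are made up for illustration:

```shell
# hypothetical helper: succeed only if the wanted binlog file name appears
# in the first column of the "SHOW MASTER LOGS" output read from stdin.
binlog_listed() {
    wanted="$1"
    awk -v name="$wanted" '$1 == name { found = 1 } END { exit !found }'
}
```

usage: "mysql -h <master> -N -B -e 'SHOW MASTER LOGS' | binlog_listed mysql-bin.000031 || echo 'binlog is gone, refusing to start replication'"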

cheers,
raoul

Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-04-19 Thread Raoul Bhatia [IPAX]

adam, any news on this?
if this is not working for you, i've got another idea.
but please report the current status first...

thanks,
raoul

On 04/14/2011 08:33 PM, Raoul Bhatia [IPAX] wrote:
> hi adam,
> 
> On 14.04.2011 18:10, Adam Reiss wrote:
>> Hi Raoul,
>>
>> We're trying to setup a HA SMTP Relay, so having pacemaker stop/start
>> the services as it passes the work over to the other machine, should
>> Postfix fail...  Is there a better way to allow an HA SMTP relay?
> 
> when we're setting up a clustered postfix, we do not mess with the
> default /etc/postfix/ config but use a different location on a drbd
> backed device instead.
> 
> e.g. /data/mail/
> 
> this way, local mail delivery (cron output!) works without any issue -
> even if the clustered postfix is down (e.g. for maintenance) or simply
> migrated to a different host.
> 
>> It's running under VMWare, having two different guests, on two different
>> hosts...
>>
>> I've attached the output you've requested. :)
>>
>> There is no syslog file in /var/log .
> 
> mhm - your hb_report is incomplete too. i don't know centos - where does
> centos' syslog write its logfiles?
> 
> anyways, i've updated the postfix ocf ra to handle some configuration
> cases and errors:
> 
> 
> https://github.com/raoulbhatia/resource-agents/tree/master/heartbeat/postfix
> 
> 
> depending on your system, you might need to apply the following patch:
> 
> -: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> -. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
> +: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
> +. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
> 
> 
> could you please give it a shot and report what's happening?
> 
> if it is still *not* working for you, i would need your current
> configuration, a new hb_report and the system's logfiles.
> 
> thanks,
> raoul
> 



Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]

On 04/19/2011 10:38 AM, Marek Marczykowski wrote:
> On 04/19/11 10:29, Raoul Bhatia [IPAX] wrote:
>> On 04/19/2011 10:20 AM, Marek Marczykowski wrote:
>>> On 04/19/11 10:01, Raoul Bhatia [IPAX] wrote:
>>>> what i can currently think of:
>>>>
>>>> 1. run a cronjob which periodically analyzes the binlogs and will update
>>>> the node's log-file and log-pos attributes if there are empty binlogs;
>>>> (that's the best method, i think)
>>>>
>>>> 2. restore the purged binlogs from a backup, hack the mysql-bin.index
>>>> file to re-include them, and hope that they will not be purged upon
>>>> replication restart?
>>>>
>>>>
>>>> any input on this?
>>>
>>> I've similar problem... I think the first solution is better - after 7
>>> days log-file and log-pos can be cleared to use "FIRST" as start point.
>>
>> in your opinion, is it possible to fix this via the ocf ra or does it
>> have to be a separate cronjob?
> 
> I have no idea how to do it in the ra. There is no easy way to look at what
> binlogs are on the other node. Maybe some tricks storing that info on
> monitor action, but this is ugly and makes ra depending on monitor
> action enabled...
> The easiest solutions are the best :)

hi marek,

so i'm trying to modify the cib so that the ra does a "change master"
and replicates from mysql-bin.31:0

1. i set standby for wdb01
2. i modify the cib via crm:
> node wdb01 \
> attributes service="wdb" \
> attributes standby="on" wdb02-log-file-wdb-mysql="mysql-bin.31" wdb02-log-pos-wdb-mysql="0"

3. i set online for wdb01

however, this does not work.

upon set_master, the ra checks the current slave information
and finds the (incorrect) mysql-bin.15:24386.

thus, set_master "keeps" this configuration and returns
(mysql ra line 529ff):

> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Changing MySQL configuration to replicate from wdb02.
> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Kept master pos for wdb02 : mysql-bin.15:24386
> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Changing MySQL configuration to replicate from wdb02.
> Apr 19 10:46:28 wdb01 mysql-repl[25279]: INFO: Kept master pos for wdb02 : mysql-bin.15:24386


how would you try to restart the replication?

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]

On 04/19/2011 10:20 AM, Marek Marczykowski wrote:
> On 04/19/11 10:01, Raoul Bhatia [IPAX] wrote:
>> what i can currently think of:
>>
>> 1. run a cronjob which periodically analyzes the binlogs and will update
>> the node's log-file and log-pos attributes if there are empty binlogs;
>> (that's the best method, i think)
>>
>> 2. restore the purged binlogs from a backup, hack the mysql-bin.index
>> file to re-include them, and hope that they will not be purged upon
>> replication restart?
>>
>>
>> any input on this?
> 
> I have a similar problem... I think the first solution is better - after 7
> days, log-file and log-pos can be cleared to use "FIRST" as the start point.

in your opinion, is it possible to fix this via the ocf ra or does it
have to be a separate cronjob?

thanks,
raoul

Re: [Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]

On 04/19/2011 10:01 AM, Raoul Bhatia [IPAX] wrote:
> the failover worked and wdb02 is up and running.
> upon rejoin, wdb01 wanted to start syncing from mysql-bin.15,
> position 24386 (as saved in the cib).
> 
> this fails with error "Last_IO_Errno: 1236" and the message:
>> Last_IO_Error: Got fatal error 1236 from master when reading data
>> from binary log: 'Could not find first log file name in binary log
>> index file'

one additional note:

the ra does not detect this as a failure either. pacemaker reports
that both instances are up and running.

there is a "WARNING: MySQL Slave IO threads currently not running."
in the logs, but the ra checks for "Last_Errno:" only.

thus, it does not catch *any* other error, e.g.
> # mysql -h wdb01c -e "show slave status\G"|grep -i err
>                Last_Errno: 0
>                Last_Error: 
>             Last_IO_Errno: 1236
>             Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
>            Last_SQL_Errno: 0
>            Last_SQL_Error: 
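
to illustrate what a broader check could look like, here is a minimal sketch. it is not the code of the mysql ra; slave_errors is a hypothetical helper name, and it only assumes the "show slave status\G" field names shown above. it prints every non-zero *_Errno field, so Last_IO_Errno and Last_SQL_Errno are caught in addition to Last_Errno:

```shell
#!/bin/sh
# Sketch only: slave_errors is a hypothetical helper, not part of the mysql ra.
# It reads "show slave status\G" output on stdin and prints one NAME=VALUE
# line for every Last_Errno / Last_IO_Errno / Last_SQL_Errno that is non-zero.
slave_errors() {
    awk -F': *' '
        /^[ \t]*Last_(IO_|SQL_)?Errno:/ {
            gsub(/^[ \t]+/, "", $1)        # strip the column alignment
            if ($2 + 0 != 0) print $1 "=" $2
        }'
}

# usage against a live server (host name taken from the example above):
#   mysql -h wdb01c -e "show slave status\G" | slave_errors
```

an empty result would mean that none of the three error numbers is set; anything else could be treated as a failed monitor by whoever calls it.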

thanks,
raoul

[Pacemaker] mysql m/s failover: 'Could not find first log file name in binary log index file'

2011-04-19 Thread Raoul Bhatia [IPAX]

hi,

i'm starting a new thread to address a specific
"Could not find first log file name in binary log index file" error
upon failover.


background:
i currently have a two node mysql m/s setup.
expire_logs_days (was) set to 7 days
last failover happened > 7 days ago (therefore, binlogs have been purged)


i today tested another failover.
config before failover:
> node wdb01 \
> attributes service="wdb" \
> attributes standby="off" wdb02-log-file-wdb-mysql="mysql-bin.15" wdb02-log-pos-wdb-mysql="24386"
> node wdb02 \
> attributes service="wdb" \
> attributes standby="off"


as stated, some binlogs, including mysql-bin.15, have already been
purged from wdb02. current binlogs:

> -rw-rw 1 mysql adm  149 Apr 11 06:25 mysql-bin.31
> -rw-rw 1 mysql adm  125 Apr 11 10:44 mysql-bin.32
> -rw-rw 1 mysql adm  125 Apr 11 22:50 mysql-bin.33
> -rw-rw 1 mysql adm  149 Apr 12 06:25 mysql-bin.34
> -rw-rw 1 mysql adm  149 Apr 13 06:25 mysql-bin.35
> -rw-rw 1 mysql adm  149 Apr 14 06:25 mysql-bin.36
> -rw-rw 1 mysql adm  125 Apr 14 17:01 mysql-bin.37
> -rw-rw 1 mysql adm  149 Apr 15 06:25 mysql-bin.38
> -rw-rw 1 mysql adm  149 Apr 16 06:25 mysql-bin.39
> -rw-rw 1 mysql adm  149 Apr 17 06:25 mysql-bin.40
> -rw-rw 1 mysql adm  149 Apr 18 06:25 mysql-bin.41
> -rw-rw 1 mysql adm  125 Apr 18 15:45 mysql-bin.42
> -rw-rw 1 mysql adm  149 Apr 19 06:25 mysql-bin.43
> -rw-rw 1 mysql adm  125 Apr 19 09:03 mysql-bin.44
> -rw-rw 1 mysql adm  5366995 Apr 19 09:46 mysql-bin.45
> -rw-rw 1 mysql adm  540 Apr 19 09:04 mysql-bin.index


the failover worked and wdb02 is up and running.
upon rejoin, wdb01 wanted to start syncing from mysql-bin.15,
position 24386 (as saved in the cib).

this fails with error "Last_IO_Errno: 1236" and the message:
> Last_IO_Error: Got fatal error 1236 from master when reading data
> from binary log: 'Could not find first log file name in binary log
> index file'

given the circumstance, that the binlogs have been purged, this is
somewhat expected.

i wonder though, if there is a way to automagically troubleshoot
this issue, since - as you can see above - all binlogs up to
mysql-bin.45 are empty.


what i can currently think of:

1. run a cronjob which periodically analyzes the binlogs and will update
the node's log-file and log-pos attributes if there are empty binlogs;
(that's the best method, i think)

2. restore the purged binlogs from a backup, hack the mysql-bin.index
file to re-include them, and hope that they will not be purged upon
replication restart?
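
a first building block for option 1 could be a check whether the binlog stored in the cib is still listed in the master's index file. this is only a sketch under assumptions: binlog_in_index is a hypothetical helper, and the index file is assumed to hold one binlog path per line, possibly with a directory prefix. what to do when the check fails (e.g. updating the node attributes) is deliberately left out:

```shell
#!/bin/sh
# Sketch only: binlog_in_index is a hypothetical helper name.
# $1: path to the binlog index file (e.g. mysql-bin.index)
# $2: binlog name as stored in the cib (e.g. mysql-bin.15)
# Returns 0 if the binlog is still listed, 1 if it is not.
binlog_in_index() {
    # index entries may carry a directory prefix, so compare basenames only;
    # -F -x makes grep match the whole line literally (mysql-bin.4 != mysql-bin.45)
    sed 's|.*/||' "$1" | grep -Fxq -- "$2"
}

# usage from a cronjob (paths are assumptions):
#   binlog_in_index /var/log/mysql/mysql-bin.index mysql-bin.15 || echo "binlog purged"
```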


any input on this?

thanks,
raoul

Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-04-14 Thread Raoul Bhatia [IPAX]


hi adam,

On 14.04.2011 18:10, Adam Reiss wrote:

> Hi Raoul,
>
> We're trying to setup a HA SMTP Relay, so having pacemaker stop/start
> the services as it passes the work over to the other machine, should
> Postfix fail...  Is there a better way to allow an HA SMTP relay?


when we're setting up a clustered postfix, we do not mess with the
default /etc/postfix/ config but use a different location on a drbd
backed device instead.

e.g. /data/mail/

this way, local mail delivery (cron output!) works without any issue -
even if the clustered postfix is down (e.g. for maintenance) or simply
migrated to a different host.


> It's running under VMWare, having two different guests, on two different
> hosts...
>
> I've attached the output you've requested. :)
>
> There is no syslog file in /var/log .


mhm - your hb_report is incomplete too. i don't know centos - where does
centos' syslog write its logfiles?

anyways, i've updated the postfix ocf ra to handle some configuration
cases and errors:


https://github.com/raoulbhatia/resource-agents/tree/master/heartbeat/postfix

depending on your system, you might need to apply the following patch:

-: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
-. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
+: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
+. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs


could you please give it a shot and report what's happening?

if it is still *not* working for you, i would need your current
configuration, a new hb_report and the system's logfiles.

thanks,
raoul


Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-04-13 Thread Raoul Bhatia [IPAX]

On 04/13/2011 03:56 PM, Adam Reiss wrote:
> Thank you for responding Raoul!
> 
> I do appreciate any help you could pass along. :)

hi!

i would need the logfiles. please use hb_report to generate a report!
something like:

   hb_report -f $start -t $end /root/hb_report_postfix/

where $start is before and $end is after the problem occurred.

please also do a quick

   grep alternate_config_directories /var/log/syslog

and attach the output.

as a side note: why do you want to manage the main postfix instance
via pacemaker?

you do realize, that - unless you setup a clone, you will not
receive any emails (e.g. output from cronjobs, etc.) from
the secondary node?

cheers,
raoul

Re: [Pacemaker] Pacemaker / Postfix startup problem...

2011-04-13 Thread Raoul Bhatia [IPAX]

hi!

i'm the author of the postfix ra and will take a look in the next
days.

please bear with me and nag me, if i'm not replying back to you...

(i'm currently in the middle of some cluster migrations and haven't
got that much time to check the list)

cheers,
raoul

On 04/08/2011 05:32 PM, Adam Reiss wrote:
> Good morning all!
>
> I was hoping I could ask this group a question, as I’ve run into a snag
> in setting up HA/Pacemaker & Postfix.
>
> I’m quite new to setting up HA, so please bear with me. :)
>
> I’ve setup the Heartbeat with the shared IP, and it responds on the network.
>
> When I try and setup Postfix using the resource agent, Pacemaker seems
> to be having a hard time recognizing that Postfix was started.  I
> confirmed that postfix IS indeed running on restart.  It’s being started
> BY Pacemaker, since I’ve removed the startup from ntsysv.
>
> I’ve included everything I can think of to try and help.
>
> Thank you in advance for your assistance!!
>
> Heartbeat:  3.0.3-2.3.el5
> Pacemaker:  1.0.10-1.4.el5
> Postfix:  2:2.3.3-2.1.el5_2
> CentOS: 5.5 (Final)  Linux AFPMLVOPE2.ql.com 2.6.18-194.32.1.el5 #1
>
> Active / Passive
> No Quorum
>
> [root@AFPMLVOPE2 heartbeat]# crm configure show
> node $id="08090224-3c46-4ef7-829c-9aeea0ed8b49" afpmlvope2.ql.com
> node $id="6b10abd3-e2d5-4e47-8ffd-d75629bb0e1d" afpmlvope4.ql.com
> primitive AFPMLVOPEHA ocf:heartbeat:IPaddr2 \
>     params ip="172.16.XXX.XXX" cidr_netmask="32" \
>     op monitor interval="30s"
> primitive postfix ocf:heartbeat:postfix \
>     params binary="/usr/sbin/postfix" config_dir="/etc/postfix"
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false"
>
> Output from crm_mon
>
> Last updated: Fri Apr  8 09:53:55 2011
> Stack: Heartbeat
> Current DC: afpmlvope2.ql.com (08090224-3c46-4ef7-829c-9aeea0ed8b49) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
>
> Online: [ afpmlvope2.ql.com afpmlvope4.ql.com ]
>
> AFPMLVOPEHA (ocf::heartbeat:IPaddr2):   Started afpmlvope2.ql.com
>
> Failed actions:
>     postfix_start_0 (node=afpmlvope4.ql.com, call=4, rc=1, status=complete): unknown error
>     postfix_start_0 (node=afpmlvope2.ql.com, call=4, rc=1, status=complete): unknown error
>
> [root@AFPMLVOPE2 Operations]# /usr/sbin/ocf-tester -n postfix /usr/lib/ocf/resource.d/heartbeat/postfix; echo $?
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/postfix...
> * rc=7: Monitoring an active resource should return 0
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * rc=1: Start failed
> Aborting tests
> 1
>
> Maillog
>
> Apr  8 10:08:46 AFPMLVOPE2 postfix/postfix-script: fatal: usage: postfix start (or stop, reload, abort, flush, check, set-permissions, upgrade-configuration)
> Apr  8 10:08:47 AFPMLVOPE2 last message repeated 2 times
> Apr  8 10:08:47 AFPMLVOPE2 postfix/postfix-script: starting the Postfix mail system
> Apr  8 10:08:47 AFPMLVOPE2 postfix/master[3924]: daemon started -- version 2.3.3, configuration /etc/postfix
> Apr  8 10:08:47 AFPMLVOPE2 postfix/postfix-script: fatal: usage: postfix start (or stop, reload, abort, flush, check, set-permissions, upgrade-configuration)
> Apr  8 10:08:48 AFPMLVOPE2 last message repeated 3 times
> Apr  8 10:08:48 AFPMLVOPE2 postfix/postfix-script: fatal: the Postfix mail system is already running

Re: [Pacemaker] large number of pe-input-* files

2011-04-05 Thread Raoul Bhatia [IPAX]

On 04/05/2011 04:35 PM, Shravan Mishra wrote:
> Hi guys,
> 
> The pengine process creates a large number of files under /var/lib/pengine.
> 
> We are using HA on a very high per box which is processing large
> amount of data fed from an external source.
> There is a large number of files creation and IO taking place.
> 
> We ran out of inodes because there were something like 1500 files
> under the mentioned directory:
> 
> 
> 
> ls /var/lib/pengine/ | wc -l
> 1492
> 
> 
> 
> Is there a way to clean up and/or reduce the number of these files?

pacemaker can do this by itself:

> property $id="cib-bootstrap-options" \
> ...
> pe-error-series-max="100" \
> pe-warn-series-max="100" \
> pe-input-series-max="100" \
> ...

you can read about this in the pacemaker documentation.
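
if you prefer to set these from the command line, the following is a minimal sketch, assuming the crm shell is installed; the value 100 is simply the one from my config above and not a recommendation. whether files that already exist get pruned right away is something you would have to verify on your system:

```shell
# limit the number of policy engine files kept per series
crm configure property pe-input-series-max="100"
crm configure property pe-error-series-max="100"
crm configure property pe-warn-series-max="100"

# count the files again after the cluster has been running for a while
ls /var/lib/pengine/ | wc -l
```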

cheers,
raoul

Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)

2011-04-05 Thread Raoul Bhatia [IPAX]

On 04/04/11 22:55, Lars Ellenberg wrote:
>> * > > ro2="Unknown"ds1="UpToDate" ds2="Outdated" />
>  Why keep people using this pseudo xml output?
>  where does that come from? we should un-document this.
>  This is to be consumed by other programs (like the LINBIT DRBD-MC).
>  This is not to be consumed by humans.

when one is used to "crm status", "drbd status" will be typed
automatically without thinking ;)

> # drbdadm status
> 
> 
>  ds1="UpToDate" ds2="UpToDate" />
> 
> 

imho, that's why...

cheers,
raoul

Re: [Pacemaker] Corosync or Hearbeat or both ? (Re: Pacemaker Digest, Vol 40, Issue 67)

2011-03-23 Thread Raoul Bhatia [IPAX]

On 03/23/2011 05:00 PM, José Pablo Méndez Soto wrote:
> Thank you Raoul,
> 
> It sounds to me then, as if Heartbeat was actually a remnant of old
> days. Is there something Heartbeat does that Corosync can't? Why would I
> install Heartbeat along Corosync? As you said, they are both "the
> messaging layer".

if you use corosync + pacemaker, there is no need for heartbeat.
(except for some libs - but this should correctly be handled via
dependencies)

e.g. my debian lenny box (with packages from backports):
> # dpkg -l| egrep "corosync|heartbeat|pacemaker|stonith|cluster"
> ii  cluster-agents   1:1.0.3-3~bpo50+1          The reusable cluster components for Linux HA
> ii  cluster-glue     1.0.6-1~bpo50+1            The reusable cluster components for Linux HA
> ii  corosync         1.2.1-1~bpo50+1            Standards-based cluster framework (daemon an
> ii  libcluster-glue  1.0.6-1~bpo50+1            The reusable cluster components for Linux HA
> ii  libcorosync4     1.2.1-1~bpo50+1            Standards-based cluster framework (libraries
> ii  libheartbeat2    1:3.0.3-2~bpo50+1          Subsystem for High-Availability Linux (libra
> ii  pacemaker        1.0.9.1+hg15626-1~bpo50+1  HA cluster resource manager

cheers,
raoul

Re: [Pacemaker] Corosync or Hearbeat or both ?

2011-03-23 Thread Raoul Bhatia [IPAX]


On 23.03.2011 04:42, José Pablo Méndez Soto wrote:

> Hi, I am recently starting to poke around with Pacemaker. Can someone
> please explain what is the difference between Corosync and Heartbeat,
> when would you use one or when the other? And why would you have
> corosync installed according to this howto, if Heartbeat is to be the
> preferred method? (the reason besides being a dependency):


"heartbeat" was used for many different things, among them
* messaging layer (= cluster communication)
* resource agents (= resource management)
* policy engine (= which resource is started where)

for a couple of years now, these specific tasks have been split
into different (sub-)projects (e.g. pacemaker), and heartbeat has been
"reduced" to being one messaging layer. corosync (formerly also known as
openais) is another one.

nowadays, most people seem to prefer corosync + pacemaker over
heartbeat "v1" (legacy) or heartbeat + pacemaker.

cheers,
raoul


Re: [Pacemaker] Filesystem resource agent patch

2011-03-18 Thread Raoul Bhatia [IPAX]

On 03/18/2011 11:35 AM, Marko Potocnik wrote:
> If you use symbolic links in the Filesystem resource agent's directory
> parameter, then the monitoring operation fails, because the actual mount
> point in /proc/mounts (or the output of the mount command) is different
> from the configured one.
> 
> Here is the patch that fixes this:
> 
> --- Filesystem_new_org  2011-03-18 11:32:37.0 +0100
> +++ Filesystem_new  2011-03-18 12:27:35.0 +0100
> @@ -1002,0 +1003,6 @@
> +
> +   #Resolve symlinks in MOUNTPOINT
> +   resolved_mntpnt=`readlink -f $MOUNTPOINT`
> +   if [ $? -eq 0 ]; then
> +   MOUNTPOINT=$resolved_mntpnt
> +   fi

looks fine to me. though dejan will most probably request that you
resend the patch as an attachment ;)
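
for illustration, the same idea as a standalone function with quoting added, so that mount points containing spaces survive. this is only a sketch and not the code that went into the Filesystem ra; resolve_mountpoint is a hypothetical name, and a readlink that supports -f (e.g. gnu coreutils) is assumed:

```shell
#!/bin/sh
# Sketch only: resolve_mountpoint is a hypothetical helper name.
# Prints the canonical path of $1 with all symlinks resolved; if readlink
# cannot resolve it, the configured path is printed unchanged.
resolve_mountpoint() {
    if resolved=$(readlink -f "$1"); then
        printf '%s\n' "$resolved"
    else
        printf '%s\n' "$1"
    fi
}

# usage inside an agent (variable name as in the patch above):
#   MOUNTPOINT=$(resolve_mountpoint "$MOUNTPOINT")
```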

cheers,
raoul

Re: [Pacemaker] designing a load balancer - request for comments

2011-02-16 Thread Raoul Bhatia [IPAX]

On 02/14/2011 03:14 PM, Klaus Darilion wrote:
> But if the LSB script is compliant, will I get better results? I will
> replace the lsb script with an ocf resource once the basic failover works
> as expected :-)

you will be able to properly use more advanced features of pacemaker,
including: clones [1] with notifications [2], different monitoring
depths/levels [3], the ability to run several instances of your resource
with different configurations at the same time, etc.

cheers,
raoul


[1]
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-clone.html
[2]
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch10s02s03s06.html
[3]
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-operation-monitor-multiple.html

Re: [Pacemaker] designing a load balancer - request for comments

2011-02-14 Thread Raoul Bhatia [IPAX]

On 02/14/2011 02:37 PM, Klaus Darilion wrote:
> Somehow pacemaker does not react as I would expect it. My config is:
> 
> primitive failover-ip ocf:heartbeat:IPaddr \
> params ip="83.136.32.161" \
> op monitor interval="3s"
> primitive kamailio lsb:kamailio \
> meta migration-threshold="2" failure-timeout="60" \
> op monitor interval="15" timeout="15"
> clone cloneKamailio kamailio
> colocation colo_ip_with_kamailio inf: failover-ip cloneKamailio
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="5"
...
> So, what am I doing wrong? I would expect that after 60s the
> failure-count is reset.

there is no "cluster-recheck-interval" in your properties:

property $id="cib-bootstrap-options" \
dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
stonith-enabled="true" \
cluster-infrastructure="openais" \
...
cluster-recheck-interval="1min"

try to set this and redo your testing.
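
as far as i understand the 1.0 documentation, an expired failure-timeout is only noticed when the policy engine runs, and cluster-recheck-interval is what forces a periodic run on an otherwise idle cluster. a minimal sketch for setting it from the command line, assuming the crm shell is installed; the 1min value is simply the one from my config above:

```shell
# make the cluster re-evaluate its state at least once per minute
crm configure property cluster-recheck-interval="1min"

# one-shot status output including the fail counts, to watch the reset
crm_mon -1 -f
```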

cheers,
raoul

ps. i'd very much love to see an ocf compatible ra instead of the lsb
script ;)

Re: [Pacemaker] designing a load balancer - request for comments

2011-02-11 Thread Raoul Bhatia [IPAX]

hi,

On 02/11/2011 05:55 PM, Klaus Darilion wrote:
>> if the clone'd resource properly synchronizes by itself, you only need
>> to migrate the (public) service ip address. a colocation rule will
>> achieve that.
> 
> fine
> 
>> basic clone setup:
>> primitive extip ocf:heartbeat:IPaddr2 \
>> params ip="a.b.c.d" nic="eth0.123" cidr_netmask="28" \
>> op monitor interval="30"
>> primitive kamailio ocf:pernau:kamailio \
>> op monitor interval="30" timeout="30" \
>> op monitor interval="15" role="Slave" timeout="30" \
>> op monitor interval="10" role="Master" timeout="30"
>> clone clone-kamailio kamailio
>> colocation colo_extip_with_kamailio inf: extip ms-kamailio
> 
> If I understand it right, this means that extip (the virtual-IP
> resource) must run on a node which also runs ms-kamailio. Shouldn't that
> be 'kamailio' instead of 'ms-kamailio' or is there some implicit
> resource naming happening?

first of all, i made an error when modifying the ms example for
clone. above, it should read "clone-kamailio".

you can only reference the clone or the ms resource and should never,
ever reference one single instance.

what the clones (and the ms) do (actually not exactly true but enough
to illustrate what i want to say):

1. if you clone a resource, like above, the cluster (internally) creates:
kamailio:0, kamailio:1, ..., kamailio:n-1 (n=clone-max) [1]

if you colocate it with clone-kamailio, the cluster knows that you want
to run extip on the same host where a "clone of kamailio" aka kamailio:0
or kamailio:1 is started.
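
for reference, the clone setup with the colocation corrected could look like the sketch below. assumptions: the crm shell is installed, a.b.c.d and eth0.123 are placeholders as before, ocf:pernau:kamailio is the agent name from my earlier mail, and the file name is made up. i also dropped the role="Slave"/"Master" monitor ops here, since as far as i know they only make sense for an ms resource:

```shell
# write the configuration to a file and load it into the cib
cat > /tmp/kamailio-clone.crm <<'EOF'
primitive extip ocf:heartbeat:IPaddr2 \
    params ip="a.b.c.d" nic="eth0.123" cidr_netmask="28" \
    op monitor interval="30"
primitive kamailio ocf:pernau:kamailio \
    op monitor interval="30" timeout="30"
clone clone-kamailio kamailio
colocation colo_extip_with_kamailio inf: extip clone-kamailio
EOF
crm configure load update /tmp/kamailio-clone.crm
```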

> Additionally I want the virtual-IP to be moved if upstream-connectivity
> of a node is broken. Therefore I used ping as below:
> 
> primitive extip ocf:heartbeat:IPaddr \
> params ip="11.222.32.161" \
> op monitor interval="3s"
> primitive pingtest ocf:pacemaker:ping \
> params host_list="11.222.53.113" multiplier="10" dampen="5s" \
> op monitor interval="10s"
> clone clonePing pingtest
> location aktiverLB extip \
> rule $id="aktiverLB-rule" -inf: not_defined pingd or pingd lte 0

sorry, i have never successfully used ping, so i cannot comment on its
functionality. ;)

> So, now I am not sure anymore if I should use "location" or "colocation"
> as constraint to have the virtual-IP only on a node which has
> connectivity. I got the impression that colocation is for binary
> decisions (Kamailio is either running or not) and location for resources
> which may have non-binary score values (eg. ping score can be 0-x if I
> ping x hosts).

location: add more/less weight for a host in regard to running a
resource.
think: "on which servers aka #uname am i allowed to run".

colocation: defines which resources shall/have to or should/must not
run on the very same host.
think: db must not run on the same host as webserver. extip webserver
has to run on the same host as webserver.

> So, will it be correct to just add
> 
> primitive pingtest ocf:pacemaker:ping \
> params host_list="11.222.53.113" multiplier="10" dampen="5s" \
> op monitor interval="10s"
> clone clonePing pingtest
> location aktiverLB extip \
> rule $id="aktiverLB-rule" -inf: not_defined pingd or pingd lte 0
> 
> to your configuration or do I have to change this constraint to a
> "colocation" constraint?

as said, i have never actually used the ping resource.

but i suggest you add a strong colocation rule between extip and
kamailio (and quite possibly extip and ping or kamailio and ping),
because it might happen that one node isn't able to start kamailio
(e.g. config error during upgrade) so you do not want extip to
"accidentally" run on this host.

so once more: read the documentation and some of the tutorials
in the wiki and then try and play around to get a feeling for
pacemaker.

btw, handy tools are:
* drbd-mc
* showscores.sh (available from the hg repository) to fetch a resource's
  score -> higher score for a node -> more likely to run the resource there
* ptest -G/ptest -D to create a diagram of what the cluster is going to do

cheers,
raoul

[1]
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch10s02s02.html

Re: [Pacemaker] designing a load balancer - request for comments

2011-02-11 Thread Raoul Bhatia [IPAX]

hi,

On 02/11/2011 03:07 PM, Klaus Darilion wrote:
...
> If everything is fine, then Kamailio should run on both servers.
> 
> But for example how should the cluster behave if e.g. Kamailio crashes
> and restarting by pacemaker again leads to crashes (e.g. Kamailio has DB
> connectivity problems or other problems).

in this case, there should be no need for a multistate resource.
cloning the kamailio service should be fine.

> Is there some protection in pacemaker so that it does not endlessly try
> to restart such a broken service?
>
> Or, how should pacemaker behave if Kamailio on the active node crashes.
> Shall it just restart Kamailio or shall it migrate the IP address to the
> other node and then try to restart Kamailio on the inactive node?

pacemaker will not endlessly try to restart the configured resources:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html

pacemaker can be configured to restart a resource e.g. for a couple of
times and if this does not work, it will migrate to another host.
(you can also configure pacemaker to migrate upon the first failure)

please read the documentation (link above)
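
a minimal sketch of what that could look like, assuming the crm shell is installed and a resource named kamailio exists. the values 2 and 60 and the node name server1 are only illustrative:

```shell
# move the resource away from a node after 2 failures there
crm resource meta kamailio set migration-threshold 2
# let recorded failures expire after 60 seconds
crm resource meta kamailio set failure-timeout 60

# show the current fail count of the resource on one node
crm resource failcount kamailio show server1
# clear the failure history once the cause has been fixed
crm resource cleanup kamailio
```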

if the clone'd resource properly synchronizes by itself, you only need
to migrate the (public) service ip address. a colocation rule will
achieve that.

basic clone setup:
primitive extip ocf:heartbeat:IPaddr2 \
        params ip="a.b.c.d" nic="eth0.123" cidr_netmask="28" \
        op monitor interval="30"
primitive kamailio ocf:pernau:kamailio \
        op monitor interval="30" timeout="30" \
        op monitor interval="15" role="Slave" timeout="30" \
        op monitor interval="10" role="Master" timeout="30"
clone clone-kamailio kamailio
colocation colo_extip_with_kamailio inf: extip clone-kamailio

cheers,
raoul
ps. in the last ms configuration, i accidentally wrote wdb-mysql
instead of kamailio - i simply copied a running m/s config and adapted
it for your needs ;)

Re: [Pacemaker] designing a load balancer - request for comments

2011-02-11 Thread Raoul Bhatia [IPAX]

hi,

On 02/09/2011 03:04 PM, Klaus Darilion wrote:
...
> 
> 
> server1   server2
>  ip1ip2
> <-virtual-IP--->
> 
...
> Kamailio should always be running on both servers. Replication of
> Registrations is done on SIP level between the two servers (using the
> 'real' IP). The Registrar/LoadBalancer service itself is provided on the
> virtual-IP, thus the SIP traffic is handled either on server1 or server2.
> 
> If the Kamailio process dies, it should be restarted. If it fails to
> start or fails too often (flapping) the virtual-IP should be migrated to
> the other node. Of course failover should also happen if one server dies
> completely.

best would be to write a kamailio resource agent.

> I wonder how I should implement this. Of course the virtual-IP should only
> be active on a host where Kamailio is active too.
>
> 
> How do I make Kamailio active on both nodes? Just a "clone" as with the
> "ping resource"?
> 
> How do I try to recover a Kamailio problem (restart) a few times before
> migrating the virtual-IP to the other node?

i thought that kamailio is active on both nodes? or what is the difference
between "running on both servers" and "a host where kamailio is active"?

(if there is a difference, you should create a multistate aka
master/slave resource)


basically:
primitive extip ocf:heartbeat:IPaddr2 \
        params ip="a.b.c.d" nic="eth0.123" cidr_netmask="28" \
        op monitor interval="30"
primitive kamailio ocf:pernau:kamailio \
        op monitor interval="30" timeout="30" \
        op monitor interval="15" role="Slave" timeout="30" \
        op monitor interval="10" role="Master" timeout="30"
ms ms-kamailio kamailio
colocation colo_extip_with_kamailio inf: extip ms-kamailio:Master


the most important thing would be the kamailio resource agent.


cheers,
raoul

Re: [Pacemaker] Failover configuration with apache

2011-02-09 Thread Raoul Bhatia [IPAX]


hi,

On 09.02.2011 08:43, u.schmel...@online.de wrote:

> I don't use ocf:heartbeat:apache because of my special configuration:
>
> apache is active on both nodes, the access is via haproxy, we have an
> url http://localhost:80 which accesses the local apache via ssl and
> delivers a static page (returning a fixed text).


ok - i do not fully understand what exactly you're doing - but
as ocf:heartbeat:apache is highly configurable (binary, port, config
file, testurl, test regex, envfiles, statusurl, etc.),
it should be possible to use it.

if not, we should probably adapt the ra ;)


> So if we switch over, the stop action does an apache_force_reload and
> sets the current node to standby. The new active node performs an
> apache_start (to be sure apache is running) and a force_reload to
> establish the virtual ips. /etc/init.d/apache2 without any parameter
> gives no output. The output of file -k /etc/init.d/apache2 gives
> /etc/init.d/apache2: ASCII English text


it seems like /etc/init.d/apache2 is not an lsb compatible init script.
can you post the script to a pastebin service and/or verify that it
is actually doing what you expect it to do (read: check against [1])?
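
a rough sketch of how the checks from [1] could be scripted, assuming the usual lsb expectations (start/stop return 0 even when repeated, status returns 0 while running and 3 while stopped). the function names expect_rc and lsb_check are made up for this example. note that it really starts and stops the service, so only run it where that is acceptable:

```shell
#!/bin/sh
# run one init-script action and compare its exit code with the expected one
expect_rc() {
    script=$1; action=$2; want=$3
    "$script" "$action" >/dev/null 2>&1
    got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok   $action -> $got"
    else
        echo "FAIL $action -> $got (expected $want)"
        return 1
    fi
}

# walk through the start/status/stop sequence; returns 1 if any step deviates
lsb_check() {
    script=$1
    failed=0
    expect_rc "$script" start  0 || failed=1
    expect_rc "$script" status 0 || failed=1
    expect_rc "$script" start  0 || failed=1   # start while already running
    expect_rc "$script" stop   0 || failed=1
    expect_rc "$script" status 3 || failed=1   # 3 = not running
    expect_rc "$script" stop   0 || failed=1   # stop while already stopped
    return "$failed"
}
```

usage would be something like "lsb_check /etc/init.d/apache2"; any FAIL line points at an action the cluster would misinterpret.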

cheers,
raoul

[1] 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html



Re: [Pacemaker] problem with apache coming up

2011-02-08 Thread Raoul Bhatia [IPAX]


On 08.02.2011 17:05, Testuser SST wrote:

> Hi Raoul,
>
> there is no entry in any httpd-logfile at the specific time. And the httpd.conf
> exists in the right path.
>
> [root@astinos httpd]# locate httpd.conf
> /etc/httpd/conf/httpd.conf


1. please post your config
2. what happens when you start httpd via something like:
   /usr/sbin/httpd -DSTATUS -k start -f /etc/httpd/conf/httpd.conf

3. what http/apache server version do you actually use?
4. is the apache config identical on both hosts?

cheers,
raoul



Re: [Pacemaker] Failover configuration with apache

2011-02-08 Thread Raoul Bhatia [IPAX]

On 02/08/2011 02:37 PM, u.schmel...@online.de wrote:
> 
> Hi list,
> 
> have some problems to setup the above configuration. My environment is as 
> follows:
> 
> rhel6 , pacemaker-1.1.2-7.el6.x86_64, apache2, haproxy running on both nodes, 
> service addresses are managed by pacemaker
> there is an additional resource script, that takes care of 
> reloading/restarting apache in case of failover. This script doesn't work as 
> expected.
> The lsb checks are passed by the script, however when running in lrmd context 
> I get:
> 
> [9555]: info: rsc:apacheIP:13: monitor
> lrmd: [9555]: info: rsc:haproxyIP:12: monitor
> lrmd: [9555]: notice: lrmd_rsc_new(): No lrm_rprovider field in message
> lrmd: [9555]: info: rsc:webservice:21: probe
> lrmd: [7177]: ERROR: (raexeclsb.c:execra:267) execv failed for 
> /etc/init.d/apache2: Exec format error
> lrmd: [9555]: info: rsc:webservice:22: stop
> lrmd: [7178]: WARN: For LSB init script, no additional parameters are needed.
> lrmd: [7178]: ERROR: (raexeclsb.c:execra:267) execv failed for 
> /etc/init.d/apache2: Exec format error

what happens if you execute /etc/init.d/apache2 without any parameter?
what does "file -k /etc/init.d/apache2" show?
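
one common cause of "Exec format error" from execv is a script without an interpreter line, so a quick check could look like the sketch below. this is only an assumption about what is wrong here, and has_shebang is a helper name invented for the example:

```shell
#!/bin/sh
# succeed only if the first line of the file starts with "#!"
has_shebang() {
    [ -r "$1" ] || return 1
    first=
    IFS= read -r first < "$1" || :
    case $first in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

for example: has_shebang /etc/init.d/apache2 || echo "no interpreter line". an interactive shell will usually still run such a script, which would explain why it works by hand but not when lrmd executes it directly.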

> primitive webservice lsb:apache2 \
>  op start interval="0"

why do you use lsb instead of the ocf:heartbeat:apache?

cheers,
raoul

Re: [Pacemaker] problem with apache coming up

2011-02-08 Thread Raoul Bhatia [IPAX]

On 02/08/2011 04:13 PM, Testuser SST wrote:
> Hi,
> 
> I'm implementing a two node webserver on CentOS 5 with heartbeat/pacemaker 
> and DRBD. The first new installed node works fine, but when the second node 
> becomes active, there seems to be a problem with the apache starting up. Is 
> there a way to get some more information about the problem? I can start up 
> the apache with a "service httpd start" without any failure.
> I'm using the packages supplied by the cluster labs repo.
>  
> Here comes a part of the logfile:
> 
> Feb 08 15:35:06 astinos crmd: [7660]: info: do_lrm_rsc_op: Performing 
> key=8:742:0:7d755c4b-fa07-4a1b-87f0-d78894638e98 op=Apache_start_0 )
> Feb 08 15:35:06 astinos lrmd: [7657]: debug: on_msg_perform_op:2359: copying 
> parameters for rsc Apache
> Feb 08 15:35:06 astinos lrmd: [7657]: debug: on_msg_perform_op: add an 
> operation operation start[19] on ocf::apache::Apache for client 7660, its 
> parameters: CRM_meta_timeout=[2] crm_feature_set=[3.0.1] 
> configfile=[/etc/httpd/conf/httpd.conf]  to the operation list.
> Feb 08 15:35:06 astinos lrmd: [7657]: info: rsc:Apache:19: start
> apache[8406]:   2011/02/08_15:35:07 INFO: apache not running
> apache[8406]:   2011/02/08_15:35:07 INFO: waiting for apache 
> /etc/httpd/conf/httpd.conf to come up
> Feb 08 15:35:08 astinos lrmd: [7657]: WARN: Managed Apache:start process 8406 
> exited with return code 1.
> Feb 08 15:35:08 astinos crmd: [7660]: info: process_lrm_event: LRM operation 
> Apache_start_0 (call=19, rc=1, cib-update=25, confirmed=true) unknown error

does /etc/httpd/conf/httpd.conf exist? what does the apache error.log
(e.g. /var/log/apache2/error.log) say?

cheers,
raoul

[Pacemaker] display (infrastructure) stack version

2011-01-26 Thread Raoul Bhatia [IPAX]

hi,

is it possible to show the (infrastructure) stack version in
crm_mon/crm/... (like the dc version)?

it currently only shows openais, but nothing more:

> # crm status
> 
> Last updated: Wed Jan 26 10:39:19 2011
> Stack: openais
> Current DC: wdb01 - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> 
> 
> Online: [ wdb01 wdb02 ]

thanks,
raoul

Re: [Pacemaker] Upgrade from openais-0.80 failed

2011-01-26 Thread Raoul Bhatia [IPAX]

On 01/26/2011 10:06 AM, Dan Frincu wrote:
> Hi,
> 
> I've got a pair of servers running on RHEL5 x86_64 with openais-0.80
> (older install) which I want to upgrade to corosync-1.3.0 +
> pacemaker-1.0.10. Downtime is not an issue and corosync 1.3.0 is needed
> for UDPU, so I built it from the corosync.org 
> website and openais 1.1.4 from openais.org  website.

hi,

you do not need both corosync and openais as corosync supersedes
openais: http://corosync.org/doku.php?id=faq:why

on my debian squeeze based system, i see corosync only:

> # dpkg -l|grep -i coro
> ii  corosync   1.2.1-4
> ii  libcorosync4   1.2.1-4
> root@wdb01 ~ #

vs.
> root@wdb01 ~ # dpkg -l|grep -i ais
> root@wdb01 ~ # 


> Logs: http://pastebin.com/i0maZM4p

your logfiles tell that pacemaker 1.0.9 is started (line 55):
> Jan 25 11:19:39 corosync [SERV  ] Service engine loaded: Pacemaker Cluster 
> Manager 1.0.9

on the other hand, line 59 says:
> Jan 25 11:19:39 cluster1 crmd: [9722]: info: main: CRM Hg Version: 
> da7075976b5ff0bee71074385f8fd02f296ec8a3

which should be 1.0.10 (/me is puzzled)


can you purge all related packages once more and verify that
all binaries are gone?
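
a small sketch for the "binaries are gone" check, assuming the leftovers would be reachable via $PATH. the helper name leftover_binaries and the list of names passed to it are only examples; daemons installed outside $PATH or under a custom build prefix would not be found this way:

```shell
#!/bin/sh
# print the location of every given command that is still found on $PATH
leftover_binaries() {
    for name in "$@"; do
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$path"
        fi
    done
}
```

for example: leftover_binaries corosync aisexec crm_mon crm_resource. empty output means none of them is found anymore. on an rpm based system, "rpm -qa | grep -i -e corosync -e openais -e pacemaker" should additionally show whether any package is still registered.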

cheers,
raoul

[Pacemaker] setting default standby for (new) nodes

2010-12-11 Thread Raoul Bhatia [IPAX]

hi,

is it possible to set the standby value to "on" for new nodes
joining a cluster?

from what i currently know, the following scenarios might occur:

1. standby: on  -> node is in standby

2. standby: off -> node is online

3. no value for standby
(no standby attribute defined for the node)

-> node is online.


did i overlook something or is there - as of now - no option
to force the 3rd case (no value for standby) to default to on?

(such a feature/property is *not* important to me - i was just checking
some default values when this thought occurred)


thanks,
raoul

Re: [Pacemaker] 1.0.10, released?

2010-11-10 Thread Raoul Bhatia [IPAX]

hi,

On 11/10/2010 11:02 AM, Pavlos Parissis wrote:
> Hi,
> 
> Although it has been mentioned in other threads that 1.0.10 is out I
> don't see any RPMs in http://clusterlabs.org/rpm/epel-5/

where did you hear that from? afairc, 1.0.10 is still under development.

> I don't see any tag for 1.0.10 in mercurial
> (http://hg.clusterlabs.org/pacemaker/1.0/) , but I don't see a tag for
> 1.0.9 either.
> Is it actually released or I have misunderstood?

the correct path is http://hg.clusterlabs.org/pacemaker/stable-1.0/

you can spot this via
http://hg.clusterlabs.org/pacemaker/?sort=lastchange

afairc, http://hg.clusterlabs.org/pacemaker/1.0/ aka "Pacemaker 1.0
(clean)" was meant for http://www.ohloh.net/p/pacemaker or the like.

cheers,
raoul

Re: [Pacemaker] apache could not start

2010-10-08 Thread Raoul Bhatia [IPAX]


On 09.10.2010 04:09, jiaju liu wrote:

> Hi everybody
> I use command
> crm configure primitive apache ocf:heartbeat:apache params
> "/opt/lampp/etc/httpd.conf" meta resource-stickiness=1
> target-role=stopped op start timeout=120s op stop timeout=120s op
> monitor timeout=20s interval=10s op status timeout=30s;
> and
> crm resource start apache
> to start an apache resource, however when I use crm resource show it appears
> apache (ocf::heartbeat:apache) Stopped
> what's wrong with it?


what are the logfiles saying? what is "crm status -r" saying?
can you please submit an "hb_report"?

thanks


Re: [Pacemaker] Can somebody please explain pengine's urge to move all resources?

2010-10-07 Thread Raoul Bhatia [IPAX]

On 10/06/2010 11:16 AM, Keisuke MORI wrote:
> This should have been fix with this:
> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/5fe02f48c47b
> 
> The patch has been already backported to the 1.0 repository and will
> be included in 1.0.10.
> Will you test with the tip of 1.0 repository if you have any chance?

hi mori-san,

thanks for the information. unfortunately, i currently have no time
to set up a corresponding compile environment.

but is there any chance that some kind of a nightly build can be
provided by the opensuse buildservice for debian lenny?
(this is the server with the mentioned configuration)

thanks,
raoul

[Pacemaker] Backports from 1.1 to 1.0

2010-10-07 Thread Raoul Bhatia [IPAX]

hi all,

do you have any further information (eta, repository, etc.)
regarding the backported patches from 1.1 to 1.0?

i would be very interested in tracking them :)

thanks,
raoul

Re: [Pacemaker] Can somebody please explain pengine's urge to move all resources?

2010-09-28 Thread Raoul Bhatia [IPAX]

On 09/23/2010 09:28 AM, Andrew Beekhof wrote:
> The good news is that 1.1.3 doesn't have that behavior.
> Lets see how 1.0 goes once all the relevant patches have been backported.

thanks for your answer! will those patches make it into 1.0.10 or
do you have another eta for this?

this issue has caused trouble for a very long time but i finally
took some time to track it down.

thanks,
raoul

Re: [Pacemaker] cib

2010-09-25 Thread Raoul Bhatia [IPAX]


On 24.09.2010 21:41, Shravan Mishra wrote:

> crmd[20612]: 2010/09/24_15:29:57 ERROR: crm_log_init_worker: Cannot
> change active directory to /var/lib/heartbeat/cores/hacluster:
> Permission denied (13)


ls -ald /var/lib/heartbeat/cores/hacluster /var/lib/heartbeat/cores/ 
/var/lib/heartbeat/ /var/lib/ /var/


is haclient allowed to cd all the way into 
/var/lib/heartbeat/cores/hacluster ?
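
to check every directory on the way in one go, something like the following sketch could be run as the user in question (for example via "su" or "sudo -u"). path_components and check_traversal are names made up for this example, and the path is expected to be absolute:

```shell
#!/bin/sh
# print "/" and then every ancestor of an absolute path, top-down, one per line
path_components() {
    p=$1
    out=
    while [ "$p" != "/" ] && [ "$p" != "." ]; do
        out="$p
$out"
        p=$(dirname "$p")
    done
    printf '/\n%s' "$out"
}

# report for each component whether the current user may enter (search) it
check_traversal() {
    path_components "$1" | while IFS= read -r dir; do
        if [ -x "$dir" ]; then
            echo "ok      $dir"
        else
            echo "blocked $dir"
        fi
    done
}
```

for example: check_traversal /var/lib/heartbeat/cores/hacluster. the first "blocked" line is the directory whose permissions or ownership need a closer look.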


cheers,


Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Raoul Bhatia [IPAX]

hi,

On 09/22/2010 02:43 PM, Dan Frincu wrote:
> When I start openais, I get nodejoin immediately, as seen in the logs
> below. However, it takes some time before the nodes are visible in
> crm_mon output. Any idea how to minimize this delay?
> 
> Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
> send_member_notification: Sending membership update 8 to 1 children
> Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
> 192.168.165.33
> Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
> 192.168.165.35
> Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
> Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
> Sending message to local.crmd failed: unknown (rc=-2)
> Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
> Sending message to local.crmd failed: unknown (rc=-2)
> Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Recorded
> connection 0x174840d0 for crmd/12946
> Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Sending
> membership update 8 to crmd
> Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
> update_expected_votes: Expected quorum votes 1024 -> 2
> Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
> 8: quorum aquired
> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
> Election 2 (owner: bench2) pass: vote from bench2 (Host name)
> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
> transition S_PENDING -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
> TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
> Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
> cib_client_add_notify_callback: Callback already present
> Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
> custom graph functions
> Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
> transition -1: 0 actions in 0 synapses
> Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
> DC status for this partition
> Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
> now in R/W mode

is the cluster up and running and you're only (re-)starting one node?
or is this after you start openais on both nodes?

thanks,
raoul

Re: [Pacemaker] Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Raoul Bhatia [IPAX]

hi,

On 09/22/2010 02:52 PM, Chen Stormstout wrote:
> Hi,
> 
> Thanks for your post Raoul, but reading this tutorial is what I did in the
> first place.

ok.

> # For the cluster
> deb http://backports.debian.org/debian-backports lenny-backports main contrib 
> non-free
> deb http://www.backports.org/debian lenny-backports main contrib non-free
> deb http://people.debian.org/~madkiss/ha lenny main

madkiss' repository shouldn't be necessary anymore. please comment it
out for now
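
a minimal sketch for doing that without an editor. comment_out_repo is a helper invented for this example; it writes the modified list to stdout, so the original file is left untouched until you replace it yourself:

```shell
#!/bin/sh
# print the sources file with every active line containing the given
# fixed string commented out; lines that are already comments stay as-is
comment_out_repo() {
    awk -v pat="$1" '
        index($0, pat) && $0 !~ /^[ \t]*#/ { print "# " $0; next }
        { print }
    ' "$2"
}
```

for example: comment_out_repo 'people.debian.org/~madkiss' /etc/apt/sources.list > /tmp/sources.list.new, then review the result, copy it over the original and run "aptitude update".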

> But like the How to says:
> 
> "So in order to use Pacemaker on Debian GNU/Linux 5.0 ("Lenny"), please add 
> the Backports.org-repository to your APT-configuration according to the 
> How-To on this site. This has to be done on all nodes in your cluster."
> 
> I've followed the tutorial:
> 
> aptitude -t lenny-backports install heartbeat pacemaker

do you *need* to use heartbeat? otherwise, i would suggest corosync
as - in my experience - it is much faster than heartbeat (e.g. startup).

please retry after commenting out the above.

if it is still not working, please provide the output of:

  apt-cache policy cluster-glue pacemaker heartbeat
  apt-cache policy cluster-agents libcluster-glue libcorosync4
  apt-cache policy libheartbeat2 libnet1 libopenipmi0 libtimedate-perl

if you switch to corosync, please let us know if there is any issue
when you try to install these packages using:

  aptitude install -t lenny-backports pacemaker corosync


thanks,
raoul

Re: [Pacemaker] Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Raoul Bhatia [IPAX]

hi,

please refer to http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo

cheers,
raoul

[Pacemaker] correct permissions for /var/lib/pengine

2010-09-22 Thread Raoul Bhatia [IPAX]

in my recent hb_report, i find:
> WARN: problem with permissions/ownership at wc01:
> wrong permissions or ownership for /var/lib/pengine:
> drwxr-xr-x 2 hacluster haclient 5038080 Jul 23 08:58 /var/lib/pengine
> WARN: problem with permissions/ownership at wc02:
> wrong permissions or ownership for /var/lib/pengine:
> drwxr-xr-x 2 hacluster haclient 4096 Sep 22 11:26 /var/lib/pengine


what should the correct permission be?

thanks,
raoul

[Pacemaker] Release Matrix

2010-09-22 Thread Raoul Bhatia [IPAX]

hi,

regarding the Release Matrix [1] and the ABI-change in cluster-glue/
clplumbing [2], i wonder if pacemaker 1.0.9.1 really works with
glue 1.0.3?

cheers,
raoul

[1] http://www.clusterlabs.org/wiki/ReleaseMatrix
[2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/65443

Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error

2010-09-21 Thread Raoul Bhatia [IPAX]

from my experience with similar issues:

have you upgraded your cluster-glue packages too?

cheers,
raoul

Re: [Pacemaker] how to keep ftp connection when swap from primary to secondary

2010-08-26 Thread Raoul Bhatia [IPAX]

On 08/26/2010 04:42 PM, liang...@asc-csa.gc.ca wrote:
> I have followed the guide in “Clusters from Scratch” written by Andrew
> Beekhof and successfully setup an Active/Passive pair of cluster
> servers. The cluster runs in Fedora 13 and includes services like
> apache, vsftpd and nfs. Drbd is used to allow data consistence during a
> failover. Everything works fine except ftp lose its connection when the
> service swaps from primary to the secondary or vice versa. I know to
> keep the ftp connection, one may need to keep the connection states for
> the session across the nodes. But I couldn’t find clue how to do it.
> Does anyone there have any idea how to keep the ftp connection when
> swapping nodes, if it is possible?

hi,

as of now, we're not syncing our connections between the load
balancers, but i would suggest
http://www.linuxvirtualserver.org/docs/sync.html and the like.


cheers,
raoul

Re: [Pacemaker] /.crm_help_index file (in system root aka /)

2010-07-14 Thread Raoul Bhatia [IPAX]

On 07/13/2010 09:47 PM, Maros Timko wrote:
> The python crm scripts use os.getenv("HOME") to decide where to look
> for or store the history file. Some of the environments (cronjob or
> sudo) do have HOME set to "/".
> Try to prepend crm call with:
> export HOME=/root

ok, i think i found the reason:

we're monitoring our servers using the nagios nrpe server.

nagios-nrpe-server.preinst on debian lenny adds the nagios user via:
> adduser --system --group --no-create-home --home /var/log/nagios --quiet nagios

but this directory does not exist:

> # ls -ald /var/log/nagios
> ls: cannot access /var/log/nagios: No such file or directory
> # su - nagios
> No directory, logging in with HOME=/

we then use "sudo crm ..." to monitor the cluster and its nodes, so
crm will rebuild the index in $HOME, which is /

changing nagios' homedir or creating /var/log/nagios fixes this issue.
.crm_help_index is then created inside this user's $HOME.

thanks for your advice and the valuable input.

would it be reasonable to use /tmp or /var/tmp in case that $HOME
resolves to / or in case that $HOME isn't writable by this user?
(or not create the .crm_help_index at all)
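
a minimal sketch of the fallback suggested above, written as a shell function. this is only an illustration of the idea, not how the crm shell is implemented; the function name pick_index_dir and the use of $TMPDIR are assumptions.

```sh
#!/bin/sh
# Hypothetical helper, not part of the crm shell: choose where a per-user
# index file such as .crm_help_index could be written.
pick_index_dir() {
    home=${1:-${HOME:-}}
    # Accept the home directory only if it is set, is not "/",
    # exists and is writable by the current user.
    if [ -n "$home" ] && [ "$home" != "/" ] && [ -d "$home" ] && [ -w "$home" ]; then
        printf '%s\n' "$home"
    else
        printf '%s\n' "${TMPDIR:-/tmp}"
    fi
}

# With a home of "/" (as for the nagios user above) this prints the
# fallback directory instead of "/".
pick_index_dir /
```

a caller could then build the path as "$(pick_index_dir)/.crm_help_index". skipping the index entirely when no usable directory exists would be the other option mentioned above.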

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] /.crm_help_index file (in system root aka /)

2010-07-13 Thread Raoul Bhatia [IPAX]

hi,

On 07/12/2010 05:36 PM, Lars Ellenberg wrote:
> put some "( date ; env) >> /tmp/tmp.debug.log" into your script,
> then we can determine if the crm shell assumes too much.

i did some digging around and found that cron seems to set
the correct environment. so i used "inotifywait -m /"
to watch for changes to /:

# tail -f /var/log/syslog &
# inotifywait -m / | grep --color crm_ &

please find a c/p from three of such events attached.

it now seems to me that the drbd monitor action creates these files.

however, i'm unable to reproduce this behavior if i manually trigger
a monitor action via "crm resource reprobe drbd_data:0"

my drbd resource configuration:
> primitive drbd_data ocf:linbit:drbd \
> op monitor interval="15" role="Started" timeout="10" \
> op monitor interval="10" role="Slave" timeout="10" \
> op monitor interval="5" role="Master" timeout="20" \
> op stop interval="0" timeout="100" \
> op start interval="0" timeout="240" \
> params drbd_resource="r0"


> ms ms_drbd_data drbd_data \
> meta master-max="1" master-node-max="1" clone-max="2" \
> clone-node-max="1" globally-unique="false" notify="yes" interleave="true" \
> target-role="Started" migration-threshold="2" failure-timeout="1min"

drbd version:

> # cat /proc/drbd 
> version: 8.3.8 (api:88/proto:86-94)
> GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by r...@localhost, 2010-07-07 11:05:14

any ideas?

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15

r...@c01n01 /var/log # Jul 13 12:05:05 c01n01 lrmd: [13212]: debug: rsc:mysql-proxy-tcp:39: monitor
Jul 13 12:05:05 c01n01 lrmd: [6010]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER
Jul 13 12:05:05 c01n01 lrmd: [13212]: debug: rsc:drbd_data:0:80: monitor
Jul 13 12:05:05 c01n01 lrmd: [6017]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER
Jul 13 12:05:05 c01n01 drbd[6017]: DEBUG: r0: Calling /usr/sbin/crm_master -Q -l reboot -v 1
Jul 13 12:05:05 c01n01 attrd: [13213]: debug: attrd_local_callback: update message from crm_attribute: master-drbd_data:0=1
Jul 13 12:05:05 c01n01 attrd: [13213]: debug: attrd_local_callback: Supplied: 1, Current: 1, Stored: 1
Jul 13 12:05:05 c01n01 drbd[6017]: DEBUG: r0: Exit code 0
Jul 13 12:05:05 c01n01 drbd[6017]: DEBUG: r0: Command output: 
Jul 13 12:05:05 c01n01 lrmd: [13212]: debug: RA output: (drbd_data:0:monitor:stdout) 

r...@c01n01 /var/log # Jul 13 12:05:06 c01n01 lrmd: [13212]: debug: rsc:pure-ftpd:42: monitor
Jul 13 12:05:06 c01n01 lrmd: [6058]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER
Jul 13 12:05:10 c01n01 lrmd: [13212]: debug: rsc:drbd_data:0:80: monitor
Jul 13 12:05:10 c01n01 lrmd: [6076]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER
Jul 13 12:05:10 c01n01 drbd[6076]: DEBUG: r0: Calling /usr/sbin/crm_master -Q -l reboot -v 1
Jul 13 12:05:10 c01n01 attrd: [13213]: debug: attrd_local_callback: update message from crm_attribute: master-drbd_data:0=1
Jul 13 12:05:10 c01n01 attrd: [13213]: debug: attrd_local_callback: Supplied: 1, Current: 1, Stored: 1
Jul 13 12:05:10 c01n01 drbd[6076]: DEBUG: r0: Exit code 0
Jul 13 12:05:10 c01n01 drbd[6076]: DEBUG: r0: Command output: 
Jul 13 12:05:10 c01n01 lrmd: [13212]: debug: RA output: (drbd_data:0:monitor:stdout) 
/ CREATE .crm_help_index
/ OPEN .crm_help_index
/ MODIFY .crm_help_index
/ CLOSE_WRITE,CLOSE .crm_help_index
/ OPEN .crm_help_index
/ ACCESS .crm_help_index
/ CLOSE_NOWRITE,CLOSE .crm_help_index
Jul 13 12:05:12 c01n01 cibadmin: [6151]: info: Invoked: cibadmin -Ql 
Jul 13 12:05:14 c01n01 lrmd: [13212]: debug: rsc:mysql-proxy-socket:41: monitor
Jul 13 12:05:14 c01n01 lrmd: [6175]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER

r...@c01n01 /var/log # 
r...@c01n01 /var/log # 
r...@c01n01 /var/log # Jul 13 12:05:15 c01n01 lrmd: [13212]: debug: rsc:mysql-proxy-tcp:39: monitor
Jul 13 12:05:15 c01n01 lrmd: [6179]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER
Jul 13 12:05:15 c01n01 lrmd: [13212]: debug: rsc:drbd_data:0:80: monitor
Jul 13 12:05:15 c01n01 lrmd: [6183]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER
Jul 13 12:05:15 c01n01 drbd[6183]: DEBUG: r0: Calling /usr/sbin/crm_master -Q -l reboot -v 1
Jul 13 12:05:15 c01n01 attrd: [13213]: debug: attrd_local_callback: update message from crm_attribute: master-drbd_data:0=1
Jul 13 12:05:15 c01n01 attrd: [13213]: debug: attrd_local_callba

Re: [Pacemaker] /.crm_help_index file (in system root aka /)

2010-07-12 Thread Raoul Bhatia [IPAX]

On 07/02/2010 03:26 PM, Dejan Muhamedagic wrote:
> On Thu, Jul 01, 2010 at 07:46:44PM +0200, Raoul Bhatia [IPAX] wrote:
>> On 07/01/2010 05:46 PM, Dejan Muhamedagic wrote:
>>> The help index is created in the user's home. Is that the home of
>>> the root user? Shouldn't it be /root? BTW, there are many other
>>> programs creating dot files in home dirs.
>>
>> yeah, i know. and /root/. would perfectly be fine.
>>
>> but occasionally, it appears in /
>>
>> thats the "issue".
> 
> OK, it's because the HOME variable expands to zilch. Now, that
> may be a problem because we definitely need a place for the help
> index. In case there's no HOME variable the shell may refuse to
> create the index which means there wouldn't be any help, but this
> has to be tested. You can open a bugzilla if bothered.

hi,

thinking about it, it *might* be related to cronjobs:

> # cat /etc/cron.d/ipax_backup 
> PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
> 
> # m h dom mon dow user  command
> 00 07 * * *   root   /root/bin/do_i_run_res.sh mysql-server && /root/bin/backup.mysql.sh > /dev/null

can you confirm my suspicion? shall i still open a bugzilla?

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] Upgraded mysql from 5.0 to 5.1 - And changed to OCF RA

2010-07-07 Thread Raoul Bhatia [IPAX]

On 07/07/2010 05:55 PM, Jake Bogie wrote:
> So I took Dan's advice this time and cleaned up my resource
> configuration, updated the script, and verified...however I'm still not
> getting the resource online...

hi,

please use hb_report to gather the logfiles.

> # cat /var/log/messages | grep mysql-server

is insufficient.

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] [PATCH] suggested bashism fixes for hb2openais.sh

2010-07-07 Thread Raoul Bhatia [IPAX]

On 07/07/2010 11:37 AM, Dejan Muhamedagic wrote:
> Yes, this is just an auxiliary script, I'll make it a bash
> script.

why not apply the rather trivial changes?

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] [PATCH] suggested bashism fixes for hb2openais.sh

2010-07-07 Thread Raoul Bhatia [IPAX]

On 07/07/2010 01:12 AM, Lars Ellenberg wrote:
>> -let sw=sw+1
>> +  sw=$((sw+1))
> all of these need to be sw=$(( $sw + 1 ))
> and similar.
> 
> ok, so it also works like this in "recent" dash.

oh... i wasn't aware of that. i used checkbashisms on debian and made
the changes with reference to https://wiki.ubuntu.com/DashAsBinSh

i didn't know that the suggested replacements wouldn't be as compatible as possible.
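
for reference, a small sketch of the two spellings discussed above, using the counter name sw from the patch. as far as i understand, both forms work in bash and in recent dash, while the form with the explicit "$" is the more conservative one for older shells; the values are made up for illustration.

```sh
#!/bin/sh
# Portable counter updates without the bash-only "let" builtin.
sw=0
sw=$((sw + 1))       # bare variable name inside $(( ))
sw=$(( $sw + 1 ))    # explicit expansion, the most conservative spelling
echo "sw=$sw"

# Accumulating an exit status, as done for rc in the hb2openais.sh patch.
rc=0
false || rc=$((rc + $?))
echo "rc=$rc"
```

the "false || ..." form is only there so that the example also runs under "set -e"; the patch itself adds $? directly after the command.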

thanks for pointing this out!
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



[Pacemaker] [PATCH] suggested bashism fixes for hb2openais.sh

2010-07-06 Thread Raoul Bhatia [IPAX]

# HG changeset patch
# User Raoul Bhatia [IPAX] 
# Date 1278427473 -7200
# Branch stable-1.0
# Node ID 6396b06964a167a53b57b80ab316c96c9de3ab39
# Parent  31401399d6334467296a60a13d0cea7641fc9358
suggested bashism fixes for hb2openais.sh

diff -r 31401399d633 -r 6396b06964a1 tools/hb2openais.sh.in
--- a/tools/hb2openais.sh.in   Tue Jul 06 16:29:38 2010 +0200
+++ b/tools/hb2openais.sh.in   Tue Jul 06 16:44:33 2010 +0200
@@ -384,10 +384,10 @@
 newstanza() {
do_tabs
printf "%s {\n" $1
-   let sw=sw+1
+   sw=$((sw+1))
 }
 endstanza() {
-   let sw=sw-1
+   sw=$((sw-1))
do_tabs
printf "}\n"
 }
@@ -441,7 +441,7 @@
fi
netaddress="`netaddress $iface`"
if [ "$netaddress" ]; then
-   let local_mcastport=$local_mcastport+1
+   local_mcastport=$((local_mcastport+1))
newportinfo $iface $local_mcastport
echo "$netaddress" "$mcastaddr" "$local_mcastport"
else
@@ -526,7 +526,7 @@
multicastinfo $ring $addr $port
setvalue mcastport $port
setvalue mcastaddr $addr
-   let ring=$ring+1
+   ring=$((ring+1))
endstanza
 done
 mediacnt=`gethbmedia 2>/dev/null | prochbmedia 2>/dev/null | sort -u | wc -l`
@@ -791,7 +791,7 @@
(cd / && tar cf - $DIST_FILES) |
ssh $ssh_opts $node "rm -f $REMOTE_RM_FILES &&
cd / && tar xf -"
-   let rc=$rc+$?
+   rc=$((rc+$?))
fi
 done
 info "Done transfering files"


[Pacemaker] [PATCH] suggested bashism fixes for HealthSMART OCF RA

2010-07-06 Thread Raoul Bhatia [IPAX]

# HG changeset patch
# User Raoul Bhatia [IPAX] 
# Date 1278426578 -7200
# Branch stable-1.0
# Node ID 31401399d6334467296a60a13d0cea7641fc9358
# Parent  338113649a70f80fe89ac0765035a79f70cb202f
suggested bashism fixes for HealthSMART OCF RA

diff -r 338113649a70 -r 31401399d633 extra/resources/HealthSMART
--- a/extra/resources/HealthSMART   Mon Jul 05 14:25:54 2010 +0200
+++ b/extra/resources/HealthSMART   Tue Jul 06 16:29:38 2010 +0200
@@ -116,7 +116,7 @@
 lower_yellow_limit=5
   else
 lower_red_limit=${OCF_RESKEY_temp_lower_limit}
-let lower_yellow_limit=${OCF_RESKEY_temp_lower_limit}+5
+lower_yellow_limit=$((OCF_RESKEY_temp_lower_limit+5))
   fi
 
   if [ "x${OCF_RESKEY_temp_upper_limit}" = "x" ] ; then
@@ -124,7 +124,7 @@
 upper_yellow_limit=55
   else
 upper_red_limit=${OCF_RESKEY_temp_upper_limit}
-let upper_yellow_limit=${OCF_RESKEY_temp_upper_limit}-5
+upper_yellow_limit=$((OCF_RESKEY_temp_upper_limit-5))
   fi
 
   if [ "x${OCF_RESKEY_drives}" = "x" ] ; then
@@ -195,25 +195,25 @@
  #
  TEMP=`$SMARTCTL -A /dev/sda | awk '/^194/ { print $10 }'` 
  echo "Temp = "$TEMP
- if [[ ${TEMP} -lt ${lower_red_limit} ]] ; then
+ if [ ${TEMP} -lt ${lower_red_limit} ] ; then
ocf_log info "Drive /dev/sda too cold."
attrd_updater -n "#health-smart" -U "red" -d "5s"
return $OCF_SUCCESS
  fi
 
- if [[ $TEMP -gt ${upper_red_limit} ]] ; then
+ if [ $TEMP -gt ${upper_red_limit} ] ; then
ocf_log info "Drive /dev/sda too hot."
attrd_updater -n "#health-smart" -U "red" -d "5s"
return $OCF_SUCCESS
  fi
 
- if [[ $TEMP -lt ${lower_yellow_limit} ]] ; then
+ if [ $TEMP -lt ${lower_yellow_limit} ] ; then
 ocf_log info "Drive /dev/sda quite cold."
 attrd_updater -n "#health-smart" -U "yellow" -d "5s"
 return $OCF_SUCCESS
   fi
 
-  if [[ $TEMP -gt ${upper_yellow_limit} ]] ; then
+  if [ $TEMP -gt ${upper_yellow_limit} ] ; then
 ocf_log info "Drive /dev/sda quite hot."
 attrd_updater -n "#health-smart" -U "yellow" -d "5s"
 return $OCF_SUCCESS


Re: [Pacemaker] [PATCH] suggested bashism fixes

2010-07-06 Thread Raoul Bhatia [IPAX]

On 07/06/2010 04:15 PM, Raoul Bhatia [IPAX] wrote:
> # HG changeset patch
> # User Raoul Bhatia [IPAX] 
> # Date 1278425714 -7200
> # Branch stable-1.0
> # Node ID b914d11e1e165fc0b559ebbfb70db0014da0c250
> # Parent  338113649a70f80fe89ac0765035a79f70cb202f
> suggested bashism fixes

i think i made a mistake and
  lower_yellow_limit=$(({OCF_RESKEY_temp_lower_limit}+5))

should actually read
  lower_yellow_limit=$((OCF_RESKEY_temp_lower_limit+5))
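
a quick sketch to illustrate the difference, with a made-up value of 10 for the parameter (in the resource agent it is set by the cluster):

```sh
#!/bin/sh
# OCF_RESKEY_temp_lower_limit is normally provided by the cluster;
# it is hard-coded here only for illustration.
OCF_RESKEY_temp_lower_limit=10

# Both spellings are valid arithmetic expansion:
lower_yellow_limit=$((OCF_RESKEY_temp_lower_limit + 5))
also_ok=$((${OCF_RESKEY_temp_lower_limit} + 5))

# The form from the first patch, $(({OCF_RESKEY_temp_lower_limit}+5)),
# is rejected by the shell because "{" is not an arithmetic token.
echo "$lower_yellow_limit $also_ok"
```

the braces are only valid together with the "$", i.e. as ${OCF_RESKEY_temp_lower_limit}.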


i will resubmit the patch and hopefully, someone is able to try it?

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



[Pacemaker] [PATCH] suggested bashism fixes

2010-07-06 Thread Raoul Bhatia [IPAX]

# HG changeset patch
# User Raoul Bhatia [IPAX] 
# Date 1278425714 -7200
# Branch stable-1.0
# Node ID b914d11e1e165fc0b559ebbfb70db0014da0c250
# Parent  338113649a70f80fe89ac0765035a79f70cb202f
suggested bashism fixes

diff -r 338113649a70 -r b914d11e1e16 extra/resources/HealthSMART
--- a/extra/resources/HealthSMART   Mon Jul 05 14:25:54 2010 +0200
+++ b/extra/resources/HealthSMART   Tue Jul 06 16:15:14 2010 +0200
@@ -116,7 +116,7 @@
 lower_yellow_limit=5
   else
 lower_red_limit=${OCF_RESKEY_temp_lower_limit}
-let lower_yellow_limit=${OCF_RESKEY_temp_lower_limit}+5
+lower_yellow_limit=$(({OCF_RESKEY_temp_lower_limit}+5))
   fi
 
   if [ "x${OCF_RESKEY_temp_upper_limit}" = "x" ] ; then
@@ -124,7 +124,7 @@
 upper_yellow_limit=55
   else
 upper_red_limit=${OCF_RESKEY_temp_upper_limit}
-let upper_yellow_limit=${OCF_RESKEY_temp_upper_limit}-5
+upper_yellow_limit=$(({OCF_RESKEY_temp_upper_limit}-5))
   fi
 
   if [ "x${OCF_RESKEY_drives}" = "x" ] ; then
@@ -195,25 +195,25 @@
  #
  TEMP=`$SMARTCTL -A /dev/sda | awk '/^194/ { print $10 }'` 
  echo "Temp = "$TEMP
- if [[ ${TEMP} -lt ${lower_red_limit} ]] ; then
+ if [ ${TEMP} -lt ${lower_red_limit} ] ; then
ocf_log info "Drive /dev/sda too cold."
attrd_updater -n "#health-smart" -U "red" -d "5s"
return $OCF_SUCCESS
  fi
 
- if [[ $TEMP -gt ${upper_red_limit} ]] ; then
+ if [ $TEMP -gt ${upper_red_limit} ] ; then
ocf_log info "Drive /dev/sda too hot."
attrd_updater -n "#health-smart" -U "red" -d "5s"
return $OCF_SUCCESS
  fi
 
- if [[ $TEMP -lt ${lower_yellow_limit} ]] ; then
+ if [ $TEMP -lt ${lower_yellow_limit} ] ; then
 ocf_log info "Drive /dev/sda quite cold."
 attrd_updater -n "#health-smart" -U "yellow" -d "5s"
 return $OCF_SUCCESS
   fi
 
-  if [[ $TEMP -gt ${upper_yellow_limit} ]] ; then
+  if [ $TEMP -gt ${upper_yellow_limit} ] ; then
 ocf_log info "Drive /dev/sda quite hot."
 attrd_updater -n "#health-smart" -U "yellow" -d "5s"
 return $OCF_SUCCESS


Re: [Pacemaker] Lighty doesn't come up always

2010-07-06 Thread Raoul Bhatia [IPAX]

On 07/06/2010 10:28 AM, Torsten Bronger wrote:
> Hello!
> 
> We have a two-node cluster with a virtual IP and Lighty running on
> that node which has this IP currently.  Thus, our configuration
> says:
> 
> node $id="xxx" mandy
> node $id="yyy" olga
> primitive Public-IP ocf:heartbeat:IPaddr2 \
> params ip="134.94.252.127" broadcast="134.94.253.255" nic="eth1" cidr_netmask="23" \
> op monitor interval="60s"
> primitive lighty lsb:lighttpd \
> op monitor interval="60s" timeout="30s" on-fail="restart" \
> op start interval="0" timeout="60s" \
> meta migration-threshold="3" failure-timeout="30s" target-role="Started"

is your lsb script *really* lsb compliant?
please refer to
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
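
a rough sketch of how the basic return-code checks could be scripted. the function name lsb_check is made up, and the list of steps is my reading of the LSB requirements (start and stop always return 0, status returns 0 when running and 3 when stopped), so please compare it against the document above. note that it really starts and stops the service, and that it does not cover the "status of a failed service" case.

```sh
#!/bin/sh
# Sketch: run basic LSB return-code checks against an init script.
# Usage: lsb_check /etc/init.d/lighttpd
# Assumes the service is stopped when the check begins.
lsb_check() {
    script=$1
    failures=0
    # Each entry is "<action> <expected exit code>", run in this order.
    for step in "start 0" "status 0" "start 0" "stop 0" "status 3" "stop 0"; do
        action=${step% *}
        expected=${step#* }
        actual=0
        "$script" "$action" >/dev/null 2>&1 || actual=$?
        if [ "$actual" -ne "$expected" ]; then
            echo "FAIL: $action returned $actual, expected $expected"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ] && echo "basic LSB checks passed"
    # The return value is the number of failed steps (0 means all passed).
    return "$failures"
}
```

an init script that always exits 0 for "status" fails the fifth step, and the cluster then has no way to notice that the service has stopped.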

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] Upgraded mysql from 5.0 to 5.1

2010-07-03 Thread Raoul Bhatia [IPAX]

On 07/02/2010 11:18 PM, Jake Bogie wrote:
> Receiving this error when running crm status...the only thing I can
> think of is that since the upgrade to MySQL 5.1 the service name was
> changed from mysqld to mysql.
> 
> Any thoughts on how to correct it?

what i would do:

1. crm resource stop mysqld
2. crm resource cleanup mysqld

3. crm configure edit mysqld
-> rename from lsb:mysqld to lsb:mysql

4. crm resource start mysqld

actually, i would not do 3 but would switch over to the ocf script
that is provided by heartbeat/pacemaker:

primitive mysql-server ocf:heartbeat:mysql \
op monitor interval="30s" timeout="30s" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120" \
params config="/etc/mysql/my.cnf" datadir="/data/db/mysql/data/" socket="/var/run/mysqld/mysqld.sock" binary="/usr/sbin/mysqld" pid="/var/run/mysqld/mysqld.pid"

(the "params" line is one single line)

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] /.crm_help_index file (in system root aka /)

2010-07-01 Thread Raoul Bhatia [IPAX]

On 07/01/2010 05:46 PM, Dejan Muhamedagic wrote:
> The help index is created in the user's home. Is that the home of
> the root user? Shouldn't it be /root? BTW, there are many other
> programs creating dot files in home dirs.

yeah, i know. and /root/. would be perfectly fine.

but occasionally, it appears in /

that's the "issue".

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



[Pacemaker] /.crm_help_index file (in system root aka /)

2010-07-01 Thread Raoul Bhatia [IPAX]

hi,

sometimes, i see a /.crm_help_index file being created on my system(s).
i do not exactly know when this happens, but i get the feeling that this
is not the correct place for this file ;)

running pacemaker 1.0.8+hg15494-4 on debian squeeze

> Current DC:  - partition with quorum
> Version: 1.0.8-f2ca9dd92b1d+ sid tip

quite possible that this issue has already been resolved.

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] i stop mysql service but the crm status is still runing

2010-06-07 Thread Raoul Bhatia [IPAX]


hi,

once again i encourage you to use the ocf:heartbeat:mysql
script.

moreover, i guess you did not configure any "monitor" operations that
would periodically check the mysql server's status.
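
a sketch of what such a monitor operation could look like for the lsb:mysqld primitive from the configuration quoted below. the 30s interval and timeout are assumed values for illustration, not tested recommendations, and the init script's "status" action has to return proper LSB exit codes for this to detect a stopped server:

```
primitive mysqld lsb:mysqld \
        op monitor interval="30s" timeout="30s"
```

without a recurring monitor operation, the cluster only knows the state it saw when it started the resource, so a manual "service mysqld stop" goes unnoticed.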

did you read 
http://www.clusterlabs.org/wiki/Load_Balanced_MySQL_Replicated_Cluster ?


there is some information on how to monitor mysql in there
(note: i did not read the entire article)

cheers,
raoul

On 07.06.2010 03:15, ch huang wrote:

mysql is running, and the crm status output is:


Last updated: Sat Jun  5 09:48:58 2010
Stack: openais
Current DC: PRIM - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, 2 expected votes
2 Resources configured.


Online: [ PRIM SEC ]

  Resource Group: mysql
  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
  mysqld (lsb:mysqld):   Started PRIM
  Master/Slave Set: ms_drbd_mysql
  Masters: [ PRIM ]
  Slaves: [ SEC ]

then i stopped mysql with:

# service mysqld stop
Stopping MySQL:[  OK  ]
# service mysqld status
mysqld is stopped

but in the crm status output, mysql is still shown as running. i do not
understand why.


Last updated: Sat Jun  5 09:48:58 2010
Stack: openais
Current DC: PRIM - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, 2 expected votes
2 Resources configured.


Online: [ PRIM SEC ]

  Resource Group: mysql
  fs_mysql   (ocf::heartbeat:Filesystem):Started PRIM
  ip_mysql   (ocf::heartbeat:IPaddr2):   Started PRIM
  mysqld (lsb:mysqld):   Started PRIM
  Master/Slave Set: ms_drbd_mysql
  Masters: [ PRIM ]
  Slaves: [ SEC ]

and here is my configuration:

# crm
crm(live)# configure
crm(live)configure# show
node PRIM
node SEC
primitive drbd_mysql ocf:linbit:drbd \
 params drbd_resource="r1" \
 op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
 params device="/dev/drbd/by-res/r1" directory="/drbddata/" fstype="ext3"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
 params ip="192.168.76.227" nic="eth0"
primitive mysqld lsb:mysqld
group mysql fs_mysql ip_mysql mysqld
ms ms_drbd_mysql drbd_mysql \
 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
property $id="cib-bootstrap-options" \
 no-quorum-policy="ignore" \
 stonith-enabled="false" \
 expected-quorum-votes="2" \
 dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
 cluster-infrastructure="openais" \
 default-action-timeout="240s"




Re: [Pacemaker] NODE offline after restart host

2010-06-05 Thread Raoul Bhatia [IPAX]

On 06/05/2010 11:00 AM, ch huang wrote:
> i tried to stop corosync, but the only output i got was
> 
> "waiting for corosync services to unload" followed by many dots. i did
> not have the patience to watch the endless dots, so i restarted my host with
> "init 6". the dots appeared again, so i cut the power and restarted.
> 
> but when i start corosync, the nodes in the cluster are offline. i do not
> know how to get my cluster online again. can anyone help?

is corosync started in your runlevel? which distribution do you use?

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] ERROR: lsb:mysql: no such resource agent

2010-06-05 Thread Raoul Bhatia [IPAX]

On 06/05/2010 10:53 AM, ch huang wrote:
> i figured it out, guess what? the document here
> http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo says
> 
> primitive mysqld lsb:mysql
> 
> but when i use crm to check the lsb resources, it is lsb:mysqld, not lsb:mysql
> 
> 
> that's why i got this error!! i have been misled!!

hi,

lsb scripts actually take *any* script from /etc/init.d/

it entirely depends on the distribution if they ship /etc/init.d/mysql
*or* /etc/init.d/mysqld *or* anything else (e.g. upstart scripts).

please refer to [1] for more information.

however, i strongly encourage you to use the mysql ocf resource agent.

cheers,
raoul
[1]
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-lsb.html
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] ERROR: lsb:mysql: no such resource agent

2010-06-05 Thread Raoul Bhatia [IPAX]

On 06/05/2010 09:22 AM, ch huang wrote:
> hi all, i installed pacemaker and now have a problem with the configuration
> 
> i defined a new cib "fs", and when i define a new mysql resource, i get an
> error. i do not know why, is something missing?
> crm(fs)# configure primitive mysqld lsb:mysql
> lrmadmin[15787]: 2010/06/03_13:43:09 ERROR:
> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
> message of rmetadata with function get_ret_from_msg.

what does ls /etc/init.d/mysql* say?

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15



Re: [Pacemaker] pengine self-maintenance

2010-05-17 Thread Raoul Bhatia [IPAX]

On 05/17/2010 08:52 AM, Andrew Beekhof wrote:
> On Sun, May 16, 2010 at 1:09 AM, Vadym Chepkov  wrote:
>> Hi
>>
>> I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files in
>> /var/lib/pengine,
>> especially when cluster-recheck-interval is set to enable failure-timeout
>> checks.
> 
> pengine metadata | grep series-max
> 
>> /var/lib/heartbeat/crm/ seems also growing unattended.
> 
> Unless there is a bug somewhere, it should be storing only the last
> 100 configurations.

when changing this value at runtime, how long will pacemaker take to
honor the new value?

as for one of my clusters, i changed pe-error-series-max,
pe-warn-series-max and pe-input-series-max from -1 to 100 and
have already been waiting a couple of minutes for the files to disappear:

> r...@wc01 ~ # grep pe-error-series-max /var/log/syslog
> May 17 10:29:15 wc01 cib: [9954]: info: log_data_element: cib:diff: - 
> 
> May 17 10:29:15 wc01 cib: [9954]: info: log_data_element: cib:diff: + 
> 
> r...@wc01 ~ # find /var/lib/pengine -type f|wc -l
> 128435
> r...@wc01 ~ # date
> Mon May 17 10:32:06 CEST 2010

thanks,
raoul

Re: [Pacemaker] Is it possible for ocf:heartbeat:IPaddr2 to be on different NICs?

2010-04-23 Thread Raoul Bhatia [IPAX]


On 23.04.2010 22:22, daniel qian wrote:

> I found there is a function called check_binary in
> /usr/lib/ocf/resource.d/heartbeat/IPaddr2. The error message "Couldn't find utility
> ip" is probably produced by this function, but I was not able to locate its
> definition. Does anyone know which file this function is in?


hi daniel,

please take a look at
/usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
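
if the definition is not where you expect it, grepping the whole agent directory should turn it up. the helper below is a sketch with a made-up name (find_shell_function); it only matches definitions written with parentheses, i.e. "name ()" or "function name ()".

```shell
#!/bin/sh
# sketch: print file:line:text for every definition of a shell function
# below a directory. grep -r also descends into dot-files.
find_shell_function() {
    name=$1
    dir=${2:-/usr/lib/ocf/resource.d/heartbeat}
    # matches "name ()" and "function name ()", with optional indentation
    grep -rnE "^[[:space:]]*(function[[:space:]]+)?${name}[[:space:]]*\(\)" "$dir"
}
```

for example, "find_shell_function check_binary" lists every file below the default directory that defines the function, together with the line number.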

cheers,
raoul


Re: [Pacemaker] Please use the 'true' mailing list address

2010-02-10 Thread Raoul Bhatia [IPAX]


On 10.02.2010 13:51, Andrew Beekhof wrote:

> Hi,
>
> It seems some people are still using the old address for the mailing
> list (ie. pacema...@clusterlabs.org).
> It's not a big deal, but if everyone updated their address books it
> would cut down on duplicate posts (when people do reply-all).


what about returning some informative text if one uses the old address
instead of forwarding the email to the list?

cheers,


[Pacemaker] strange openais-legacy/pacemaker issue

2010-02-09 Thread Raoul Bhatia [IPAX]

hi,

today i stumbled across a strange openais-legacy/pacemaker issue.
i tried to upgrade my kernel and found myself stuck with aisexec
running in state "D" (uninterruptible io).

testing several other kernels on the same machine gave the following
result:

2.6.26-2-amd64, debian lenny kernel, *working* [1]
2.6.30-bpo.1-amd64, debian lenny-backports kernel, *working* [2]
2.6.27.45, self-compiled, *not* working [3][4]

i then diff'ed the configuration and tried to update my 2.6.27.45
configuration to match the debian config more closely. after a
successful compile and reboot, everything is working. [5][6]

i'm left a little puzzled, with no clue as to what the issue might have
been. maybe you can spot the error more quickly?

cheers,
raoul

[1] http://ip52.ipax.at/~raoul/cluster/ok-2.6.26-2-amd64.log
[2] http://ip52.ipax.at/~raoul/cluster/ok-2.6.30-bpo.1-amd64.log

[3] http://ip52.ipax.at/~raoul/cluster/nok.2.6.27.45.log
[4] http://ip52.ipax.at/~raoul/cluster/nok.config-2.6.27.45

[5] http://ip52.ipax.at/~raoul/cluster/ok.config-2.6.27.45
[6] http://ip52.ipax.at/~raoul/cluster/config.diff

Re: [Pacemaker] Debian packages of the clusterstack updated: Bugreport, No 2

2010-01-27 Thread Raoul Bhatia [IPAX]

On 01/27/2010 05:41 PM, Michael Schwartzkopff wrote:
> Where do I get your old packages?

# apt-cache policy pacemaker
pacemaker:
  Installed: (none)
  Candidate: 1.0.7+hg20100127-0test1~bpo50+1
  Version table:
     1.0.7+hg20100127-0test1~bpo50+1 0
         50 http://people.debian.org lenny/main Packages
     1.0.6+hg20091102-4~bpo50+1 0
         50 http://people.debian.org lenny/main Packages


so the old packages should still be there, right?
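
to pull just the version strings out of that output, something like the following could work. it is a sketch: policy_versions is a made-up name, and it assumes the "Version table:" layout shown above (a version followed by a numeric priority, with "***" marking an installed version).

```shell
#!/bin/sh
# sketch: read `apt-cache policy <package>` output on stdin and print
# one version per line from the "Version table:" section.
policy_versions() {
    awk '
        /Version table:/ { in_table = 1; next }
        # installed version: "*** <version> <priority>"
        in_table && $1 == "***" { print $2; next }
        # other versions: "<version> <priority>"
        in_table && NF == 2 && $2 ~ /^[0-9]+$/ { print $1 }
    '
}
```

usage would be "apt-cache policy pacemaker | policy_versions". a specific version from that list can then be requested with the usual apt syntax, e.g. "apt-get install pacemaker=1.0.6+hg20091102-4~bpo50+1".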

cheers,
raoul

Re: [Pacemaker] SLES 11 ocf:heartbeat:drbd versus ocf:linbit:drbd

2010-01-27 Thread Raoul Bhatia [IPAX]

On 01/27/2010 11:26 AM, Oliver Ladner wrote:
> Hello,
> 
> Can someone please tell me the difference between the resource agents 
> ocf:heartbeat:drbd and ocf:linbit:drbd?
> 
> I'm using the legacy heartbeat:drbd agent at the moment, as drbd on SLES 11 
> is only at 8.2.7. Problem is that the two-node cluster only moves the drbd 
> resource when the active node is manually set to standby, but not when an 
> error occurs (shutdown etc.). So I wondered if that problem is related to
> using ocf:heartbeat:drbd.

ocf:heartbeat:drbd is legacy and only there because people still use it.

ocf:linbit:drbd is created and maintained by linbit, the company behind
drbd. this one is recommended for recent drbd releases (and recent
drbd packages should also ship with it).

i cannot comment on whether this one works with 8.2.7 though.
i'm using it with 8.3.x only.

cheers,
