Re: [Linux-ha-dev] patch: RA conntrackd: Request state info on startup

2012-07-24 Thread Dominik Klein
 currently doing another conntrackd project and therefore using the
 code once again (jippie :)). Found a minor issue:

 When the active host is fenced and returns to the cluster, it does not
 request the current connection tracking states. Therefore state
 information might be lost. This patch fixes that. Any comments?

 I'm not sure what do you mean by active host. A node which is
 running conntrackd or a node which is running conntrackd master
 instance?

Erm, yeah. Sorry for not being precise. I mean the node running the
master instance.

 Successfully tested with Debian Squeeze's version 0.9.14.

 Looks OK to me. I'll push it to the repository.

Thanks.

Dominik
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] patch: RA conntrackd: Request state info on startup

2012-07-18 Thread Dominik Klein
Hi people

currently doing another conntrackd project and therefore using the
code once again (jippie :)). Found a minor issue:

When the active host is fenced and returns to the cluster, it does not
request the current connection tracking states. Therefore state
information might be lost. This patch fixes that. Any comments?

Successfully tested with Debian Squeeze's version 0.9.14.

Regards
Dominik


conntrackd.patch
Description: Binary data


Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-28 Thread Dominik Klein
 There did not have to be a negative location constraint up to now,
 because the cluster took care of that.
 
 Only because it didn't work correctly.

Okay.

 Actually, this is a wanted setup. It happened that VM configs were
 changed in ways that led to a VM not being startable any more. For that
 case, they wanted to be able to start the old config on the other node.

Please, notice _they_ vs. _me_ here :)

 Wow! So, they can have different configurations at different
 nodes.

Agreed, wow!

 The only issue you may have with this cluster is if the
 administrator erronously removes a config on some node, right?
 And that then some time afterwards the cluster does a probe on
 that node. And then again the cluster wants to fail over this VM
 to that node. And that at this point in time no other node can
 run this VM and that it is going to repeatedly try to start and
 fail. And that failed start is fatal isn't configured. No doubt
 that this could happen, but what's the probability? And, finally,
 that doesn't look like a well maintained cluster.

I guess this is something _they_ have to live with then.

At first glance, I honestly thought this was a change in the agent that
introduced a regression which would hit more than just this
configuration, but you made me realize that it does not, and that it
actually improves the agent for sane setups.

My vote goes for your patch, i.e. stop && no config = return SUCCESS

Thanks
Dominik


Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-27 Thread Dominik Klein
On 06/27/2011 11:09 AM, Dejan Muhamedagic wrote:
 Hi Dominik,
 
 On Fri, Jun 24, 2011 at 03:50:40PM +0200, Dominik Klein wrote:
 Hi Dejan,

 this way, the cluster never learns that it can't start a resource on 
 that node.
 
 This resource depends on shared storage. So, the cluster won't
 try to start it unless the shared storage resource is already
 running. This is something that needs to be specified using
 either a negative preference location constraint or asymmetrical
 cluster. There's no need for yet another mechanism (the extra
 parameter) built into the resource agent. It's really an
 overkill.

As requested on IRC, I describe my setup and explain why I think this is
a regression.

2 node cluster with a bunch of drbd devices.

Each /dev/drbdXX is used as a block device of a VM. The VMs'
configuration files are not on shared storage but have to be copied
manually.

So it happened that during configuration of a VM, the admin forgot to
copy the configuration file to node2. The machine's DRBD was configured
though. So the cluster decided to promote the VM's DRBD on node2 and then
start the master-colocated and ordered VM.

With the agent before the mentioned patch, during probe of a newly
configured resource, the cluster would have learned that the VM is not
available on one of the nodes (ERR_INSTALLED), so it would never start
the resource there.

Now it sees NOT_RUNNING on all nodes during probe and may decide to
start the VM on a node where it cannot run. That, with the current
version of the agent, leads to a failed start, a failed stop during
recovery and therefore: an unnecessary stonith operation.

With Dejan's patch, it would still see NOT_RUNNING during probe, but at
least the stop would succeed. So the difference to the old version would
be that we had an unnecessary failed start on the node that does not
have the VM but it would not harm the node and I'd be fine with applying
that patch.

There's a case though that might stop the VM from running (for an amount
of time), and that is if start-failure-is-fatal is false. Then we would
have $migration-threshold failed-start/successful-stop iterations while
the VM's service would not be running.
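For context, the two cluster settings this paragraph refers to look like this in the crm shell (the option names are standard Pacemaker; the values here are hypothetical examples, not recommendations):

```
# start-failure-is-fatal=false lets the cluster retry a failed start on
# the same node; migration-threshold caps how many failures it tolerates
# before moving the resource away.
crm configure property start-failure-is-fatal=false
crm configure rsc_defaults migration-threshold=3
```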

Of course I do realize that the initial fault is a human one, but the
cluster used to protect against this and no longer does, and that's why
I think this is a regression.

I think the correct way to fix this is to still return ERR_INSTALLED
during probe unless the cluster admin configures that the VM's config is
on shared storage. Finding out about resource states on different nodes
is what the probe was designed to do, was it not? And we work around
that in this resource agent just to support certain setups.
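To make the proposed behaviour concrete, here is a minimal self-contained sketch of the probe decision being argued for. This is not the actual VirtualDomain code; `probe_config` and its second argument are illustrative stand-ins, only the OCF return codes are real:

```shell
#!/bin/sh
# Sketch of the proposed probe logic: a missing config only maps to
# "not running" when the admin declared the config to be on shared storage.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_ERR_INSTALLED=5

probe_config() {
    # $1 = path to the VM config, $2 = 1 if the config is on shared storage
    if [ ! -r "$1" ]; then
        if [ "$2" = "1" ]; then
            # shared storage may simply not be mounted yet
            return $OCF_NOT_RUNNING
        else
            # config is local; its absence means the VM cannot run here
            return $OCF_ERR_INSTALLED
        fi
    fi
    return $OCF_SUCCESS
}

rc=0
probe_config /nonexistent/vm.xml 1 || rc=$?
echo "shared: $rc"    # prints: shared: 7
rc=0
probe_config /nonexistent/vm.xml 0 || rc=$?
echo "local: $rc"     # prints: local: 5
```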

Regards
Dominik


Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-27 Thread Dominik Klein
 With the agent before the mentioned patch, during probe of a newly
 configured resource, the cluster would have learned that the VM is not
 available on one of the nodes (ERR_INSTALLED), so it would never start
 the resource there.
 
 This is exactly the problem with shared storage setups, where
 such an exit code can prevent resource from ever being started on
 a node which is otherwise perfectly capable of running that
 resource.

I see and understand that that, too, is a valid setup and concern.

 But really, if a resource can _never_ run on a node, then there
 should be a negative location constraint or the cluster should be
 setup as asymmetrical. 

There did not have to be a negative location constraint up to now,
because the cluster took care of that.

 Now, I understand that in your case, it is
 actually due to the administrator's fault.

Yes, that's how I noticed the problem with the agent.

 This particular setup is a special case of shared storage. The
 images are on shared storage, but the configurations are local. I
 think that you really need to make sure that the configurations
 are present where they need to be. Best would be that the
 configuration is kept on the storage along with the corresponding
 VM image. Since you're using a raw device as image, that's
 obviously not possible. Otherwise, use csync2 or similar to keep
 files in sync.

Actually, this is a wanted setup. It happened that VM configs were
changed in ways that led to a VM not being startable any more. For that
case, they wanted to be able to start the old config on the other node.

I agree that the cases that lead me to finding this change in the agent
are cases that could have been solved with better configuration and that
your suggestions make sense. Still, I feel that the change introduces a
new way of doing things that might affect running and working setups in
unintended ways. I refuse to believe that I am the only one doing HA VMs
like this (although of course I might be wrong on that, too ...).

Regards
Dominik


Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-26 Thread Dominik Klein
I'm not sure my fix is correct.

According to

https://github.com/ClusterLabs/resource-agents/commit/96ff8e9ad3d4beca7e063beef156f3b838a798e1#heartbeat/VirtualDomain

this is a regression which was introduced in April '11.

So the fix should be the other way around: introduce a parameter that
lets the user configure that the config file _is_ on shared storage; if
this is false or unset, revert to the old behaviour of returning
ERR_INSTALLED.

Regards
Dominik


[Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-24 Thread Dominik Klein
This fixes the issue described yesterday.

Comments?

Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein dominik.kl...@gmail.com
# Date 1308909599 -7200
# Node ID 2b1615aaca2c90f2f4ab93eb443e5902906fb28a
# Parent  7a11934b142d1daf42a04fbaa0391a3ac47cee4c
RA VirtualDomain: Fix probe if config is not on shared storage

diff -r 7a11934b142d -r 2b1615aaca2c heartbeat/VirtualDomain
--- a/heartbeat/VirtualDomain	Fri Feb 25 12:23:17 2011 +0100
+++ b/heartbeat/VirtualDomain	Fri Jun 24 11:59:59 2011 +0200
@@ -19,9 +19,11 @@
 # Defaults
 OCF_RESKEY_force_stop_default=0
 OCF_RESKEY_hypervisor_default=$(virsh --quiet uri)
+OCF_RESKEY_config_on_shared_storage_default=1
 
 : ${OCF_RESKEY_force_stop=${OCF_RESKEY_force_stop_default}}
 : ${OCF_RESKEY_hypervisor=${OCF_RESKEY_hypervisor_default}}
+: ${OCF_RESKEY_config_on_shared_storage=${OCF_RESKEY_config_on_shared_storage_default}}
 ###
 
 ## I'd very much suggest to make this RA use bash,
@@ -421,8 +423,8 @@
 # check if we can read the config file (otherwise we're unable to
 # deduce $DOMAIN_NAME from it, see below)
 if [ ! -r $OCF_RESKEY_config ]; then
-	if ocf_is_probe; then
-	ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe."
+	if ocf_is_probe && ocf_is_true $OCF_RESKEY_config_on_shared_storage; then
+	ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe. Assuming it is on shared storage and therefore reporting VM is not running."
 	else
 	ocf_log error "Configuration file $OCF_RESKEY_config does not exist or is not readable."
 	return $OCF_ERR_INSTALLED
exporting patch:
# HG changeset patch
# User Dominik Klein dominik.kl...@gmail.com
# Date 1308911272 -7200
# Node ID 312adf2449eb59dcc41686626b1726428d13227b
# Parent  2b1615aaca2c90f2f4ab93eb443e5902906fb28a
RA VirtualDomain: Add metadata for the new parameter

diff -r 2b1615aaca2c -r 312adf2449eb heartbeat/VirtualDomain
--- a/heartbeat/VirtualDomain   Fri Jun 24 11:59:59 2011 +0200
+++ b/heartbeat/VirtualDomain   Fri Jun 24 12:27:52 2011 +0200
@@ -119,6 +119,16 @@
 <content type="string" default="" />
 </parameter>
 
+<parameter name="config_on_shared_storage" unique="0" required="0">
+<longdesc lang="en">
+If your VMs configuration file is _not_ on shared storage, so that the config
+file not being in place during a probe means that the VM is not installed/runnable
+on that node, set this to 0.
+</longdesc>
+<shortdesc lang="en">Set to 0 if your VMs config file is not on shared storage</shortdesc>
+<content type="boolean" default="1" />
+</parameter>
+
 </parameters>
 
 <actions>


Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-24 Thread Dominik Klein
Hi Dejan,

this way, the cluster never learns that it can't start a resource on 
that node.

I don't consider this a solution.

Regards
Dominik


[Linux-ha-dev] VirtualDomain issue

2011-06-22 Thread Dominik Klein
Hi

code snippet from
http://hg.linux-ha.org/agents/raw-file/7a11934b142d/heartbeat/VirtualDomain
(which I believe is the current version)

VirtualDomain_Validate_All() {
snip
 if [ ! -r $OCF_RESKEY_config ]; then
	if ocf_is_probe; then
	    ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe."
	else
	    ocf_log error "Configuration file $OCF_RESKEY_config does not exist or is not readable."
	    return $OCF_ERR_INSTALLED
	fi
 fi
}
snip
VirtualDomain_Validate_All || exit $?
snip
if ocf_is_probe && [ ! -r $OCF_RESKEY_config ]; then
 exit $OCF_NOT_RUNNING
fi

So, say one node does not have the config, but the cluster decides to
run the VM on that node. The probe returns NOT_RUNNING, so the cluster
tries to start the VM. That start returns ERR_INSTALLED, so the cluster
has to recover from the start failure and stops the resource; but that
stop op returns ERR_INSTALLED as well, so the node needs to be stonith'd.

I think this is wrong behaviour. I read the comments about
configurations being on shared storage which might not be available at
certain points in time and I see the point. But the way this is
implemented clearly does not work for everybody. I vote for making this
configurable. Unfortunately, due to several reasons, I am not able to
contribute this patch myself at the moment.

Regards
Dominik


Re: [Linux-ha-dev] New OCF RA: symlink

2011-04-21 Thread Dominik Klein
 Am I too paranoid?

I don't think you are. A non-root user practically being able to remove
any file is certainly a valid concern.

Thing is: I needed an RA that configured a cronjob. Florian suggested
writing the symlink RA instead, which could manage symlinks. Apparently
there was an IRC discussion a couple of weeks ago that I was not a part of.

So while the symlink RA could also do what I needed, I tried to write
it instead of the cronjob RA (which will also come, since it will cover
some more functions than this one, but that's another story).

So anyway, maybe those involved in the first discussion can comment on
this, too, and share thoughts on how to solve things. Maybe they have
already addressed these situations.

Regards
Dominik


Re: [Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-17 Thread Dominik Klein
Mornin Dejan,

 The reason was that libglue2 and cluster-glue were not installed from
 the clusterlabs repository, as the rest of the packages were, but
 instead they were pulled from the original opensuse repository in an
 older version.

 This is what I found in pacemaker.spec.in in the repository:

 Requires(pre):  cluster-glue >= 1.0.6

 Which version of glue was that older version?

 0.9.1
 
 Whoa. Can't recall ever seeing that thing.

rpm -qRp cluster-glue-0.9-2.1.x86_64.rpm
/usr/sbin/groupadd
/usr/bin/getent
/usr/sbin/useradd
/bin/sh
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash
/bin/sh
/usr/bin/perl
/usr/bin/python
libc.so.6()(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libcurl.so.4()(64bit)
libglib-2.0.so.0()(64bit)
liblrm.so.2()(64bit)
libnetsnmp.so.15()(64bit)
libpils.so.2()(64bit)
libplumb.so.2()(64bit)
libplumbgpl.so.2()(64bit)
libstonith.so.1()(64bit)
libxml2.so.2()(64bit)
rpmlib(PayloadIsLzma) <= 4.4.6-1

That's the old package, from opensuse.

Here's the new one (106 from clusterlabs' opensuse 11.2 repository):

/usr/sbin/groupadd
/usr/bin/getent
/usr/sbin/useradd
/bin/sh
/bin/sh
/bin/sh
/bin/sh
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash
/bin/sh
/usr/bin/env
/usr/bin/perl
/usr/bin/python
libOpenIPMI.so.0()(64bit)
libOpenIPMIposix.so.0()(64bit)
libOpenIPMIutils.so.0()(64bit)
libbz2.so.1()(64bit)
libc.so.6()(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libcrypto.so.0.9.8()(64bit)
libcurl.so.4()(64bit)
libdl.so.2()(64bit)
libglib-2.0.so.0()(64bit)
liblrm.so.2()(64bit)
libltdl.so.7()(64bit)
libm.so.6()(64bit)
libnetsnmp.so.15()(64bit)
libopenhpi.so.2()(64bit)
libpils.so.2()(64bit)
libplumb.so.2()(64bit)
libplumbgpl.so.2()(64bit)
librt.so.1()(64bit)
libstonith.so.1()(64bit)
libuuid.so.1()(64bit)
libxml2.so.2()(64bit)
libz.so.1()(64bit)
rpmlib(PayloadIsLzma) <= 4.4.6-1

I don't see libglue there.

Regards
Dominik


Re: [Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-17 Thread Dominik Klein
 This is what I found in pacemaker.spec.in in the repository:
 
 Requires(pre):  cluster-glue >= 1.0.6

The 1.0.10 rpm from clusterlabs for opensuse 11.2 just says
cluster-glue afaict:

rpm -qR pacemaker
cluster-glue
resource-agents
python >= 2.4
libpacemaker3 = 1.0.10-1.4
libesmtp
net-snmp
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash
/bin/sh
/usr/bin/env
/usr/bin/python
libbz2.so.1()(64bit)
libc.so.6()(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.3)(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libccmclient.so.1()(64bit)
libcib.so.1()(64bit)
libcoroipcc.so.4()(64bit)
libcrmcluster.so.1()(64bit)
libcrmcommon.so.2()(64bit)
libcrypt.so.1()(64bit)
libcrypto.so.0.9.8()(64bit)
libdl.so.2()(64bit)
libesmtp.so.5()(64bit)
libgcrypt.so.11()(64bit)
libglib-2.0.so.0()(64bit)
libgnutls.so.26()(64bit)
libgnutls.so.26(GNUTLS_1_4)(64bit)
libgpg-error.so.0()(64bit)
libhbclient.so.1()(64bit)
liblrm.so.2()(64bit)
libltdl.so.7()(64bit)
libm.so.6()(64bit)
libncurses.so.5()(64bit)
libnetsnmp.so.15()(64bit)
libnetsnmpagent.so.15()(64bit)
libnetsnmphelpers.so.15()(64bit)
libnetsnmpmibs.so.15()(64bit)
libpam.so.0()(64bit)
libpam.so.0(LIBPAM_1.0)(64bit)
libpe_rules.so.2()(64bit)
libpe_status.so.2()(64bit)
libpengine.so.3()(64bit)
libperl.so()(64bit)
libpils.so.2()(64bit)
libplumb.so.2()(64bit)
libpopt.so.0()(64bit)
libpthread.so.0()(64bit)
libpthread.so.0(GLIBC_2.2.5)(64bit)
librpm.so.0()(64bit)
librpmio.so.0()(64bit)
librt.so.1()(64bit)
libsensors.so.3()(64bit)
libstonith.so.1()(64bit)
libstonithd.so.0()(64bit)
libtransitioner.so.1()(64bit)
libwrap.so.0()(64bit)
libxml2.so.2()(64bit)
libxslt.so.1()(64bit)
libz.so.1()(64bit)
rpmlib(PayloadIsLzma) <= 4.4.6-1

Regards
Dominik


[Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-16 Thread Dominik Klein
Hi

as some of you might have seen on the pacemaker list, I tried to install
a 3-node cluster and there were IPC issues reported by the cib, and
therefore the cluster could not start correctly.

The reason was that libglue2 and cluster-glue were not installed from
the clusterlabs repository, as the rest of the packages were, but
instead they were pulled from the original opensuse repository in an
older version.

So I went and updated cluster-glue with the version from the clusterlabs
repository. Nothing changed though.

rpm -qa|grep glue
revealed that libglue2 was still the old version while cluster-glue was
updated.

Looking at the package dependencies, I think the problem is that
cluster-glue does not depend on the libglue2 package (while the
dependency does exist the other way around).
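The missing dependency would presumably be fixed with a versioned Requires in the cluster-glue spec file. A sketch of what that could look like; the exact package and macro names in the real cluster-glue.spec may differ:

```
# hypothetical cluster-glue.spec fragment: make the cluster-glue package
# pull in the matching libglue2 build so both always update together
Requires: libglue2 = %{version}-%{release}
```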

So one error, which I could improve, was that the installation
instructions on the clusterlabs site did not mention libglue2 and
cluster-glue. They do now, which should prevent this problem for others
who follow those instructions.

The dependency thing is up for grabs ;)

Regards
Dominik


Re: [Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-16 Thread Dominik Klein
Hi Dejan

 The reason was that libglue2 and cluster-glue were not installed from
 the clusterlabs repository, as the rest of the packages were, but
 instead they were pulled from the original opensuse repository in an
 older version.
 
 This is what I found in pacemaker.spec.in in the repository:
 
 Requires(pre):  cluster-glue >= 1.0.6
 
 Which version of glue was that older version?

0.9.1

So you're saying pacemaker depends on cluster-glue >= 1.0.6. Well, that
was not installed when I installed pacemaker. And I did not use --nodeps
or any such thing.

Instead, that old version was installed from the original opensuse
repositories.

 So I went and updated cluster-glue with the version from the clusterlabs
 repository. Nothing changed though.

 rpm -qa|grep glue
 revealed that libglue2 was still the old version while cluster-glue was
 updated.

 Looking at the package dependencies, I think the problem is that
 cluster-glue does not depend on package libglue2 (while they do the
 other way around).
 
 Yes, I guess that that should be fixed.

Regards
Dominik


Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-14 Thread Dominik Klein
Thanks for inclusion.

While looking through the pushed changes, I spotted two meta-data typos.
See trivial patch.

Regards
Dominik

 Applied and pushed with two minor edits. Thanks a lot!
 
 Cheers,
 Florian
--- conntrackd.orig	2011-02-14 11:43:22.000000000 +0100
+++ conntrackd	2011-02-14 11:43:42.000000000 +0100
@@ -57,7 +57,7 @@
 <longdesc lang="en">Name of the conntrackd executable.
 If conntrackd is installed and available in the default PATH, it is sufficient to configure the name of the binary
 For example my-conntrackd-binary-version-0.9.14
-If conntrackd is installed somehwere else, you may also give a full path
+If conntrackd is installed somewhere else, you may also give a full path
 For example /packages/conntrackd-0.9.14/sbin/conntrackd
 </longdesc>
 <shortdesc lang="en">Name of the conntrackd executable</shortdesc>
@@ -66,7 +66,7 @@
 
 <parameter name="config">
 <longdesc lang="en">Full path to the conntrackd.conf file.
-For example /packages/conntrackd-0.9.4/etc/conntrackd/conntrackd.conf</longdesc>
+For example /packages/conntrackd-0.9.14/etc/conntrackd/conntrackd.conf</longdesc>
 <shortdesc lang="en">Path to conntrackd.conf</shortdesc>
 <content type="string" default="$OCF_RESKEY_config_default"/>
 </parameter>


Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-11 Thread Dominik Klein
Hi Florian

 it appears that the RA is good to be merged with just a few changes left
 to be done.

Great!

 * Please fix the initialization to honor $OCF_FUNCTIONS_DIR and ditch
 the redundant locale initialization.

done

 * Please rename the parameters to follow the precedents set by other
 RAs ("binary" instead of "conntrackd", "config" instead of
 "conntrackdconf").

done

 * Please don't require people to set a full path to the conntrackd
 binary, honoring $PATH is expected.

I don't see where I do that. At least code-wise I never did that. Did
you mean the meta-data?

 * Please set defaults the way the other RAs do, rather than with your
 if [ -z "$OCF_RESKEY_whatever" ] logic.

done

 * Please define the default path to your statefile relative to
 ${HA_RSCTMP}. Also, put ${OCF_RESOURCE_INSTANCE} in the filename.

done

 * Actually, rather than managing your statefile manually, you might be
 able to just use ha_pseudo_resource().

done
nice function btw :)
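For readers unfamiliar with it: ha_pseudo_resource (from heartbeat's .ocf-shellfuncs) tracks a pseudo resource's state via a tracking file under $HA_RSCTMP. The following is a rough self-contained stand-in, not the real implementation, just to illustrate the behaviour the RA relies on:

```shell
#!/bin/sh
# Stand-in mimic of glue's ha_pseudo_resource: start drops a tracking
# file, stop removes it, monitor reports whether it is present.
HA_RSCTMP=${HA_RSCTMP:-/tmp}

ha_pseudo_resource() {
    # $1 = pseudo resource name, $2 = start|stop|monitor
    track="$HA_RSCTMP/ha_pseudo.$1"
    case "$2" in
        start)   touch "$track" ;;
        stop)    rm -f "$track" ;;
        monitor) [ -e "$track" ] ;;
    esac
}

ha_pseudo_resource demo start
ha_pseudo_resource demo monitor && echo "master"       # prints: master
ha_pseudo_resource demo stop
ha_pseudo_resource demo monitor || echo "not master"   # prints: not master
```

This is why the conntrackd RA can use it as a persistent "am I master?" flag across promote/demote without managing a statefile by hand.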

 * Please revise your timeouts. Is a 240-second minimum timeout on start
 not a bit excessive?

Sure is. Copy and paste leftover. Changed to 30.

 * Please revise your metadata, specifically your longdescs. The more
 useful information you provide to users, the better. Recall that that
 information is readily available to users via the man pages and crm ra
 info.

done

Regards
Dominik
--- conntrackd	2011-02-10 12:23:37.054678924 +0100
+++ conntrackd.fghaas	2011-02-11 09:45:39.721300359 +0100
@@ -4,7 +4,7 @@
 #   An OCF RA for conntrackd
 #	http://conntrack-tools.netfilter.org/
 #
-# Copyright (c) 2010 Dominik Klein
+# Copyright (c) 2011 Dominik Klein
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of version 2 of the GNU General Public License as
@@ -25,11 +25,19 @@
 # along with this program; if not, write the Free Software Foundation,
 # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
 #
+
 ###
 # Initialization:
 
-. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
-export LANG=C LANGUAGE=C LC_ALL=C
+: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
+. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
+
+###
+
+OCF_RESKEY_binary_default=/usr/sbin/conntrackd
+OCF_RESKEY_config_default=/etc/conntrackd/conntrackd.conf
+: ${OCF_RESKEY_binary=${OCF_RESKEY_binary_default}}
+: ${OCF_RESKEY_config=${OCF_RESKEY_config_default}}
 
 meta_data() {
 	cat <<END
@@ -46,30 +54,30 @@
 
 <parameters>
 <parameter name="conntrackd">
-<longdesc lang="en">Full path to conntrackd executable</longdesc>
-<shortdesc lang="en">Full path to conntrackd executable</shortdesc>
-<content type="string" default="/usr/sbin/conntrackd"/>
+<longdesc lang="en">Name of the conntrackd executable.
+If conntrackd is installed and available in the default PATH, it is sufficient to configure the name of the binary
+For example my-conntrackd-binary-version-0.9.14
+If conntrackd is installed somehwere else, you may also give a full path
+For example /packages/conntrackd-0.9.14/sbin/conntrackd
+</longdesc>
+<shortdesc lang="en">Name of the conntrackd executable</shortdesc>
+<content type="string" default="$OCF_RESKEY_binary_default"/>
 </parameter>
 
-<parameter name="conntrackdconf">
-<longdesc lang="en">Full path to the conntrackd.conf file.</longdesc>
+<parameter name="config">
+<longdesc lang="en">Full path to the conntrackd.conf file.
+For example /packages/conntrackd-0.9.4/etc/conntrackd/conntrackd.conf</longdesc>
 <shortdesc lang="en">Path to conntrackd.conf</shortdesc>
-<content type="string" default="/etc/conntrackd/conntrackd.conf"/>
-</parameter>
-
-<parameter name="statefile">
-<longdesc lang="en">Full path to the state file you wish to use.</longdesc>
-<shortdesc lang="en">Full path to the state file you wish to use.</shortdesc>
-<content type="string" default="/var/run/conntrackd.master"/>
+<content type="string" default="$OCF_RESKEY_config_default"/>
 </parameter>
 </parameters>
 
 <actions>
-<action name="start"   timeout="240" />
-<action name="promote"	 timeout="90" />
-<action name="demote"	timeout="90" />
-<action name="notify"	timeout="90" />
-<action name="stop"    timeout="100" />
+<action name="start"   timeout="30" />
+<action name="promote"	 timeout="30" />
+<action name="demote"	timeout="30" />
+<action name="notify"	timeout="30" />
+<action name="stop"    timeout="30" />
 <action name="monitor" depth="0"  timeout="20" interval="20" role="Slave" />
 <action name="monitor" depth="0"  timeout="20" interval="10" role="Master" />
 <action name="meta-data"  timeout="5" />
@@ -94,11 +102,7 @@
 conntrackd_is_master() {
 	# You can't query conntrackd whether it is master or slave. It can be both at the same time. 
 	# This RA creates a statefile during promote and enforces master-max=1 and clone-node-max=1
-	if [ -e $STATEFILE ]; then
-		return $OCF_SUCCESS
-	else
-		return $OCF_ERR_GENERIC
-	fi
+	ha_pseudo_resource $statefile monitor
 }
 
 conntrackd_set_master_score() {
@@ -108,11 +112,11 @@
 conntrackd_monitor() {
 	rc=$OCF_NOT_RUNNING
 	# It does not write a PID file, so check

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-11 Thread Dominik Klein
Maybe you applied the s/100/$slavescore/ patch someone sent a couple of
weeks ago. I used the last version from the thread "New stateful RA:
conntrackd" dated October 27th, 3:29 pm.

Anyway, here's my version.

Regards
Dominik

On 02/11/2011 01:36 PM, Florian Haas wrote:
 On 2011-02-11 09:48, Dominik Klein wrote:
 Hi Florian

 it appears that the RA is good to be merged with just a few changes left
 to be done.

 Great!

 [lots of exemplary role-model patch modifications]

 Regards
 Dominik
 
 Thanks! For some reason the patch does not apply in my checkout. Can you
 just send me your version? I'll figure it out then.
 
 Cheers,
 Florian
#!/bin/bash
#
#
#   An OCF RA for conntrackd
#   http://conntrack-tools.netfilter.org/
#
# Copyright (c) 2011 Dominik Klein
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#

###
# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

###

OCF_RESKEY_binary_default=/usr/sbin/conntrackd
OCF_RESKEY_config_default=/etc/conntrackd/conntrackd.conf
: ${OCF_RESKEY_binary=${OCF_RESKEY_binary_default}}
: ${OCF_RESKEY_config=${OCF_RESKEY_config_default}}

meta_data() {
cat END
?xml version=1.0?
!DOCTYPE resource-agent SYSTEM ra-api-1.dtd
resource-agent name=conntrackd
version1.1/version

longdesc lang=en
Master/Slave OCF Resource Agent for conntrackd
/longdesc

shortdesc lang=enThis resource agent manages conntrackd/shortdesc

parameters
parameter name=conntrackd
longdesc lang=enName of the conntrackd executable.
If conntrackd is installed and available in the default PATH, it is sufficient 
to configure the name of the binary
For example my-conntrackd-binary-version-0.9.14
If conntrackd is installed somehwere else, you may also give a full path
For example /packages/conntrackd-0.9.14/sbin/conntrackd
/longdesc
shortdesc lang=enName of the conntrackd executable/shortdesc
content type=string default=$OCF_RESKEY_binary_default/
/parameter

parameter name=config
longdesc lang=enFull path to the conntrackd.conf file.
For example 
/packages/conntrackd-0.9.4/etc/conntrackd/conntrackd.conf/longdesc
shortdesc lang=enPath to conntrackd.conf/shortdesc
content type=string default=$OCF_RESKEY_config_default/
/parameter
/parameters

actions
action name=start   timeout=30 /
action name=promote   timeout=30 /
action name=demote   timeout=30 /
action name=notify   timeout=30 /
action name=stoptimeout=30 /
action name=monitor depth=0  timeout=20 interval=20 role=Slave /
action name=monitor depth=0  timeout=20 interval=10 role=Master /
action name=meta-data  timeout=5 /
action name=validate-all  timeout=30 /
/actions
/resource-agent
END
}

meta_expect()
{
local what=$1 whatvar=OCF_RESKEY_CRM_meta_${1//-/_} op=$2 expect=$3
local val=${!whatvar}
if [[ -n $val ]]; then
# [, not [[, or it won't work ;)
[ $val $op $expect ]  return
fi
ocf_log err "meta parameter misconfigured, expected $what $op $expect, but found ${val:-unset}."
exit $OCF_ERR_CONFIGURED
}

conntrackd_is_master() {
# You can't query conntrackd whether it is master or slave. It can be both at the same time.
# This RA creates a statefile during promote and enforces master-max=1 and clone-node-max=1
ha_pseudo_resource $statefile monitor
}

conntrackd_set_master_score() {
${HA_SBIN_DIR}/crm_master -Q -l reboot -v $1
}

conntrackd_monitor() {
rc=$OCF_NOT_RUNNING
# It does not write a PID file, so check with pgrep
pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
if [ $rc -eq $OCF_SUCCESS ]; then
# conntrackd is running 
# now see if it accepts queries
if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
rc=$OCF_ERR_GENERIC
ocf_log err "conntrackd is running but not responding to queries"

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-08 Thread Dominik Klein
Not yet. That's why I wrote soon_-ish_ ;)

Any release coming up you want to include this in?

 any news on this?
 
 Cheers,
 Florian
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Report on conntrackd RA

2011-01-31 Thread Dominik Klein
Hi

thanks for testing and feedback.

On 01/27/2011 01:37 PM, Marjan Blatnik wrote:
 Conntrackd RA from Dominik Klein works. We can now successfully 
 migrate/fail from one node to another one.
 
 At the beginning, we had problems with failover. After a reboot/failure, the 
 slave was not synced with the master. After some debugging I found that the 
 conntrackd must not be started at boot time, but only by pacemaker. 

Like any other program managed by the cluster.

Regards
Dominik

 My 
 mistake. After disabling the conntrackd boot script, failover works perfectly.
 
 If conntrackd on slave is started by init script, then master does not 
 issue
   conntrackd notify with
   OCF_RESKEY_CRM_meta_notify_type=post and
   OCF_RESKEY_CRM_meta_notify_operation=start
 and does not send a bulk update to the slave.
 The master does issue conntrackd notify with 
 OCF_RESKEY_CRM_meta_notify_type set to pre, but since conntrackd on the 
 slave is running, there is no post phase, which sends the bulk update to 
 the slave.
 
 OCF_RESKEY_CRM_meta_notify_type could be ignored and the bulk update sent 
 twice, but it's better to control conntrackd only by pacemaker.


Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-01-31 Thread Dominik Klein
Just now found this thread. I will include the suggested changes and
post the new RA soon-ish.

Dominik

On 01/21/2011 08:26 AM, Florian Haas wrote:
 On 01/18/2011 04:21 PM, Florian Haas wrote:
 Our site will shortly be deploying a new HA firewall based on Linux,
 iptables, pacemaker and conntrackd.
 conntrackd[1] is used to maintain connection state of active
 connections
 across the two firewalls allowing us to failover from one firewall to
 the other without dropping any connections.

 In order to achieve this with pacemaker we needed to find a resource
 agent for conntrackd. Looking at the mailing list we found a couple of
 options although we only fully evaluated the RA produced by Dominik
 Klein as it appears to be more feature complete than the alternative.
 For a full description of his RA please see his original thread[2].

 So far throughout testing we have been very pleased with it. We can
 successfully fail between our nodes and the RA correctly handles the
 synchronisation steps required in the background.
 
 Dominik,
 
 it appears that the RA is good to be merged with just a few changes left
 to be done.
 
 * Please fix the initialization to honor $OCF_FUNCTIONS_DIR and ditch
 the redundant locale initialization.
 
 * Please rename the parameters to follow the precedents set by other
 RAs (binary instead of conntrackd, config instead of
 conntrackdconf).
 
 * Please don't require people to set a full path to the conntrackd
 binary, honoring $PATH is expected.
 
 * Please set defaults the way the other RAs do, rather than with your
 if [ -z OCF_RESKEY_whatever ] logic.
 
 * Please define the default path to your statefile in relative to
 ${HA_RSCTMP}. Also, put ${OCF_RESOURCE_INSTANCE} in the filename.
 
 * Actually, rather than managing your statefile manually, you might be
 able to just use ha_pseudo_resource().
 
 * Please revise your timeouts. Is a 240-second minimum timeout on start
 not a bit excessive?
 
 * Please revise your metadata, specifically your longdescs. The more
 useful information you provide to users, the better. Recall that that
 information is readily available to users via the man pages and crm ra
 info.
 
 Thanks!
 Cheers,
 Florian
 
 


-- 
IN-telegence GmbH
Oskar-Jäger-Str. 125
50825 Köln

Registergericht AG Köln - HRB 34038
USt-ID DE210882245
Geschäftsführende Gesellschafter: Christian Plätke und Holger Jansen


Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-01-31 Thread Dominik Klein
 Or, put differently: is us tracking the supposed state really necessary,
 or can we inquire it from the service somehow?
 
 From the submitted RA:
 
  # You can't query conntrackd whether it is master or slave. It can be both at the same time.
  # This RA creates a statefile during promote and enforces master-max=1 and clone-node-max=1
 
 Knowing Dominik I think it's safe to assume he's done his homework on
 this, and hasn't put in this comment without careful consideration.

If I knew a way to query the state, believe me, I would use it. I
totally understand this seems ugly the way it is and I agree 100%.

However, having a master/slave RA is what the cluster needs imho to
fully support conntrackd. Encouraging people to start conntrackd by init
and then have the RA just execute commands for state-shipping seemed and
seems odd to me (that's what the first RA did).

 But
 I'm sure he won't mind if you manage to convince him otherwise.

Sure I won't. Maybe a newer version (if exists) includes this. I'll have
another look.

Regards
Dominik


Re: [Linux-ha-dev] New stateful RA: conntrackd

2010-10-27 Thread Dominik Klein
Hi everybody

So I updated my RA according to Florian's comments on Jonathan
Petersson's conntrackd RA. I also contacted him in order to merge our
RAs, no reply there yet. Once we talked, you will get an update by one
of us.

Regards
Dominik
#!/bin/bash
#
#
#   An OCF RA for conntrackd
#   http://conntrack-tools.netfilter.org/
#
# Copyright (c) 2010 Dominik Klein
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
###
# Initialization:

. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
export LANG=C LANGUAGE=C LC_ALL=C

meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="conntrackd">
<version>1.1</version>

<longdesc lang="en">
Master/Slave OCF Resource Agent for conntrackd
</longdesc>

<shortdesc lang="en">This resource agent manages conntrackd</shortdesc>

<parameters>
<parameter name="conntrackd">
<longdesc lang="en">Full path to conntrackd executable</longdesc>
<shortdesc lang="en">Full path to conntrackd executable</shortdesc>
<content type="string" default="/usr/sbin/conntrackd"/>
</parameter>

<parameter name="conntrackdconf">
<longdesc lang="en">Full path to the conntrackd.conf file.</longdesc>
<shortdesc lang="en">Path to conntrackd.conf</shortdesc>
<content type="string" default="/etc/conntrackd/conntrackd.conf"/>
</parameter>

<parameter name="statefile">
<longdesc lang="en">Full path to the state file you wish to use.</longdesc>
<shortdesc lang="en">Full path to the state file you wish to use.</shortdesc>
<content type="string" default="/var/run/conntrackd.master"/>
</parameter>
</parameters>

<actions>
<action name="start"   timeout="240" />
<action name="promote" timeout="90" />
<action name="demote"  timeout="90" />
<action name="notify"  timeout="90" />
<action name="stop"    timeout="100" />
<action name="monitor" depth="0" timeout="20" interval="20" role="Slave" />
<action name="monitor" depth="0" timeout="20" interval="10" role="Master" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="30" />
</actions>
</resource-agent>
END
}

meta_expect()
{
local what=$1 whatvar=OCF_RESKEY_CRM_meta_${1//-/_} op=$2 expect=$3
local val=${!whatvar}
if [[ -n $val ]]; then
# [, not [[, or it won't work ;)
[ $val $op $expect ] && return
fi
ocf_log err "meta parameter misconfigured, expected $what $op $expect, but found ${val:-unset}."
exit $OCF_ERR_CONFIGURED
}

conntrackd_is_master() {
# You can't query conntrackd whether it is master or slave. It can be both at the same time.
# This RA creates a statefile during promote and enforces master-max=1 and clone-node-max=1
if [ -e $STATEFILE ]; then
return $OCF_SUCCESS
else
return $OCF_ERR_GENERIC
fi
}

conntrackd_set_master_score() {
${HA_SBIN_DIR}/crm_master -Q -l reboot -v $1
}

conntrackd_monitor() {
rc=$OCF_NOT_RUNNING
# It does not write a PID file, so check with pgrep
pgrep -f $CONNTRACKD && rc=$OCF_SUCCESS
if [ $rc = $OCF_SUCCESS ]; then
# conntrackd is running 
# now see if it accepts queries
if ! ($CONNTRACKD -C $CONNTRACKD_CONF -s > /dev/null 2>&1); then
rc=$OCF_ERR_GENERIC
ocf_log err "conntrackd is running but not responding to queries"
fi
if conntrackd_is_master; then
rc=$OCF_RUNNING_MASTER
# Restore master setting on probes
if [ $OCF_RESKEY_CRM_meta_interval -eq 0 ]; then
conntrackd_set_master_score $master_score
fi
else
# Restore master setting on probes
if [ $OCF_RESKEY_CRM_meta_interval -eq 0 ]; then
conntrackd_set_master_score $slave_score
fi
fi
fi
return $rc
}

conntrackd_start() {
rc=$OCF_ERR_GENERIC

# Keep

[Linux-ha-dev] New stateful RA: conntrackd

2010-10-15 Thread Dominik Klein
Hi everybody,

I wrote a master/slave RA to manage conntrackd, the connection tracking
daemon from the netfilter project. Conntrackd is used to replicate
connection state between highly available stateful firewalls.

Conntrackd replicates data using multicast. Basically it sends state
information about connections written to its kernel's connection tracking
table. Replication slaves write these updates to an external cache.

When a firewall is to take over the master role, it commits the external
cache to the kernel and so knows the connections that were previously
running through the old master system and clients can continue working
without having to open a new connection.

While there has been an RA for conntrackd (at least I found something
that looked like one in a pastebin using google), that one was not able
to deal with failback, which is a thing I needed, and was not yet
included in the repository. I hope this one will be included.

The main challenge in this RA was the failback part. Say one system goes
down completely. Then it loses the kernel connection tracking table and
the external cache. Once it comes back, it will receive updates for new
connections that are initiated through the master, but it will neither
be sent the complete tracking table of the current master, nor can it
request this (that's how I understand and tested conntrackd works,
please correct me if I'm wrong :)).

This may be acceptable for short-lived connections and configurations
where there is no preferred master system, but it does become a problem
if you have either of those.

So my approach is to send a so called bulk update in two situations:

a) in the notify pre promote call, if the local machine is not the
machine to be promoted
This part is responsible for sending the update to a preferred master
that had previously failed (failback).
b) in the notify post start call, if the local machine is the master
This part is responsible for sending the update to a previously failed
machine that re-joins the cluster but is not to be promoted right away.
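The two cases above can be condensed into a small dispatch. This is a sketch, not the RA's actual code; `should_send_bulk`, `send_bulk_update` and `MASTER_NODE` are hypothetical names standing in for the RA's notify logic and the node currently holding the master role:

```shell
#!/bin/sh
# Hypothetical stand-in for the actual bulk-update command.
send_bulk_update() { echo "sending bulk update"; }

# Decide whether this node should ship its state table during a notify.
# $1=notify type, $2=operation, $3=this node, $4=node being promoted/started
should_send_bulk() {
    case "$1-$2" in
        pre-promote)
            # a) another node is about to be promoted (failback):
            #    every other node sends its state to it
            [ "$3" != "$4" ] ;;
        post-start)
            # b) a node just (re)started: the current master sends its state
            [ "$3" = "$MASTER_NODE" ] ;;
        *)
            return 1 ;;
    esac
}
```

In the real RA the node names come from the OCF_RESKEY_CRM_meta_notify_* environment variables that pacemaker sets when notify=true.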

For now I limited the RA to deal with only 2 clones and 1 master since
this is the only testbed I have and I am not 100% sure what happens to
the new master in situation a) if there are multiple slaves.

Configuration could look like this, notify=true is important:

primitive conntrackd ocf:intelegence:conntrackd \
op monitor interval=10 timeout=10 \
op monitor interval=11 role=Master timeout=10
primitive ip-extern ocf:heartbeat:IPaddr2 \
params ip=10.2.50.237 cidr_netmask=24 \
op monitor interval=10 timeout=10
primitive ip-intern ocf:heartbeat:IPaddr2 \
params ip=10.2.52.3 cidr_netmask=24 \
op monitor interval=10 timeout=10
ms ms-conntrackd conntrackd \
meta target-role=Started globally-unique=false notify=true
colocation ip-intern-extern inf: ip-extern ip-intern
colocation ips-on-conntrackd-master inf: ip-intern ms-conntrackd:Master
order ips-after-conntrackd inf: ms-conntrackd:promote ip-intern:start

Please review and test the RA, post comments and questions. Maybe it can
be included in the repository.

Regards
Dominik

ps. yes, some parts are from linbit's drbd RA and some parts may also be
from Andrew's Stateful RA. Hope that's okay.

#!/bin/bash
#
#
#   An OCF RA for conntrackd
#   http://conntrack-tools.netfilter.org/
#
# Copyright (c) 2010 Dominik Klein
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
###
# Initialization:

. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
export LANG=C LANGUAGE=C LC_ALL=C

meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="conntrackd">
<version>1.1</version>

<longdesc lang="en">
Master/Slave OCF Resource Agent for conntrackd
</longdesc>

<shortdesc lang

Re: [Linux-ha-dev] ulimit in ocf scripts

2010-01-13 Thread Dominik Klein
Andrew Beekhof wrote:
 On Tue, Jan 12, 2010 at 10:43 AM, Raoul Bhatia [IPAX] r.bha...@ipax.at 
 wrote:
 On 01/12/2010 10:39 AM, Florian Haas wrote:
 Why not simply set that for root at boot? (it rhymes too :)
 because i do not like the idea that each and every process gets
 elevated limits by default.

 i think that there *should* be a generic way to configure ulimits on a
 per-resource basis.
 I'm confident Dejan would be happy to accept a patch in which you add
 such a parameter to each resource agent where it makes sense.
 of course this would be possible. but i *think* it is more helpful to
 add this to e.g. the cib/lrmd/you name it.

 so before i/we implement the ulimit stuff *inside* lots of different
 RAs, i'd like to hear beekhof's or lars' comments.
 
 If you want a configurable per-resource limit - thats a resource parameter.
 Why would we want to implement another mechanism?

Of course this would be a resource parameter.

I think what he meant to say was that he does not want to have the
change inside every RA executing the ulimit command but to have some
cluster component (probably lrmd) do that.

Regards
Dominik


Re: [Linux-ha-dev] [PATCH]Support of stop escalation for mysql-RA.

2009-12-01 Thread Dominik Klein
I'd suggest an approach like Florian's from the VirtualDomain RA. Here's
a quote, guess you get the idea.

shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
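For illustration, with a hypothetical 90-second stop timeout (the cluster passes OCF_RESKEY_CRM_meta_timeout in milliseconds), the formula reserves 5 seconds of headroom for the RA's own escalation step:

```shell
#!/bin/sh
# Hypothetical value: a 90s stop-operation timeout, delivered in milliseconds.
OCF_RESKEY_CRM_meta_timeout=90000

# Convert to seconds and keep 5 seconds for the kill-escalation cleanup.
shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
echo "$shutdown_timeout"   # prints: 85
```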

Regards
Dominik

Dejan Muhamedagic wrote:
 Hi Hideo-san,
 
 On Mon, Nov 30, 2009 at 11:00:05AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi,

 We discovered a problem in an test of mysql.

 It is the problem that mysql cannot stop.
 This problem seems to occur at the time of diskfull and high CPU load.

 We included an escalation stop like pgsql.

 The problem is broken off by this revision, and a stop succeeds.

 Please commit this patch in a development version.
 
 Many thanks for the patch. I'm just not sure about the default
 escalate time. You set it to 30 seconds, perhaps it should be set
 to something longer. Otherwise some cluster configurations where
 the stop operation takes longer may have problems. I have no idea
 which value should we use, but I would tend to make it longer rather
 than shorter.


Re: [Linux-ha-dev] Monitor operation for the Filesystem RA

2009-09-16 Thread Dominik Klein
Dejan Muhamedagic wrote:
 Hi Florian,
 
 On Wed, Sep 16, 2009 at 08:25:30AM +0200, Florian Haas wrote:
 Lars, Dejan,

 as discussed on #linux-ha yesterday, I've pushed a small changeset to
 the Filesystem RA that implements a monitor operation which checks
 whether I/O on the mounted filesystem is in fact possible. Any
 suggestions for improvement would be most welcome.
 
 IMO, the monitor operation is now difficult to understand.  I
 don't mean the code, I didn't take a look at the code yet, but
 the usage. Also, as soon as you set the statusfile_prefix
 parameter, the 0 depth monitor changes behaviour. I don't find
 that good. The basic monitor operation should remain the same and
 just test if the filesystem is mounted as it always used to.

I agree.

 The new parameter should influence only the monitor operations of
 higher (deeper :) depth. So, I'd propose to have two depths, say
 10 and 20, of which the first would be just the read test and the
 second read-write.

Why not 1 and 2?

Then we'd have
0 = old behaviour
1 = read
2 = read/write
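A sketch of how such a depth mapping could look inside a monitor action, assuming OCF_CHECK_LEVEL carries the configured depth. The mount test at level 0 is stubbed as a plain directory check, and the status-file name is made up:

```shell
#!/bin/sh
fs_monitor() {
    mountpoint=$1
    statusfile="$mountpoint/.fs_check"   # hypothetical status-file name
    case "${OCF_CHECK_LEVEL:-0}" in
        0)  # old behaviour: just check that it is mounted (stubbed here)
            [ -d "$mountpoint" ] ;;
        1)  # read test
            dd if="$statusfile" of=/dev/null bs=4096 count=1 2>/dev/null ;;
        2)  # read/write test
            date > "$statusfile" &&
            dd if="$statusfile" of=/dev/null bs=4096 count=1 2>/dev/null ;;
    esac
}
```

Level 1 only works if the status file already exists, which is exactly the read-only problem discussed below.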

 Finally, the statusfile_prefix should be optional for deeper
 monitor operations and default to .${OCF_RESOURCE_INSTANCE}. If
 OCF_RESOURCE_INSTANCE doesn't contain the clone instance, then we
 should append the clone instance number (I suppose that it's
 available somewhere).

As fgh said, when you want to monitor a readonly fs, you'd have to know
the clone instance number for creating the file to read from. Not a good
idea imho. Or you'll have several files around which would be even more
ugly when you think about a larger cluster.

Why do we have to make the name configurable at all? Why not just give
it a generic name and only let the user configure OCF_CHECK_LEVEL for
each monitor? That said, I have not dealt with cluster filesystems yet.
Was the hostname-idea to avoid having multiple monitor instances trying
to write to one file and maybe run into locking/timeout issues?

Regards
Dominik

 I hope that this way the usage would be more straightforward. At
 least it looks so to me.
 
 Do I win the prize for the longest changeset description or what? ;)
 
 We need good documentation. I think it's great to write such
 descriptions :)
 
 Cheers,
 
 Dejan
 
 Cheers,
 Florian


[Linux-ha-dev] Patch: RA mysql

2009-04-24 Thread Dominik Klein
Trivial. See attached patch.

Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein d...@in-telegence.net
# Date 1240578752 -7200
# Node ID 2d97904c385cc9b4779286001611bd748f48589d
# Parent  60cc2d6eee88ff6c2dedf7b539b9ee018efda6da
Low: RA mysql: Correctly remove eventually remaining socket

diff -r 60cc2d6eee88 -r 2d97904c385c resources/OCF/mysql
--- a/resources/OCF/mysql	Fri Apr 24 08:38:48 2009 +0200
+++ b/resources/OCF/mysql	Fri Apr 24 15:12:32 2009 +0200
@@ -419,7 +419,7 @@
 
 ocf_log info MySQL stopped;
 rm -f /var/lock/subsys/mysqld
-rm -f $OCF_RESKEY_datadir/mysql.sock
+rm -f $OCF_RESKEY_socket
 return $OCF_SUCCESS
 }
 


[Linux-ha-dev] Patch: RA anything

2009-02-11 Thread Dominik Klein
Hi

I fixed most of the things Lars mentioned in
http://hg.linux-ha.org/dev/rev/15bcf3491f9c and will explain why I did
not fix some of them. ocf-tester runs fine with the RA.

 # FIXME: This should use pidofproc

pidofproc is not available everywhere and is not able to get down to
command line options, e.g. it could not tell the difference between
"$process $option_a" and "$process $option_b", which I wanted to support
with this agent.

Example:
dktest3:~/src/linuxha/hg/dev # sleep 200 
[1] 5799
dktest3:~/src/linuxha/hg/dev # sleep 300 
[2] 5801
dktest3:~/src/linuxha/hg/dev # pidofproc sleep
5801 5799
dktest3:~/src/linuxha/hg/dev # pidofproc sleep 300

 # FIXME: use start_daemon

start_daemon is not available everywhere either.

 # FIXME: What about daemons which can manage their own pidfiles?

This agent is meant to be used for programs that are not actually
daemons by design. It is meant to be able to run sth stupid in the
cluster. Even like /bin/sleep 1000

 # FIXME: use killproc

This is also a problem with "$process $option_a" and "$process
$option_b". You can't just "killproc $process" then.

 # FIXME: Attributes special meaning to the resource id

I tried to, but couldn't understand what you meant here.

I also talked to Dejan on IRC and we agreed that "anything" is a bad
name for the RA and the changeset description was probably bad, too.
This RA is not for (as the cs stated) "arbitrary daemons", it is more
for daemonizing programs which were not meant to be daemons.

If a proper name comes to anyone's mind - please share.

Hopefully, now it is a bit clearer what I wanted to be able to do with
this RA. I agree the cmd= lines and pid file creation are very very
ugly, but I could not yet find a better way. Not that much of a shell
genius I guess :( Please share if you can improve things.

Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein d...@in-telegence.net
# Date 1234350091 -3600
# Node ID 04533b37813c8be009814f52de7b14ff65bf9862
# Parent  90ff997faa7288248ac57583b0c03df4c8e41bda
RA: anything. Implement most of lmb's suggestions.

diff -r 90ff997faa72 -r 04533b37813c resources/OCF/anything
--- a/resources/OCF/anything	Wed Feb 11 11:31:02 2009 +0100
+++ b/resources/OCF/anything	Wed Feb 11 12:01:31 2009 +0100
@@ -32,6 +32,7 @@
 #   OCF_RESKEY_errlogfile
 #   OCF_RESKEY_user
 #   OCF_RESKEY_monitor_hook
+#   OCF_RESKEY_stop_timeout
 #
 # This RA starts $binfile with $cmdline_options as $user and writes a $pidfile from that. 
 # If you want it to, it logs:
@@ -47,18 +48,20 @@
 # Initialization:
 . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
 
-getpid() { # make sure that the file contains a number
-	# FIXME: pidfiles could contain spaces
-	grep '^[0-9][0-9]*$' $1
+getpid() {
+grep -o '[0-9]*' $1
 }
 
 anything_status() {
-	# FIXME: This should use pidofproc
-	# FIXME: pidfile w/o process means the process died, so should
-	# be ERR_GENERIC
-	if test -f $pidfile && pid=`getpid $pidfile` && kill -0 $pid
+	if test -f $pidfile
 	then
-		return $OCF_RUNNING
+		if pid=`getpid $pidfile` && kill -0 $pid
+		then
+			return $OCF_RUNNING
+		else
+			# pidfile w/o process means the process died
+			return $OCF_ERR_GENERIC
+		fi
 	else
 		return $OCF_NOT_RUNNING
 	fi
@@ -66,8 +69,6 @@
 
 anything_start() {
 	if ! anything_status
-	# FIXME: use start_daemon
-	# FIXME: What about daemons which can manage their own pidfiles?
 	then
 		if [ -n $logfile -a -n $errlogfile ]
 		then
@@ -101,29 +102,48 @@
 }
 
 anything_stop() {
-	# FIXME: use killproc
+if [ -n "$OCF_RESKEY_stop_timeout" ]
+then
+stop_timeout=$OCF_RESKEY_stop_timeout
+elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
+# Allow 2/3 of the action timeout for the orderly shutdown
+# (The origin unit is ms, hence the conversion)
+stop_timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
+else
+stop_timeout=10
+fi
 	if anything_status
 	then
-		pid=`getpid $pidfile`
-		kill $pid
-		i=0
-		# FIXME: escalate to kill -9 before timeout
-		while sleep 1 
-		do
-			if ! anything_status
-			then
-rm -f $pidfile > /dev/null 2>&1
-return $OCF_SUCCESS
-			fi
-			let i++
-		done
+pid=`getpid $pidfile`
+kill $pid
+rm -f $pidfile
+i=0
+while [ $i -lt $stop_timeout ]
+do
+while sleep 1 
+do
+if ! anything_status
+then
+return $OCF_SUCCESS
+fi
+let i++
+done
+done
+ocf_log warn "Stop with SIGTERM failed/timed out, now sending SIGKILL."
+kill -9 $pid
+if ! anything_status

Re: [Linux-ha-dev] patch: drbd OCF RA

2008-12-11 Thread Dominik Klein
Andrew Beekhof wrote:
 On Wed, Dec 10, 2008 at 16:51, Dejan Muhamedagic [EMAIL PROTECTED] wrote:
 diff -r 057a73385865 -r 1a5685e8f1ed resources/OCF/drbd
 --- a/resources/OCF/drbd  Tue Dec 02 20:29:32 2008 +0100
 +++ b/resources/OCF/drbd  Tue Dec 09 16:10:12 2008 +0100
 @@ -383,7 +383,11 @@ drbd_monitor() {
   ocf_log debug $RESOURCE monitor: resource not configured
   return $OCF_NOT_RUNNING
   elif [ $DRBD_STATE_LOCAL = Primary ]; then
 -#drbd_update_prefs
 + if [ -z $OCF_RESKEY_CRM_meta_interval ]; then
 shouldn't this be:

 if [ -z "$OCF_RESKEY_CRM_meta_interval" -o "$OCF_RESKEY_CRM_meta_interval" -eq 0 ]; then

 
 It's unset if the interval is 0.
 That may or may not be ok to rely on.
 
 Personally I favor adding:
 
 : ${OCF_RESKEY_CRM_meta_interval=0}
 
 and testing for 0

If you ever change your mind and not leave it empty but really put the 0
into that var during probes, both would still work :)
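The behaviour of that expansion is easy to check: `${var=0}` assigns only when the variable is unset, so an explicitly set value is left alone (generic variable names, just for illustration):

```shell
#!/bin/sh
# Unset variable: the default kicks in.
unset probe_interval
: ${probe_interval=0}
first=$probe_interval

# Explicitly set variable: the default expansion leaves it untouched.
probe_interval=10
: ${probe_interval=0}
second=$probe_interval

echo "$first $second"   # prints: 0 10
```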

I don't care which you choose.

Regards
Dominik


[Linux-ha-dev] patch: drbd OCF RA

2008-12-09 Thread Dominik Klein
<kleind> detach and reattach with m/s resource is broken again if you
don't have constraints for the master role. upon start, the cluster
reports all resources running fine in the correct mode, but sees -1
promotion score for all drbd instances, which renders any master
colocated resource unrunnable. then stop colocated, demote drbd is
executed, which calls crm_master, now we have a promotion score and so
we have a new promotion and colocated start.
<kleind> on the one hand you say not to use constraints for the master
role because crm_master is to take care of that ...
<beekhof> its only broken if the RA is broken :)
<beekhof> the RA needs to call crm_master during probes
<kleind> probes would be monitor, right?
<beekhof> with interval = 0
<kleind> monitor_0
<beekhof> yep
<kleind> is there some $interval we can see in the RA or whats the
suggested way?
<beekhof>
http://hg.clusterlabs.org/pacemaker/stable-1.0/file/9ec6e48c1207/extra/resources/Stateful
<kleind> thx

This patch calls drbd_update_prefs if the master role is detected during
the probe. Within that function, crm_master is called and sets an
appropriate promotion score. That keeps the master instance running in
its current location.
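The probe-detection idiom the patch relies on can be sketched like this (a sketch, not the drbd RA itself; `is_probe` is a made-up name):

```shell
#!/bin/sh
# During probes (monitor with interval 0) the CRM either leaves
# OCF_RESKEY_CRM_meta_interval unset or passes 0.
is_probe() {
    [ -z "$OCF_RESKEY_CRM_meta_interval" ] ||
    [ "$OCF_RESKEY_CRM_meta_interval" -eq 0 ]
}
```

On a probe that finds the master role, the RA would then call crm_master to restore the promotion score.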

Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1228835412 -3600
# Node ID 1a5685e8f1ed6c230ae3892856892e6a3a57d208
# Parent  057a733858655a68948f650ffb52f5eb9db9
Medium: RA: drbd - make sure crm_master scores are set during probes

diff -r 057a73385865 -r 1a5685e8f1ed resources/OCF/drbd
--- a/resources/OCF/drbd	Tue Dec 02 20:29:32 2008 +0100
+++ b/resources/OCF/drbd	Tue Dec 09 16:10:12 2008 +0100
@@ -383,7 +383,11 @@ drbd_monitor() {
	ocf_log debug "$RESOURCE monitor: resource not configured"
 	return $OCF_NOT_RUNNING
 	elif [ $DRBD_STATE_LOCAL = Primary ]; then
-#	drbd_update_prefs
+		if [ -z "$OCF_RESKEY_CRM_meta_interval" ]; then
+			# Restore the master setting during probes 
+			ocf_log debug "$RESOURCE monitor: restoring master setting during probe"
+			drbd_update_prefs
+		fi
 	return $OCF_RUNNING_MASTER
 	elif [ $DRBD_STATE_LOCAL = Secondary ]; then
 #	drbd_update_prefs


Re: [Linux-ha-dev] Changing failcount threshold for a single resource

2008-12-08 Thread Dominik Klein

Knight, Doug wrote:

All,
I am setting up a Filesystem resource to maintain an NFS mount on a
client system. I've configured a monitor function that checks every 15
minutes, timeout 1 minute. When an error occurs with the NFS mount (the
server is down for any length of time, etc), I'd like heartbeat to just
retry periodically to remount the NFS mount, rather than run monitor a
few times and error out. What is the best way to configure a resource
like this? I had considered increasing the failcount threshold (how to
do that?). Is there another way, maybe some way to tell heartbeat that
if the resource has failed, its OK, just keep trying, etc? Any
suggestions would be great. 


All this is for version >= 1.0

The default value of migration-threshold is 0, which means: don't 
care about the failure count of a resource, just keep re-starting it.


Then, setup a rsc_location constraint for your NFS on the node you want it.
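Putting both together, a configuration along these lines could work (a sketch with made-up resource names, paths and node names, untested):

```
primitive nfs-client ocf:heartbeat:Filesystem \
        params device="nfsserver:/export" directory="/mnt/nfs" fstype="nfs" \
        op monitor interval="15min" timeout="1min" \
        meta migration-threshold="0"
location nfs-client-prefers-node1 nfs-client 100: node1
```

With migration-threshold left at its default of 0, failures never push the resource away, and the location constraint keeps it on the preferred node.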

Regards
Dominik


[Linux-ha-dev] patch: pingd OCF RA

2008-12-01 Thread Dominik Klein

Hi

I'd like to propose a patch for the pingd OCF RA.

Consider the following situation:

Cluster is set to maintenance-mode=true.
pingd is running as a clone.
The cluster (based on heartbeat) is shutdown and restarted.

As the RA checks the content of the pidfile, which was deleted by the 
heartbeat start, the cluster thinks pingd is not running. When the 
cluster is re-enabled, it starts another pingd.


I figured there was some reason to remove $HA_RSCTMP during heartbeat 
start, so the patch changes the default path for the pid file.


Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1228138736 -3600
# Node ID e87ff08b20ca22647944a56fd772d2673c951457
# Parent  4a47778b2ca99a6b5985baf170bc3897fba2dfee
Medium: RA pingd

diff -r 4a47778b2ca9 -r e87ff08b20ca resources/OCF/pingd
--- a/resources/OCF/pingd	Wed Oct 29 13:24:34 2008 +0100
+++ b/resources/OCF/pingd	Mon Dec 01 14:38:56 2008 +0100
@@ -233,7 +233,7 @@ if [ $# -ne 1 ]; then
 exit $OCF_ERR_ARGS
 fi
 
-: ${OCF_RESKEY_pidfile:=$HA_RSCTMP/pingd-${OCF_RESOURCE_INSTANCE}}
+: ${OCF_RESKEY_pidfile:=/var/run/pingd-${OCF_RESOURCE_INSTANCE}.pid}
 : ${OCF_RESKEY_name:=pingd}
 : ${OCF_RESKEY_dampen:=1s}
 


[Linux-ha-dev] hb_report: line 161: syntax error near unexpected token `('

2008-10-28 Thread Dominik Klein

Hi

I tried to use this morning's heartbeat dev tip and ran hb_report. It 
reports


/usr/sbin/hb_report: line 161: syntax error near unexpected token `('
/usr/sbin/hb_report: line 161: `perl -e use POSIX; print strftime('%x %X',localtime($1));'


I can perfectly run that command on the commandline. Dejan?

Test environment is Linux-HA-Dev-1d4d513cbbf2 with 
Pacemaker-1-0-910fff67cf2f on opensuse 10.3 32bit, freshly set up 
yesterday including all package updates available.


Regards
Dominik


[Linux-ha-dev] Patch: apache return code

2008-07-03 Thread Dominik Klein

See user list. Thread [Linux-HA] Apache failover / renaming the binary

Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1215066469 -7200
# Node ID 94c262e9af4978ffe6be49f3bcb079750e3ec1a6
# Parent  412d1b01469463510ad3b61efb58f1dc4b77f257
Medium: RA: Fix apache returncode

diff -r 412d1b014694 -r 94c262e9af49 resources/OCF/apache
--- a/resources/OCF/apache	Thu Jun 26 09:56:58 2008 +0200
+++ b/resources/OCF/apache	Thu Jul 03 08:27:49 2008 +0200
@@ -515,7 +515,7 @@ validate_all_apache() {
   fi
   if [ ! -x $HTTPD ]; then
 	ocf_log err "HTTPD $HTTPD not found or is not an executable!"
-	exit $OCF_ERR_ARGS
+	exit $OCF_ERR_INSTALLED
   fi
   if [ ! -f $CONFIGFILE ]; then
 # We are sure to succeed here, since we have parsed $CONFIGFILE before getting here


Re: [Linux-ha-dev] Patch: apache return code

2008-07-03 Thread Dominik Klein

Dominik Klein wrote:

See user list. Thread [Linux-HA] Apache failover / renaming the binary

Regards
Dominik


Again. Second issue though.
exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1215073256 -7200
# Node ID db487301a953408ab59a2fc5aadc0b26169b6c2f
# Parent  94c262e9af4978ffe6be49f3bcb079750e3ec1a6
Medium: RA: return code if $HTTPD is not installed but tried to start

diff -r 94c262e9af49 -r db487301a953 resources/OCF/apache
--- a/resources/OCF/apache  Thu Jul 03 08:27:49 2008 +0200
+++ b/resources/OCF/apache  Thu Jul 03 10:20:56 2008 +0200
@@ -564,9 +564,10 @@ then
   [ -z "$HTTPD" ]
then
 	case $COMMAND in
+	   start)	exit	$OCF_ERR_INSTALLED;;
 	   stop)	exit	$OCF_SUCCESS;;
 	   monitor)	exit	$OCF_NOT_RUNNING;;
-	   status)	exit  $LSB_STATUS_STOPPED;;
+	   status)	exit	$LSB_STATUS_STOPPED;;
 	   meta-data)	metadata_apache;;
 	esac
 	ocf_log err "No valid httpd found! Please revise your HTTPDLIST item"


[Linux-ha-dev] Patch: mysql ocf ra return codes

2008-05-13 Thread Dominik Klein
The RA returned OCF_ERR_ARGS when the datadir, config, user or group was not 
found. That is node-specific and should not prevent the resource from 
running on any other node.


OCF_ERR_ARGS tells the cluster that the XML config is wrong, which 
prevents the resource from running on any node.


IRC snippet:
[13:11] <kleind> beekhof: could a failed start op on node2 prevent a resource from being started on node1?
[13:11] <kleind> node2 failcount=infinity
[13:11] <kleind> node1 no failcount
[13:44] <beekhof> yes
[13:44] <beekhof> depends why it failed
[15:53] <beekhof> kleind: ie. if the RA returns with invalid configuration
[15:54] <kleind> beekhof: it returned 2 (err args)
[15:54] <beekhof> thats the one
[15:54] <kleind> that prevents the res to run on any node?
[15:55] <kleind> because it thinks the resources cluster config is wrong
[15:55] <beekhof> yes
[15:55] <kleind> xml config
[15:55] <kleind> makes sense
[15:55] <beekhof> which is what the rc means :)
[15:55] <kleind> ok
[15:55] <kleind> then the RA is crap ;)
[15:55] <beekhof> yup :)
[15:55] <kleind> i'll see if I can fix the mysql RA on that
[15:56] <kleind> for testing, I moved my mysql installation to another directory on one node
[15:56] <kleind> that shoudlnt cause mysql not to run on any node
[15:57] <beekhof> correct
[15:58] <kleind> so, what should be returned when $configfile is not found on a node? err_installed I guess?
[15:58] <beekhof> sounds about right
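The distinction discussed above can be sketched as a tiny validate function (illustrative only, not the actual mysql RA; names and values are assumptions, but the exit codes follow the OCF convention):

```shell
# OCF exit-code convention (sketch): OCF_ERR_ARGS (2) marks the cluster's
# XML configuration as broken, banning the resource on every node;
# OCF_ERR_INSTALLED (5) marks a node-local problem, so other nodes may
# still run the resource.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_INSTALLED=5

validate_sketch() {
    config=$1
    # empty parameter: the CIB/XML configuration itself is wrong
    [ -n "$config" ] || return $OCF_ERR_ARGS
    # parameter set but file missing: a problem on this node only
    [ -f "$config" ] || return $OCF_ERR_INSTALLED
    return $OCF_SUCCESS
}

validate_sketch ""              || echo "empty config -> $?"   # empty config -> 2
validate_sketch /no/such/file   || echo "missing file -> $?"   # missing file -> 5
```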

Regards
Dominik
Exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1210687383 -7200
# Node ID 42ce605e3da516db5e0a69b92d6e27433537ab53
# Parent  49b142475fa9925bb440d359816aacc1fe6c4495
Medium: Fix return codes if installation directory/user/group/config is not found.

diff -r 49b142475fa9925bb440d359816aacc1fe6c4495 -r 42ce605e3da516db5e0a69b92d6e27433537ab53 resources/OCF/mysql
--- a/resources/OCF/mysql	Wed May 07 16:14:04 2008 +0200
+++ b/resources/OCF/mysql	Tue May 13 16:03:03 2008 +0200
@@ -248,24 +248,24 @@ mysql_validate() {
 # checking the parameters
 if [ ! -f $OCF_RESKEY_config ]; then
 	ocf_log err "Config $OCF_RESKEY_mysql_config doesn't exist";
-	exit $OCF_ERR_ARGS;
+	exit $OCF_ERR_INSTALLED;
 fi
 
 if [ ! -d $OCF_RESKEY_datadir ]; then
 	ocf_log err "Datadir $OCF_RESKEY_datadir dosen't exist";
-	exit $OCF_ERR_ARGS;
+	exit $OCF_ERR_INSTALLED;
 fi
 
 getent passwd $OCF_RESKEY_user >/dev/null 2>&1
 if [ ! $? -eq 0 ]; then
 	ocf_log err "User $OCF_RESKEY_user doesn't exit";
-	exit $OCF_ERR_ARGS;
+	exit $OCF_ERR_INSTALLED;
 fi
 
 getent group $OCF_RESKEY_group >/dev/null 2>&1
 if [ ! $? -eq 0 ]; then
 	ocf_log err "Group $OCF_RESKEY_group doesn't exist";
-	exit $OCF_ERR_ARGS;
+	exit $OCF_ERR_INSTALLED;
 fi
 }
 


[Linux-ha-dev] meta-data patch for mysql ocf ra

2008-05-07 Thread Dominik Klein

meta-data displayed a wrong default value for socket.

Regards
Dominik
Exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1210167404 -7200
# Node ID 0fe9dfacad504390fc1c1c80e6e6cf28c440bd76
# Parent  ed0972c7aa43699ae2ec31f68cac3b9cbcc5d5c5
Mysql's meta-data had a wrong default value for socket

diff -r ed0972c7aa43699ae2ec31f68cac3b9cbcc5d5c5 -r 0fe9dfacad504390fc1c1c80e6e6cf28c440bd76 resources/OCF/mysql
--- a/resources/OCF/mysql	Sun Apr 27 22:33:14 2008 +0200
+++ b/resources/OCF/mysql	Wed May 07 15:36:44 2008 +0200
@@ -186,7 +186,7 @@ The socket to be used for mysqld.
 The socket to be used for mysqld.
 </longdesc>
 <shortdesc lang="en">MySQL socket</shortdesc>
-<content type="string" default="${OCF_RESKEY_pid_default}"/>
+<content type="string" default="${OCF_RESKEY_socket_default}"/>
 </parameter>
 
 <parameter name="test_table" unique="0" required="0">


[Linux-ha-dev] Re: meta-data patch for mysql ocf ra

2008-05-07 Thread Dominik Klein

There was more wrong stuff in the meta-data function. See patch.
Exporting patch:
# HG changeset patch
# User Dominik Klein [EMAIL PROTECTED]
# Date 1210167737 -7200
# Node ID 3e89d5a6edacc2a5d742fdb8eb62f8a56b16be1e
# Parent  0fe9dfacad504390fc1c1c80e6e6cf28c440bd76
More mysql meta-data fixing.

diff -r 0fe9dfacad504390fc1c1c80e6e6cf28c440bd76 -r 
3e89d5a6edacc2a5d742fdb8eb62f8a56b16be1e resources/OCF/mysql
--- a/resources/OCF/mysql   Wed May 07 15:36:44 2008 +0200
+++ b/resources/OCF/mysql   Wed May 07 15:42:17 2008 +0200
@@ -194,7 +194,7 @@ Table to be tested in monitor statement 
 Table to be tested in monitor statement (in database.table notation)
 </longdesc>
 <shortdesc lang="en">MySQL test table</shortdesc>
-<content type="string" default="OCF_RESKEY_test_table_default" />
+<content type="string" default="${OCF_RESKEY_test_table_default}" />
 </parameter>
 
 <parameter name="test_user" unique="0" required="0">
@@ -202,7 +202,7 @@ MySQL test user
 MySQL test user
 </longdesc>
 <shortdesc lang="en">MySQL test user</shortdesc>
-<content type="string" default="OCF_RESKEY_test_user_default" />
+<content type="string" default="${OCF_RESKEY_test_user_default}" />
 </parameter>
 
 <parameter name="test_passwd" unique="0" required="0">
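The second fix matters because without the ${...} expansion syntax the literal variable name, not its value, ends up in the meta-data output. A standalone sketch (the default value here is an assumption for illustration):

```shell
# Without ${...}, the string is taken verbatim; with it, the shell
# substitutes the variable's value, which is what the meta-data needs.
OCF_RESKEY_test_table_default="mysql.user"
plain="default=OCF_RESKEY_test_table_default"
expanded="default=${OCF_RESKEY_test_table_default}"
echo "$plain"      # the literal variable name leaks through
echo "$expanded"   # the real default: default=mysql.user
```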