Re: [ClusterLabs] Notice: SLES11SP4 broke exportfs!

2015-12-21 Thread Dejan Muhamedagic
Hi,

On Sat, Dec 12, 2015 at 10:06:57AM +0300, Andrei Borzenkov wrote:
> 11.12.2015 21:27, Ulrich Windl пишет:
> > Hi!
> > 
> > > After updating from SLES11SP3 (June version) to SLES11SP4 (today's version)
> > > exportfs fails to get the export status. I have a message like this in syslog:
> > 
> > Dec 11 19:22:09 h04 crmd[11128]:   notice: process_lrm_event: 
> > rksaph04-prm_nfs_c11_mnt_exp_monitor_0:93 [ 
> > /usr/lib/ocf/resource.d/heartbeat/exportfs: line 178: 4f838db1: value too 
> > great for base (error token is "4f838db1")\n ]
> > 
> > Why is such broken code released? Here's the diff:
> > 
> > --- /usr/lib/ocf/resource.d/heartbeat/exportfs  2015-03-11 
> > 07:00:04.0 +0100
> ...
> 
> > @@ -165,18 +171,48 @@
> > !
> >  }
> > 
> > +reset_fsid() {
> > +   CURRENT_FSID=$OCF_RESKEY_fsid
> > +}
> > +bump_fsid() {
> > +   let $((CURRENT_FSID++))
> > +}
> 
> Here is where error comes from.
> 
> > +get_fsid() {
> > +   echo $CURRENT_FSID
> > +}
> > +
> > +# run a function on all directories
> > +forall() {
> > +   local func=$1
> > +   shift 1
> > +   local fast_exit=""
> > +   local dir rc=0
> > +   if [ "$2" = fast_exit ]; then
> > +   fast_exit=1
> > +   shift 1
> > +   fi
> > +   reset_fsid
> > +   for dir in $OCF_RESKEY_directory; do
> > +   $func $dir "$@"
> > +   rc=$(($rc | $?))
> > +   bump_fsid
> 
> called here
> 
> > +   [ "$fast_exit" ] && continue
> > +   [ $rc -ne 0 ] && return $rc
> > +   done
> > +   return $rc
> > +}
> > +
> ...
> 
> >  exportfs_validate_all ()
> >  {
> > -   if [ ! -d $OCF_RESKEY_directory ]; then
> > -   ocf_log err "$OCF_RESKEY_directory does not exist or is not 
> > a directory"
> > +   if [ `echo "$OCF_RESKEY_directory" | wc -w` -gt 1 ] &&
> > +   ! ocf_is_decimal "$OCF_RESKEY_fsid"; then
> > +   ocf_log err "use integer fsid when exporting multiple 
> > directories"
> > +   return $OCF_ERR_CONFIGURED
> > +   fi
> > +   if ! forall testdir; then
> > return $OCF_ERR_INSTALLED
> > fi
> >  }
> 
> It is validated to be decimal, but only if more than one directory is
> present, while it is always being incremented, even if only single
> directory is defined.

Good catch!
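
For anyone hitting this: the failure boils down to doing shell integer
arithmetic on a fsid that is not a decimal number. Roughly (illustration
only, with the value taken from the log above):

    CURRENT_FSID=4f838db1       # a UUID-style fsid, not a decimal integer
    let $((CURRENT_FSID++))     # bash arithmetic rejects it with
                                # "value too great for base"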

Thanks,

Dejan

> The same code is present upstream (the line number 178 is a bit off).
> 
> A workaround is to change the FSID, but yes, it looks like an upstream bug.
> 
> 


Re: [ClusterLabs] Re: duplicate node

2015-12-21 Thread Dejan Muhamedagic
Hi,

On Fri, Dec 11, 2015 at 09:34:59PM +, gerry kernan wrote:
> Hi 
> I tried removing the node with uuid ae4d76e7-af64-4d93-acdd-4d7b5c274eff, but I
> get an error that the node is not present in the CIB.
> 
> When I do a crm configure show, the uuid is there:
> node $id="3b5d1061-8f68-4ab3-b169-e0ebe890c446" gat-voip-01.gdft.org
> node $id="ae4d76e7-af64-4d93-acdd-4d7b5c274eff" gat-voip-01.gdft.org \
>   attributes standby="on"
> 
> Are there any files I can edit manually to remove this, or should I do a
> complete erase of the config and start fresh?

It's somewhat involved to edit the cluster configuration (CIB)
offline. All nodes need to be shut down; then remove all cib.*
files on all nodes but one, and on that node you can then edit
the CIB file like this (with crm, for instance):

# CIB_file=/var/lib/*/cib/cib.xml crm configure
...

Then you have to remove the corresponding signature file
(cib.xml.sig), because after the edit it's not going to match
anymore. Perhaps there's a way to regenerate it by hand, but I
don't know how. At any rate, pacemaker loads cib.xml even when
the .sig file is not present.
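
Roughly, the whole procedure would be something like this (untested
sketch; the cib directory differs between installations, e.g.
/var/lib/pacemaker/cib or /var/lib/heartbeat/crm):

    # on every node: stop the cluster stack first
    service pacemaker stop    # or /etc/init.d/heartbeat stop
    # on all nodes but one: remove the stale CIB copies
    rm /var/lib/pacemaker/cib/cib.*
    # on the remaining node: drop the signature and edit the CIB offline
    rm /var/lib/pacemaker/cib/cib.xml.sig
    CIB_file=/var/lib/pacemaker/cib/cib.xml crm configure
    # ... delete the stale node entry, commit, then restart the cluster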

Thanks,

Dejan

> 
> 
> Gerry 
> -Original Message-
> From: Dejan Muhamedagic [mailto:deja...@fastmail.fm] 
> Sent: Thursday 10 December 2015 16:37
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> 
> Subject:  Re: [ClusterLabs] duplicate node
> 
> Hi,
> 
> On Tue, Dec 08, 2015 at 09:17:27PM +, gerry kernan wrote:
> > Hi
> >  
> > How would I remove a duplicate node? I have a 2-node setup, but one node is
> > showing twice. crm configure show output below; node gat-voip-01.gdft.org is
> > listed twice.
> >  
> >  
> > node $id="0dc85a64-01ad-4fc5-81fd-698208a8322c" gat-voip-02\
> > attributes standby="on"
> > node $id="3b5d1061-8f68-4ab3-b169-e0ebe890c446" gat-voip-01 node 
> > $id="ae4d76e7-af64-4d93-acdd-4d7b5c274eff" gat-voip-01\
> > attributes standby="off"
> 
> First you need to figure out which one is the old uuid, then try:
> 
> # crm node delete 
> 
> This looks like heartbeat; there used to be a crm_uuid tool or something similar
> to read the uuid. There's also a uuid file somewhere in /var/lib/heartbeat.
> 
> Thanks,
> 
> Dejan
> 
> > primitive res_Filesystem_rep ocf:heartbeat:Filesystem \
> > params device="/dev/drbd0" directory="/rep" fstype="ext3" \
> > operations $id="res_Filesystem_rep-operations" \
> > op start interval="0" timeout="60" \
> > op stop interval="0" timeout="60" \
> > op monitor interval="20" timeout="40" start-delay="0" \
> > op notify interval="0" timeout="60" \
> > meta target-role="started" is-managed="true"
> > primitive res_IPaddr2_northIP ocf:heartbeat:IPaddr2 \
> > params ip="10.75.29.10" cidr_netmask="26" \
> > operations $id="res_IPaddr2_northIP-operations" \
> > op start interval="0" timeout="20" \
> > op stop interval="0" timeout="20" \
> > op monitor interval="10" timeout="20" start-delay="0" \
> > meta target-role="started" is-managed="true"
> > primitive res_IPaddr2_sipIP ocf:heartbeat:IPaddr2 \
> > params ip="158.255.224.226" nic="bond2" \
> > operations $id="res_IPaddr2_sipIP-operations" \
> > op start interval="0" timeout="20" \
> > op stop interval="0" timeout="20" \
> > op monitor interval="10" timeout="20" start-delay="0" \
> > meta target-role="started" is-managed="true"
> > primitive res_asterisk_res_asterisk lsb:asterisk \
> > operations $id="res_asterisk_res_asterisk-operations" \
> > op start interval="0" timeout="15" \
> > op stop interval="0" timeout="15" \
> > op monitor interval="15" timeout="15" start-delay="15" \
> > meta target-role="started" is-managed="true"
> > primitive res_drbd_1 ocf:linbit:drbd \
> > params drbd_resource="r0" \
> > operations $id="res_drbd_1-operations" \
> > op start interval="0" timeout="240" \
> > op promote interval="0" timeout="90" \
> > op demote interval="0" timeout="90" \
> > op stop interval="0" timeout="100" \
> > op monitor interval="10" timeout="20" start-delay="0" \
> > op notify interval="0" timeout="90"
> > primitive res_httpd_res_httpd lsb:httpd \
> > operations $id="res_httpd_res_httpd-operations" \
> > op start interval="0" timeout="15" \
> > op stop interval="0" timeout="15" \
> > op monitor interval="15" timeout="15" start-delay="15" \
> > meta target-role="started" is-managed="true"
> > primitive res_mysqld_res_mysql lsb:mysqld \
> > operations $id="res_mysqld_res_mysql-operations" \
> > op start interval="0" timeout="15" \
> > op stop interval="0" timeout="15" \
> > op monitor interval="15" timeout="15" start-delay="15" \
> > meta target-role="started"
> > group asterisk res_Filesystem_rep 

Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-21 Thread Nikhil Utane
I have prepared a write-up explaining my requirements and the solution
I am currently proposing, based on my understanding so far.
Kindly let me know whether what I am proposing is good or whether there is
a better way to achieve the same.

https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing

Let me know if you face any issue in accessing the above link. Thanks.

On Thu, Dec 3, 2015 at 11:34 PM, Ken Gaillot  wrote:

> On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> > Ken,
> >
> > One more question: if I have to propagate configuration changes between
> > the nodes, then is cpg (closed process group) the right way?
> > For e.g.
> > Active Node1 has config A=1, B=2
> > Active Node2 has config A=3, B=4
> > Standby Node needs to have configuration for all the nodes such that
> > whichever goes down, it comes up with those values.
> > Here configuration is not static but can be updated at run-time.
>
> Being unfamiliar with the specifics of your case, I can't say what the
> best approach is, but it sounds like you will need to write a custom OCF
> resource agent to manage your service.
>
> A resource agent is similar to an init script:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
>
> The RA will start the service with the appropriate configuration. It can
> use per-resource options configured in pacemaker or external information
> to do that.
>
> How does your service get its configuration currently?
>
> > BTW, I'm a little confused between OpenAIS and Corosync. For my purpose I
> > should be able to use either, right?
>
> Corosync started out as a subset of OpenAIS, optimized for use with
> Pacemaker. Corosync 2 is now the preferred membership layer for
> Pacemaker for most uses, though other layers are still supported.
>
> > Thanks.
> >
> > On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot  wrote:
> >
> >> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> >>> Hi,
> >>>
> >>> I am evaluating whether it is feasible to use Pacemaker + Corosync to
> add
> >>> support for clustering/redundancy into our product.
> >>
> >> Most definitely
> >>
> >>> Our objectives:
> >>> 1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.
> >>
> >> You can do this with location constraints and scores. See:
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
> >>
> >> Basically, you give the standby node a lower score than the other nodes.
> >>
> >>> 2) Each node has some different configuration parameters.
> >>> 3) Whenever any active node goes down, the standby node comes up with
> the
> >>> same configuration that the active had.
> >>
> >> How you solve this requirement depends on the specifics of your
> >> situation. Ideally, you can use OCF resource agents that take the
> >> configuration location as a parameter. You may have to write your own,
> >> if none is available for your services.
> >>
> >>> 4) There is no one single process/service for which we need redundancy,
> >>> rather it is the entire system (multiple processes running together).
> >>
> >> This is trivially implemented using either groups or ordering and
> >> colocation constraints.
> >>
> >> Order constraint = start service A before starting service B (and stop
> >> in reverse order)
> >>
> >> Colocation constraint = keep services A and B on the same node
> >>
> >> Group = shortcut to specify several services that need to start/stop in
> >> order and be kept together
> >>
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
> >>
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
> >>
> >>
> >>> 5) I would also want to be notified when any active<->standby state
> >>> transition happens as I would want to take some steps at the
> application
> >>> level.
> >>
> >> There are multiple approaches.
> >>
> >> If you don't mind compiling your own packages, the latest master branch
> >> (which will be part of the upcoming 1.1.14 release) has built-in
> >> notification capability. See:
> >> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
> >>
> >> Otherwise, you can use SNMP or e-mail if your packages were compiled
> >> with those options, or you can use the ocf:pacemaker:ClusterMon resource
> >> agent:
> >>
> >>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
> >>
> >>> I went through the documents/blogs, but all had examples for a 1 active
> >>> and 1 standby use-case, and that too for some standard service like httpd.
> >>
> >> Pacemaker is incredibly versatile, and the use cases are far too varied
> >> to cover more than a small subset. Those simple examples show the basic
> >> building blocks, and can usually point you to the specific features you
> 
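
For reference, the building blocks described above look roughly like this
in crm shell syntax (a sketch only; all node, resource and script names
are invented):

    # keep the services together and start/stop them in order
    group g_stack p_conf p_service p_vip
    # N+1: every node may run the stack, but the spare node gets the lowest score
    location l_on_node1   g_stack 100: node1
    location l_on_node2   g_stack 100: node2
    location l_on_standby g_stack  10: standby1
    # one option for reacting to state transitions (see the links above):
    primitive p_mon ocf:pacemaker:ClusterMon \
        params extra_options="-E /usr/local/bin/on-transition.sh"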

Re: [ClusterLabs] Notice: SLES11SP4 broke exportfs!

2015-12-21 Thread Dejan Muhamedagic
Hi,

On Fri, Dec 11, 2015 at 07:27:28PM +0100, Ulrich Windl wrote:
> Hi!
> 
> After updating from SLES11SP3 (June version) to SLES11SP4 (today's version)
> exportfs fails to get the export status. I have a message like this in syslog:
> 
> Dec 11 19:22:09 h04 crmd[11128]:   notice: process_lrm_event: 
> rksaph04-prm_nfs_c11_mnt_exp_monitor_0:93 [ 
> /usr/lib/ocf/resource.d/heartbeat/exportfs: line 178: 4f838db1: value too 
> great for base (error token is "4f838db1")\n ]

The value of the fsid is unexpected. The code (and I) assumed
that it would be decimal and that's mentioned in the fsid
meta-data description.

> Why is such broken code released? Here's the diff:

I suspect that all newly released code is broken in some way
for some deployments.

Thanks,

Dejan



[ClusterLabs] Antw: Re: Notice: SLES11SP4 broke exportfs!

2015-12-21 Thread Ulrich Windl
>>> Dejan Muhamedagic  wrote on 21.12.2015 at 11:40 in
message <20151221104011.GB9783@walrus.homenet>:
> Hi,
> 
> On Fri, Dec 11, 2015 at 07:27:28PM +0100, Ulrich Windl wrote:
>> Hi!
>> 
>> After updating from SLES11SP3 (June version) to SLES11SP4 (today's version)
>> exportfs fails to get the export status. I have a message like this in syslog:
>> 
>> Dec 11 19:22:09 h04 crmd[11128]:   notice: process_lrm_event: 
> rksaph04-prm_nfs_c11_mnt_exp_monitor_0:93 [ 
> /usr/lib/ocf/resource.d/heartbeat/exportfs: line 178: 4f838db1: value too 
> great for base (error token is "4f838db1")\n ]
> 
> The value of the fsid is unexpected. The code (and I) assumed
> that it would be decimal and that's mentioned in the fsid
> meta-data description.

Hi!

Really? crm(live)# ra info exportfs:

[...]
fsid* (string): Unique fsid within cluster or starting fsid for multiple exports
.
The fsid option to pass to exportfs. This can be a unique positive
integer, a UUID, or the special string "root" which is functionally
identical to numeric fsid of 0.
If multiple directories are being exported, then they are
assigned ids sequentially starting with this fsid (fsid, fsid+1,
fsid+2, ...). Obviously, in that case the fsid must be an
integer.
0 (root) identifies the export as the root of an NFSv4
pseudofilesystem -- avoid this setting unless you understand its
special status.
This value will override any fsid provided via the options parameter.
[...]

Did you read "UUID" also? 

> 
>> Why is such broken code released? Here's the diff:
> 
> I suspect that every newly released code is broken in some way
> for some deployments.

The code clearly does not match the description, and it is broken.
I would also expect "validate" to report fsid values it cannot
handle.
Furthermore, I see no sense in trying to increment an fsid.

Maybe you can explain.

Regards,
Ulrich





Re: [ClusterLabs] Antw: Re: Notice: SLES11SP4 broke exportfs!

2015-12-21 Thread Dejan Muhamedagic
On Mon, Dec 21, 2015 at 12:54:49PM +0100, Ulrich Windl wrote:
> >>> Dejan Muhamedagic  wrote on 21.12.2015 at 11:40 in
> message <20151221104011.GB9783@walrus.homenet>:
> > Hi,
> > 
> > On Fri, Dec 11, 2015 at 07:27:28PM +0100, Ulrich Windl wrote:
> >> Hi!
> >> 
> >> After updating from SLES11SP3 (June version) to SLES11SP4 (today's version)
> >> exportfs fails to get the export status. I have a message like this in syslog:
> >> 
> >> Dec 11 19:22:09 h04 crmd[11128]:   notice: process_lrm_event: 
> > rksaph04-prm_nfs_c11_mnt_exp_monitor_0:93 [ 
> > /usr/lib/ocf/resource.d/heartbeat/exportfs: line 178: 4f838db1: value too 
> > great for base (error token is "4f838db1")\n ]
> > 
> > The value of the fsid is unexpected. The code (and I) assumed
> > that it would be decimal and that's mentioned in the fsid
> > meta-data description.
> 
> Hi!
> 
> Really? crm(live)# ra info exportfs:
> 
> [...]
> fsid* (string): Unique fsid within cluster or starting fsid for multiple 
> exports
> .
> The fsid option to pass to exportfs. This can be a unique positive
> integer, a UUID, or the special string "root" which is functionally
> identical to numeric fsid of 0.
> If multiple directories are being exported, then they are
> assigned ids sequentially starting with this fsid (fsid, fsid+1,
> fsid+2, ...). Obviously, in that case the fsid must be an
> integer.

  Here ^^^

> 0 (root) identifies the export as the root of an NFSv4
> pseudofilesystem -- avoid this setting unless you understand its
> special status.
> This value will override any fsid provided via the options parameter.
> [...]
> 
> Did you read "UUID" also? 
> 
> > 
> >> Why is such broken code released? Here's the diff:
> > 
> > I suspect that every newly released code is broken in some way
> > for some deployments.
> 
> The code clearly does not match the description, and it is broken.

The code _should_ match the description, but, as we all
concluded, there's a bug.

> I would also expect "validate" to report fsid values it cannot
> handle.
> Furthermore, I see no sense in trying to increment an fsid.
> 
> Maybe you can explain.

The RA tries to increase the fsid for a one-directory
configuration. Erroneously. It needs to be fixed _not_ to
manipulate the fsid for such configurations.
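
Something along these lines, i.e. touch the fsid only when several
directories share one starting fsid (just a sketch of the direction, not
the actual patch):

    bump_fsid() {
        # a single export keeps whatever fsid was configured (integer, UUID, "root")
        if [ `echo "$OCF_RESKEY_directory" | wc -w` -gt 1 ]; then
            CURRENT_FSID=$((CURRENT_FSID + 1))
        fi
    }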

Thanks,

Dejan

> 
> Regards,
> Ulrich
> 
> 
> 


Re: [ClusterLabs] [Q] Check on application layer (kamailio, openhab)

2015-12-21 Thread Dejan Muhamedagic
Hi,

On Mon, Dec 21, 2015 at 10:07:25AM -0600, Ken Gaillot wrote:
> On 12/19/2015 10:21 AM, Sebish wrote:
> > Dear all ha-list members,
> > 
> > I am trying to set up two availability checks on the application layer using
> > heartbeat and pacemaker.
> > To be more concrete, I need 1 resource agent (ra) for openHAB and 1 for
> > Kamailio SIP Proxy.
> > 
> > *My setup:
> > *
> > 
> >+ Debian 7.9 + Heartbeat + Pacemaker + more
> 
> This should work for your purposes, but FYI, corosync 2 is the preferred
> communications layer these days. Debian 7 provides corosync 1, which
> might be worth using here, to make an eventual switch to corosync 2 easier.
> 
> Also FYI, Pacemaker was dropped from Debian 8, but there is a group
> working on backporting the latest pacemaker/corosync/etc. to it.
> 
> >+ 2 Node Cluster with Hot-Standby Failover
> >+ Active Cluster with clusterip, ip-monitoring, working failover and
> >services
> >+ Copied kamailio ra into /usr/lib/ocf/resource.d/heartbeat, chmod
> >755 and 'crm ra list ocf heartbeat' finds it
> > 
> > *The plan:*
> > 
> > _openHAB_
> > 
> >My idea was to let heartbeat check for the availability of openHAB's
> >website (jetty-based) or check if the process is up and running.
> > 
> >I did not find a fitting resource agent. Is there a general ra in
> >which you would just have to insert the process name 'openhab'?
> > 
> > _Kamailio_
> > 
> >My idea was to let an ra send a SIP request to kamailio and check
> >if it gets an answer AND if it is the correct one.
> > 
> >It seems like the ra
> >   
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/kamailio
> > 
> >does exactly what I want,
> >but I do not really understand it. Is it plug and play? Do I have to
> >change values inside the code like users, the complete meta-data or
> >else?
> > 
> >When I try to insert this agent (no changes) into pacemaker using
> >'crm configure primitive kamailio ocf:heartbeat:kamailio' it says:
> > 
> >lrmadmin[4629]: 2015/12/19_16:11:40 ERROR:
> >lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a
> >reply message of rmetadata with function get_ret_from_msg.
> >ERROR: ocf:heartbeat:kamailio: could not parse meta-data:
> >ERROR: ocf:heartbeat:kamailio: could not parse meta-data:
> >ERROR: ocf:heartbeat:kamailio: no such resource agent
> 
> lrmadmin is no longer used, and I'm not familiar with it, but first I'd
> check that the RA is executable. If it supports running directly from
> the command line, maybe make sure you can run it that way first.

I think that the RA is just not installed.
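
A quick way to verify that is to look for the script and ask it for its
meta-data directly (sketch; the path below is the usual location for
heartbeat-class agents):

    ls -l /usr/lib/ocf/resource.d/heartbeat/kamailio
    OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/kamailio meta-data
    # or exercise it with ocf-tester from the resource-agents package, e.g.
    # ocf-tester -n kamailio_test /usr/lib/ocf/resource.d/heartbeat/kamailio

If the meta-data call fails or prints nothing, that would explain the
"could not parse meta-data" errors above.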

Thanks,

Dejan

> Most RAs support configuration options, which you can set in the cluster
> configuration (you don't have to edit the RA). Each RA specifies the
> options it accepts in the <parameters> section of its metadata.
> 
> > *The question:*
> > 
> > Maybe you could give me some hints on what to do next. Perhaps one of
> > you is even already using the kamailio ra successfully or checking a
> > non-apache website?
> > If I simply have to insert all my cluster data into the kamailio ra, it
> > should not throw this error, should it? Could have used a readme for
> > this ra though...
> > If you need any data, I will provide it asap!
> > 
> > *Thanks a lot to all who read this mail!*
> > 
> > Sebish
> > ha-newbie, but not noobie ;)
> 
> 


Re: [ClusterLabs] [Q] Check on application layer (kamailio, openhab)

2015-12-21 Thread Ken Gaillot
On 12/19/2015 10:21 AM, Sebish wrote:
> Dear all ha-list members,
> 
> I am trying to set up two availability checks on the application layer using
> heartbeat and pacemaker.
> To be more concrete, I need 1 resource agent (ra) for openHAB and 1 for
> Kamailio SIP Proxy.
> 
> *My setup:*
> 
>+ Debian 7.9 + Heartbeat + Pacemaker + more

This should work for your purposes, but FYI, corosync 2 is the preferred
communications layer these days. Debian 7 provides corosync 1, which
might be worth using here, to make an eventual switch to corosync 2 easier.

Also FYI, Pacemaker was dropped from Debian 8, but there is a group
working on backporting the latest pacemaker/corosync/etc. to it.

>+ 2 Node Cluster with Hot-Standby Failover
>+ Active Cluster with clusterip, ip-monitoring, working failover and
>services
>+ Copied kamailio ra into /usr/lib/ocf/resource.d/heartbeat, chmod
>755 and 'crm ra list ocf heartbeat' finds it
> 
> *The plan:*
> 
> _openHAB_
> 
>My idea was to let heartbeat check for the availability of openHAB's
>website (jetty-based) or check if the process is up and running.
> 
>I did not find a fitting resource agent. Is there a general ra in
>which you would just have to insert the process name 'openhab'?
> 
> _Kamailio_
> 
>My idea was to let an ra send a SIP request to kamailio and check
>if it gets an answer AND if it is the correct one.
> 
>It seems like the ra
>   
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/kamailio
> 
>does exactly what I want,
>but I do not really understand it. Is it plug and play? Do I have to
>change values inside the code like users, the complete meta-data or
>else?
> 
>When I try to insert this agent (no changes) into pacemaker using
>'crm configure primitive kamailio ocf:heartbeat:kamailio' it says:
> 
>lrmadmin[4629]: 2015/12/19_16:11:40 ERROR:
>lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a
>reply message of rmetadata with function get_ret_from_msg.
>ERROR: ocf:heartbeat:kamailio: could not parse meta-data:
>ERROR: ocf:heartbeat:kamailio: could not parse meta-data:
>ERROR: ocf:heartbeat:kamailio: no such resource agent

lrmadmin is no longer used, and I'm not familiar with it, but first I'd
check that the RA is executable. If it supports running directly from
the command line, maybe make sure you can run it that way first.

Most RAs support configuration options, which you can set in the cluster
configuration (you don't have to edit the RA). Each RA specifies the
options it accepts in the <parameters> section of its metadata.

> *The question:*
> 
> Maybe you could give me some hints on what to do next. Perhaps one of
> you is even already using the kamailio ra successfully or checking a
> non-apache website?
> If I simply have to insert all my cluster data into the kamailio ra, it
> should not throw this error, should it? Could have used a readme for
> this ra though...
> If you need any data, I will provide it asap!
> 
> *Thanks a lot to all who read this mail!*
> 
> Sebish
> ha-newbie, but not noobie ;)




Re: [ClusterLabs] Anyone successfully install Pacemaker/Corosync on Freebsd?

2015-12-21 Thread Ken Gaillot
On 12/19/2015 04:56 PM, mike wrote:
> Hi All,
> 
> Just curious if anyone has had any luck at one point installing
> Pacemaker and Corosync on FreeBSD. I have to install from source, of
> course, and I've run into an issue when running ./configure while trying
> to install Corosync. The process craps out at nss with this error:

FYI, Ruben Kerkhof has done some recent work to get the FreeBSD build
working. It will go into the next 1.1.14 release candidate. In the
meantime, make sure you have the very latest code from upstream's 1.1
branch.
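
As for the configure failure itself: it is complaining about pkg-config
rather than nss. On FreeBSD that is normally provided by the pkgconf
package (sketch, from memory):

    pkg install pkgconf
    # then re-run configure so it can locate nss via pkg-config
    ./configure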

> checking for nss... configure: error: in `/root/heartbeat/corosync-2.3.3':
> configure: error: The pkg-config script could not be found or is too
> old. Make sure it
> is in your PATH or set the PKG_CONFIG environment variable to the full
> path to pkg-config.
> Alternatively, you may set the environment variables nss_CFLAGS
> and nss_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.
> 
> I've looked unsuccessfully for a package called pkg-config and nss
> appears to be installed as you can see from this output:
> 
> root@wellesley:~/heartbeat/corosync-2.3.3 # pkg install nss
> Updating FreeBSD repository catalogue...
> FreeBSD repository is up-to-date.
> All repositories are up-to-date.
> Checking integrity... done (0 conflicting)
> The most recent version of packages are already installed
> 
> Anyway - just looking for any suggestions. Hoping that perhaps someone
> has successfully done this.
> 
> thanks in advance
> -mgb

