Re: [ClusterLabs] OCF Resource Agent for Galera

2015-12-01 Thread Jan Pokorný
On 29/11/15 21:32 +0530, Mayank Katyal wrote:
> I am facing error with the following OCF resource agent for Galera using it
> with Pacemaker and Heartbeat for setting up High Availability with MariaDB
> database server with multi Master on Debian Wheezy:
> 
> Galera Resource Agent
> 
> 
> The error I am getting in Heartbeat Logs is:
> 
> Waiting on node  to report database status before Master
> instances can start.
> 
> 
> Can anybody help? Thanks a lot.

Moving the question over to the users-centered cluster/HA list, which
has a broader audience.  It is a better fit for the question, since the
OCF list, as I understand it, deals with the OCF standard itself
(structure of agents' metadata, etc.) and is a low-to-no-traffic
one[*].

[*] Not that there are no challenges ahead for OCF that could bring a
better user experience in the configuration front-ends and whatnot,
but it doesn't seem to be worth the attention currently.

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] OCF Resource Agent for Galera

2015-12-01 Thread Damien Ciabrini
Sorry for the spam :)

- Original Message -
> 
> 
> - Original Message -
> > On 29/11/15 21:32 +0530, Mayank Katyal wrote:
> > > I am facing error with the following OCF resource agent for Galera using
> > > it
> > > with Pacemaker and Heartbeat for setting up High Availability with
> > > MariaDB
> > > database server with multi Master on Debian Wheezy:
> > > 
> > > Galera Resource Agent
> > > 
> > > 
> > > The error I am getting in Heartbeat Logs is:
> > > 
> > > Waiting on node  to report database status before Master
> > > instances can start.
> > > 
> > > 
> > > Can anybody help? Thanks a lot.
> > 

Mayank, the message you're seeing is logged at the very beginning of the
bootstrap of the cluster.

The resource agent has to determine the last transaction committed on
each of the Galera nodes before it can elect a bootstrap node (the one
with the most recent version of the database).

This is a fairly normal message that is logged until all nodes have
been probed by the resource agent.

If you keep seeing this message, it probably means that one of the nodes
cannot be probed by the resource agent. If so, crm_mon -A -1 may tell
you which one.
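
To make the election step concrete, here is a minimal sketch in Python.
It is only a simplified model of the idea described above, not the
resource agent's actual code: the function name, the node names and the
use of None for "not probed yet" are assumptions made for illustration.

```python
from typing import Dict, Optional


def elect_bootstrap_node(last_commits: Dict[str, Optional[int]]) -> Optional[str]:
    """Return the node holding the most recent transaction.

    last_commits maps a node name to the last transaction number it
    reported, or to None if the node has not been probed yet.
    Returns None while the election cannot take place.
    """
    # The election must wait until every node has reported its status.
    if not last_commits or any(seqno is None for seqno in last_commits.values()):
        return None
    # Sort first so that ties are broken by node name (an arbitrary choice).
    return max(sorted(last_commits), key=lambda node: last_commits[node])


# One node has not reported yet: keep waiting, as in the logged message.
print(elect_bootstrap_node({"node1": 10, "node2": None}))
# All nodes reported: the node with the highest number is elected.
print(elect_bootstrap_node({"node1": 10, "node2": 12, "node3": 11}))
```

The first call prints None and the second prints node2. As long as a
single node stays unprobed, the function keeps returning None, which is
the situation in which the message keeps being logged.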


> > Moving the question over to the users-centered cluster/HA list with
> > a broader audience.  It is a better fit for the question as the OCF
> > list, as I understand it, rather deals with OCF standard itself
> > (structure of agents' metadata, etc.) and is a low-to-no traffic
> > one[*].
> > 
> > [*] Not that there are no challenges for OCF ahead that could bring
> > better user experience at the configuration front-ends and what
> > not, but it doesn't seem to be worth the attention currently.
> > 
> > --
> > Jan (Poki)
> > 
> > 
> 

--
Damien



Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-01 Thread Ken Gaillot
On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> Hi,
> 
> I am evaluating whether it is feasible to use Pacemaker + Corosync to add
> support for clustering/redundancy into our product.

Most definitely.

> Our objectives:
> 1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.

You can do this with location constraints and scores. See:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on

Basically, you give the standby node a lower score than the other nodes.
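
As a rough illustration of how scores drive placement, here is a toy
model in Python. It is not Pacemaker's actual allocation code, which
also weighs stickiness, colocation and other constraints; the function
name, the node and service names, and the score values are all made up
for the example.

```python
from typing import AbstractSet, Dict, Optional


def choose_node(scores: Dict[str, int],
                online: AbstractSet[str]) -> Optional[str]:
    """Pick the online node with the highest location score, if any."""
    candidates = {node: score for node, score in scores.items()
                  if node in online}
    if not candidates:
        return None
    # Sort first so that ties are broken by node name (an arbitrary choice).
    return max(sorted(candidates), key=lambda node: candidates[node])


# Hypothetical preferences: each service prefers its own node and falls
# back to the shared standby, which carries a lower score.
preferences = {
    "service1": {"active1": 100, "standby": 50},
    "service2": {"active2": 100, "standby": 50},
}

all_up = {"active1", "active2", "standby"}
print(choose_node(preferences["service1"], all_up))
print(choose_node(preferences["service1"], all_up - {"active1"}))
```

With every node up, service1 stays on active1; once active1 is gone, the
lower-scored standby is the best remaining candidate. This toy model
does not stop two services from landing on the standby at the same
time, which a real configuration would have to address separately.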

> 2) Each node has some different configuration parameters.
> 3) Whenever any active node goes down, the standby node comes up with the
> same configuration that the active had.

How you solve this requirement depends on the specifics of your
situation. Ideally, you can use OCF resource agents that take the
configuration location as a parameter. You may have to write your own,
if none is available for your services.

> 4) There is no one single process/service for which we need redundancy,
> rather it is the entire system (multiple processes running together).

This is trivially implemented using either groups or ordering and
colocation constraints.

Order constraint = start service A before starting service B (and stop
in reverse order)

Colocation constraint = keep services A and B on the same node

Group = shortcut to specify several services that need to start/stop in
order and be kept together

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
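
The "group = shortcut" idea can be sketched in a few lines of Python.
This only models the relationship between the three concepts and is not
how Pacemaker stores or evaluates its configuration; the function names
and the resource names (vip, db, app) are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def expand_group(members: List[str]) -> Tuple[List[Pair], List[Pair]]:
    """Expand an ordered group into the constraints it stands for.

    Returns (order, colocation), where an order pair (a, b) means
    "start a before b" and a colocation pair (b, a) means "keep b on
    the same node as a".
    """
    consecutive = list(zip(members, members[1:]))
    order = [(first, then) for first, then in consecutive]
    colocation = [(then, first) for first, then in consecutive]
    return order, colocation


def stop_sequence(members: List[str]) -> List[str]:
    """Members are stopped in the reverse of their start order."""
    return list(reversed(members))


order, colocation = expand_group(["vip", "db", "app"])
print(order)       # start vip before db, and db before app
print(colocation)  # db runs where vip runs, app runs where db runs
print(stop_sequence(["vip", "db", "app"]))
```

For a group of N members this yields N-1 order pairs and N-1 colocation
pairs, which is why a group is usually the more convenient way to
describe a whole stack of processes that must move together.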


> 5) I would also want to be notified when any active<->standby state
> transition happens as I would want to take some steps at the application
> level.

There are multiple approaches.

If you don't mind compiling your own packages, the latest master branch
(which will be part of the upcoming 1.1.14 release) has built-in
notification capability. See:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Otherwise, you can use SNMP or e-mail if your packages were compiled
with those options, or you can use the ocf:pacemaker:ClusterMon resource
agent:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928

> I went through the documents/blogs, but all of them had examples only
> for the 1-active, 1-standby use case, and only for a standard service
> like httpd.

Pacemaker is incredibly versatile, and the use cases are far too varied
to cover more than a small subset. Those simple examples show the basic
building blocks, and can usually point you to the specific features you
need to investigate further.

> One additional question: if I have multiple actives, can a virtual IP
> configuration still be used? Is it possible for the N actives to have
> different IP addresses, with the standby taking over the IP address of
> the failed node whenever it becomes active?

Yes, there are a few approaches here, too.

The simplest is to assign a virtual IP to each active, and include it in
your group of resources. The whole group will fail over to the standby
node if the original goes down.

If you want a single virtual IP that is used by all your actives, one
alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
that resource agent will use iptables' CLUSTERIP functionality, which
relies on multicast Ethernet addresses (not to be confused with
multicast IP). Since multicast Ethernet has limitations, this is not
often used in production.

A more complicated method is to use a virtual IP in combination with a
load-balancer such as haproxy. Pacemaker can manage haproxy and the real
services, and haproxy manages distributing requests to the real services.

> Thanking in advance.
> Nikhil

A last word of advice: Fencing (aka STONITH) is important for proper
recovery from difficult failure conditions. Without it, it is possible
to have data loss or corruption in a split-brain situation.



[ClusterLabs] Help required for N+1 redundancy setup

2015-12-01 Thread Nikhil Utane
Hi,

I am evaluating whether it is feasible to use Pacemaker + Corosync to add
support for clustering/redundancy into our product.

Our objectives:
1) Support N+1 redundancy, i.e. N Active and (up to) 1 Standby.
2) Each node has some different configuration parameters.
3) Whenever any active node goes down, the standby node comes up with the
same configuration that the active had.
4) There is no one single process/service for which we need redundancy,
rather it is the entire system (multiple processes running together).
5) I would also want to be notified when any active<->standby state
transition happens as I would want to take some steps at the application
level.

I went through the documents/blogs, but all of them had examples only
for the 1-active, 1-standby use case, and only for a standard service
like httpd.

One additional question: if I have multiple actives, can a virtual IP
configuration still be used? Is it possible for the N actives to have
different IP addresses, with the standby taking over the IP address of
the failed node whenever it becomes active?

Thanking in advance.
Nikhil