On 24/06/14 09:36, Kostiantyn Ponomarenko wrote:
Hi Chrissie,
But wait_for_all doesn't help when there is no connection between the nodes: if I then need to reboot the remaining working node, I won't get a working cluster afterwards - both nodes will be waiting for a connection to each other. That's why I am looking for a solution that could get one node working in this situation (after a reboot).
I've been thinking about some kind of marker which could help a node determine the state of the other node - like an external disk and a SCSI reservation command. Maybe you could suggest another kind of marker? I am not sure whether we can use the presence of a file on an external SSD as the marker. Something like: if there is a file, the other node is alive; if not, the node is dead.
More seriously, that solution is harder than it might seem - which is one reason qdiskd became as complex as it did, and why votequorum is as conservative as it is when it comes to declaring a workable cluster. If someone is there to manually reboot nodes, then it might as well be a human decision which node is capable of running services.
Chrissie
Digimer,
Thanks for the links and information.
Anyway, if I go this way, I will write my own daemon to determine the state of the other node.
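A minimal sketch of what such a daemon might look like, assuming a plain ping as the liveness probe - the peer hostname and start commands are placeholders:

#!/usr/bin/env python
# Sketch of a peer-check daemon: poll the peer with ping and only start
# the cluster stack once the peer answers. PEER and the start commands
# are placeholders; adjust for your distribution.
import subprocess
import time

PEER = "node2"  # hypothetical peer hostname

def peer_alive():
    # One ICMP echo request, 2-second timeout.
    return subprocess.call(["ping", "-c", "1", "-W", "2", PEER]) == 0

if __name__ == "__main__":
    while not peer_alive():
        time.sleep(5)  # keep polling; an operator can still intervene
    subprocess.call(["service", "corosync", "start"])
    subprocess.call(["service", "pacemaker", "start"])

Note that a ping only proves the link is up, not that the peer is healthy, so this automates the "log in and check" step but cannot safely decide to start services alone - which is Chrissie's point above.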
Also, the information about fence loops is new to me, thanks =)
Thank you,
Kostya
On Tue, Jun 24, 2014 at 10:55 AM, Christine Caulfield <ccaul...@redhat.com> wrote:
On 23/06/14 15:49, Digimer wrote:
Hi Kostya,
I'm having a little trouble understanding your question, sorry.
On boot, the node will not start anything, so after booting it, you log in, check that it can talk to the peer node (a simple ping is generally enough), then start the cluster. It will join the peer's existing cluster (even if that is a cluster of just the peer itself).

If you booted both nodes, say after a power outage, you check the connection (again, a simple ping is fine) and then start the cluster on both nodes at the same time.
wait_for_all helps with most of these situations. If a node goes down then it won't start services until it's seen the non-failed node, because wait_for_all prevents a newly rebooted node from doing anything on its own. This also takes care of the case where both nodes are rebooted together, of course, because that's the same as a new start.
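For reference, the votequorum settings being described boil down to a minimal corosync.conf quorum section like this (note that, per the votequorum(5) man page, two_node: 1 automatically enables wait_for_all unless it is explicitly overridden):

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 1
}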
Chrissie
If one of the nodes needs to be shut down, say for repairs or upgrades, you migrate the services off of it and over to the peer node, then you stop the cluster (which tells the peer that the node is leaving the cluster). After that, the remaining node operates by itself. When you turn it back on, you rejoin the cluster and migrate the services back.
I think, maybe, you are looking at this as more complicated than it needs to be. Pacemaker and Corosync will handle most of this for you, once set up properly. What operating system do you plan to use, and what cluster stack? I suspect it will be corosync + pacemaker, which should work fine.
digimer
On 23/06/14 10:36 AM, Kostiantyn Ponomarenko wrote:
Hi Digimer,
Suppose I disable the cluster on startup - but what about the remaining node, if I need to reboot it? Even in the case of a lost connection between the two nodes, I need to have one node working and providing resources. How did you solve this situation?
Should it be a separate daemon which somehow checks the connection between the two nodes and decides whether to run Corosync and Pacemaker or to keep them down?
Thank you,
Kostya
On Mon, Jun 23, 2014 at 4:34 PM, Digimer <li...@alteeve.ca> wrote:
On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
Hi guys,
I want to gather all the possible configuration variants for a 2-node cluster, because it has a lot of pitfalls and there is not a lot of information about it across the internet. I also have some questions about the configurations and their specific problems.
VARIANT 1:
-----------------
We can use the "two_node" and "wait_for_all" options from Corosync's votequorum, and set up fencing agents with a delay on one of them.
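To illustrate the delay part: assuming a pcs-based stack and IPMI fencing (the agent, addresses, and credentials below are placeholders), a static delay on the device that fences the preferred node gives that node a head start in the fence race, so the two nodes don't shoot each other at the same moment:

# Hypothetical example - adjust the agent and parameters to your hardware.
# The delay sits on the device that fences node1, so node1 is the
# preferred survivor: fencing it waits 15s, while node2 can be fenced
# immediately.
pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node1" \
    ipaddr="192.168.0.1" login="admin" passwd="secret" delay=15
pcs stonith create fence_node2 fence_ipmilan pcmk_host_list="node2" \
    ipaddr="192.168.0.2" login="admin" passwd="secret"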
Here is a workflow (diagram) of this configuration:
1. Node starts.
2. Cluster (Corosync and Pacemaker) starts at boot time.
3. Wait for all nodes. Have all nodes joined?
   No: go to step 3.
   Yes: go to step 4.
4. Start resources.
5. Split-brain situation (something wrong with the connection between the nodes).
6. The fencing agent on one of the nodes reboots the other node (there is a configured delay on one of the fencing agents).
7. The rebooted node goes to step 1.
There are two (or more?) important things about this configuration:
1. The rebooted node remains waiting for all nodes to be visible (the connection has to be restored first).
2. Suppose the connection problem still exists and the node which rebooted the other one now has to be rebooted as well (for some reason). After the reboot, it is also stuck on step 3 because of the connection problem.
QUESTION:
-----------------
Is it possible to somehow assign the node that won the reboot race (i.e. rebooted the other one) a status like "primary", and allow it not to wait for all nodes after a reboot? And to drop this status once the other node has joined again. So, is it possible?
Right now that's the only configuration I know of for a 2-node cluster. Other variants are very much appreciated =)
VARIANT 2 (not implemented, just a suggestion):
-----------------
I've been thinking about using an external SSD drive (or another external drive). So, for example, the fencing agent could reserve the SSD using a SCSI command and only after that reboot the other node. The main idea is that the first node, as soon as the cluster starts on it, reserves the SSD until the other node joins the cluster; after that, the SCSI reservation is removed. (A rough sketch of the reservation commands follows the workflow below.)
1. Node starts.
2. Cluster (Corosync and Pacemaker) starts at boot time.
3. Reserve the SSD. Did it manage to reserve it?
   No: don't start resources (wait for all).
   Yes: go to step 4.
4. Start resources.
5. Remove the SCSI reservation when the other node has joined.
6. Split-brain situation (something wrong with the connection between the nodes).
7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?
   No: maybe put the node in standby mode ...
   Yes: reboot the other node.
8. Optional: a single node can keep the SSD reservation while it is alone in the cluster, or until its shutdown.
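For anyone who wants to experiment with this idea, here is a rough sketch of steps 3 and 5 using sg_persist from sg3_utils - the device path and reservation key are placeholders, and this deliberately ignores the hard failure modes (stale reservations, a node that dies while holding the reservation, and so on):

#!/usr/bin/env python
# Rough sketch of VARIANT 2's reservation steps using sg_persist
# (sg3_utils). DEVICE and KEY are placeholders; real code would need
# far more error handling.
import subprocess

DEVICE = "/dev/sdX"  # the shared external SSD (placeholder)
KEY = "0xA"          # this node's reservation key (placeholder)

def try_reserve():
    """Step 3: register our key, then try to take a Write-Exclusive
    reservation (PROUT type 1). A non-zero exit status means a
    reservation conflict, i.e. the other node already holds it."""
    subprocess.call(["sg_persist", "--out", "--register",
                     "--param-sark=" + KEY, DEVICE])
    rc = subprocess.call(["sg_persist", "--out", "--reserve",
                          "--param-rk=" + KEY, "--prout-type=1", DEVICE])
    return rc == 0

def release():
    """Step 5: drop the reservation once the other node has joined."""
    subprocess.call(["sg_persist", "--out", "--release",
                     "--param-rk=" + KEY, "--prout-type=1", DEVICE])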
I am really looking forward to finding the best solution (or a couple of them =)). I hope I am not the only person who is interested in this topic.
Thank you,
Kostya
Hi Kostya,
I only build 2-node clusters, and I've not had problems with this going back to 2009, over dozens of clusters. The tricks I found are:
* Disable quorum (of course)
* Set up good fencing, and add a delay to the node you prefer (or pick one at random, if they're of equal value) to avoid dual fences
* Disable the cluster on startup, to prevent fence loops
That's it. With this, your 2-node cluster will be just fine.
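For the record, on a pcs-based corosync+pacemaker stack those tricks translate roughly as follows - a sketch under that assumption (on older cman-based stacks the commands differ):

# 1. Disable quorum - a 2-node cluster can never have meaningful quorum
pcs property set no-quorum-policy=ignore
# 2. Fencing with a delay on the preferred node: see the fence_ipmilan
#    example earlier in the thread
# 3. Don't start the cluster stack at boot, to prevent fence loops
pcs cluster disable --all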
As for your question: once a node is fenced successfully, the resource manager (Pacemaker) will take over any services lost on the fenced node, if that is how you configured it. A node that either gracefully leaves or dies/is fenced should not interfere with the remaining node.
The problem is when a node vanishes and fencing fails. Then, not knowing what the other node might be doing, the only safe option is to block, otherwise you risk a split-brain. This is why fencing is so important.
Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org