Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-23 Thread Tomas Jelinek

On 21. 04. 19 at 15:46, Andrei Borzenkov wrote:

> 21.04.2019 16:32, Lentes, Bernd wrote:
>> - On 21. Apr 2019 at 6:51, Andrei Borzenkov arvidj...@gmail.com wrote:
>>
>>> 20.04.2019 22:29, Lentes, Bernd wrote:
>>>>
>>>> - On 18. Apr 2019 at 16:21, kgaillot kgail...@redhat.com wrote:
>>>>
>>>>> Simply stopping pacemaker and corosync by whatever mechanism your
>>>>> distribution uses (e.g. systemctl) should be sufficient.
>>>>
>>>> That works. But what is strange is that after a reboot both nodes are
>>>> shown as UNCLEAN. Does the cluster not remember that it has been shut
>>>> down cleanly?
>>>
>>> No. Pacemaker does not care what state the cluster was in during the
>>> last shutdown. What matters is what state the cluster is in now.
>>
>> Aah.
>>
>>>> The problem is that after starting pacemaker and corosync on one node,
>>>> the other is fenced because of that. (pacemaker and corosync aren't
>>>> started automatically by systemd.)
>>>
>>> That is correct and expected behavior. If a node still has not appeared
>>> after the timeout, pacemaker assumes the node is faulted and attempts
>>> to proceed with the remaining nodes (after all, it is about
>>> _availability_, and waiting indefinitely means resources won't be
>>> available). For this it needs to ascertain the state of the missing
>>> node, so pacemaker attempts to stonith it. Otherwise each node could
>>> attempt to start resources, resulting in split brain and data
>>> corruption.
>>>
>>> Either start pacemaker on all nodes at the same time (with reasonable
>>> fuzz; running "systemctl start pacemaker" in several terminal windows
>>> sequentially should be enough), or set the wait_for_all option in the
>>> corosync configuration. Note that if you have a two-node cluster, the
>>> two_node corosync option also implies wait_for_all.



>> Hi,
>>
>> but what if one node has e.g. a hardware failure and I have to wait for
>> the spare part?
>> With wait_for_all it can't start the resources.


> wait_for_all is only considered during initial startup. Once the cluster
> is up, a node can fail and pacemaker will fail over resources as
> appropriate. When the node comes back, it will rejoin the cluster.
>
> If your question is "how do I start an incomplete cluster" - well, you
> can temporarily unset wait_for_all, or you can remove the node from the
> cluster and add it back when it becomes available.


Or you can do a simple "pcs quorum unblock" ("pcs cluster quorum unblock"
in old pcs versions).




> Or you can make sure you start pacemaker on all nodes simultaneously.
> You do it manually anyway, so what prevents you from starting pacemaker
> on all nodes close to each other? If you are using pcs, "pcs cluster
> start --all" should do it for you.
>
> Or you can live with the extra stonith.
>
> In the end it is up to you to decide which action plan is most
> appropriate. What you cannot have is a computer reading your mind and
> knowing when it is safe to ignore a missing node.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-21 Thread Andrei Borzenkov
21.04.2019 16:32, Lentes, Bernd wrote:
> - On 21. Apr 2019 at 6:51, Andrei Borzenkov arvidj...@gmail.com wrote:
> 
>> 20.04.2019 22:29, Lentes, Bernd wrote:
>>>
>>>
>>> - On 18. Apr 2019 at 16:21, kgaillot kgail...@redhat.com wrote:
>>>
>>>> Simply stopping pacemaker and corosync by whatever mechanism your
>>>> distribution uses (e.g. systemctl) should be sufficient.
>>>
>>> That works. But what is strange is that after a reboot both nodes are
>>> shown as UNCLEAN. Does the cluster not remember that it has been shut
>>> down cleanly?
>>
>> No. Pacemaker does not care what state the cluster was in during the
>> last shutdown. What matters is what state the cluster is in now.
> 
> Aah.
> 
>>> The problem is that after starting pacemaker and corosync on one node,
>>> the other is fenced because of that. (pacemaker and corosync aren't
>>> started automatically by systemd.)
>>>
>>
>> That is correct and expected behavior. If a node still has not appeared
>> after the timeout, pacemaker assumes the node is faulted and attempts
>> to proceed with the remaining nodes (after all, it is about
>> _availability_, and waiting indefinitely means resources won't be
>> available). For this it needs to ascertain the state of the missing
>> node, so pacemaker attempts to stonith it. Otherwise each node could
>> attempt to start resources, resulting in split brain and data
>> corruption.
>>
>> Either start pacemaker on all nodes at the same time (with reasonable
>> fuzz; running "systemctl start pacemaker" in several terminal windows
>> sequentially should be enough), or set the wait_for_all option in the
>> corosync configuration. Note that if you have a two-node cluster, the
>> two_node corosync option also implies wait_for_all.
> 
> 
> Hi,
> 
> but what if one node has e.g. a hardware failure and I have to wait for
> the spare part?
> With wait_for_all it can't start the resources.

wait_for_all is only considered during initial startup. Once the cluster
is up, a node can fail and pacemaker will fail over resources as
appropriate. When the node comes back, it will rejoin the cluster.

If your question is "how do I start an incomplete cluster" - well, you
can temporarily unset wait_for_all, or you can remove the node from the
cluster and add it back when it becomes available.
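
For example, a sketch of the "temporarily unset" route (assumes you edit
/etc/corosync/corosync.conf by hand on the surviving node; with two_node: 1
set, wait_for_all is implied, so it has to be disabled explicitly):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 0   # temporary! revert once the failed node is back
    }

    # then start the stack on the surviving node only:
    systemctl start corosync pacemaker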

Or you can make sure you start pacemaker on all nodes simultaneously.
You do it manually anyway, so what prevents you from starting pacemaker
on all nodes close to each other? If you are using pcs, "pcs cluster
start --all" should do it for you.

Or you can live with the extra stonith.

In the end it is up to you to decide which action plan is most
appropriate. What you cannot have is a computer reading your mind and
knowing when it is safe to ignore a missing node.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-21 Thread Lentes, Bernd
- On 21. Apr 2019 at 6:51, Andrei Borzenkov arvidj...@gmail.com wrote:

> 20.04.2019 22:29, Lentes, Bernd wrote:
>> 
>> 
>> - On 18. Apr 2019 at 16:21, kgaillot kgail...@redhat.com wrote:
>> 
>>>
>>> Simply stopping pacemaker and corosync by whatever mechanism your
>>> distribution uses (e.g. systemctl) should be sufficient.
>> 
>> That works. But what is strange is that after a reboot both nodes are
>> shown as UNCLEAN. Does the cluster not remember that it has been shut
>> down cleanly?
> 
> No. Pacemaker does not care what state the cluster was in during the
> last shutdown. What matters is what state the cluster is in now.

Aah.
 
>> The problem is that after starting pacemaker and corosync on one node,
>> the other is fenced because of that. (pacemaker and corosync aren't
>> started automatically by systemd.)
>> 
> 
> That is correct and expected behavior. If a node still has not appeared
> after the timeout, pacemaker assumes the node is faulted and attempts
> to proceed with the remaining nodes (after all, it is about
> _availability_, and waiting indefinitely means resources won't be
> available). For this it needs to ascertain the state of the missing
> node, so pacemaker attempts to stonith it. Otherwise each node could
> attempt to start resources, resulting in split brain and data
> corruption.
> 
> Either start pacemaker on all nodes at the same time (with reasonable
> fuzz; running "systemctl start pacemaker" in several terminal windows
> sequentially should be enough), or set the wait_for_all option in the
> corosync configuration. Note that if you have a two-node cluster, the
> two_node corosync option also implies wait_for_all.


Hi,

but what if one node has e.g. a hardware failure and I have to wait for
the spare part?
With wait_for_all it can't start the resources.

Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Stellv. Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, 
Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-20 Thread Andrei Borzenkov
20.04.2019 22:29, Lentes, Bernd wrote:
> 
> 
> - On 18. Apr 2019 at 16:21, kgaillot kgail...@redhat.com wrote:
> 
>>
>> Simply stopping pacemaker and corosync by whatever mechanism your
>> distribution uses (e.g. systemctl) should be sufficient.
> 
> That works. But what is strange is that after a reboot both nodes are
> shown as UNCLEAN. Does the cluster not remember that it has been shut
> down cleanly?

No. Pacemaker does not care what state the cluster was in during the
last shutdown. What matters is what state the cluster is in now.

> The problem is that after starting pacemaker and corosync on one node,
> the other is fenced because of that. (pacemaker and corosync aren't
> started automatically by systemd.)
> 

That is correct and expected behavior. If a node still has not appeared
after the timeout, pacemaker assumes the node is faulted and attempts to
proceed with the remaining nodes (after all, it is about _availability_,
and waiting indefinitely means resources won't be available). For this
it needs to ascertain the state of the missing node, so pacemaker
attempts to stonith it. Otherwise each node could attempt to start
resources, resulting in split brain and data corruption.

Either start pacemaker on all nodes at the same time (with reasonable
fuzz; running "systemctl start pacemaker" in several terminal windows
sequentially should be enough), or set the wait_for_all option in the
corosync configuration. Note that if you have a two-node cluster, the
two_node corosync option also implies wait_for_all.
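
As a reference, a minimal sketch of the relevant quorum section in
/etc/corosync/corosync.conf (values are illustrative; see the
votequorum(5) man page for details):

    quorum {
        provider: corosync_votequorum
        two_node: 1        # two-node mode; implies wait_for_all: 1
        # wait_for_all: 1  # explicit form, redundant with two_node: 1
    }
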
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-20 Thread Lentes, Bernd



- On 18. Apr 2019 at 16:21, kgaillot kgail...@redhat.com wrote:

> 
> Simply stopping pacemaker and corosync by whatever mechanism your
> distribution uses (e.g. systemctl) should be sufficient.

That works. But what is strange is that after a reboot both nodes are
shown as UNCLEAN. Does the cluster not remember that it has been shut
down cleanly?
The problem is that after starting pacemaker and corosync on one node,
the other is fenced because of that. (pacemaker and corosync aren't
started automatically by systemd.)

Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Stellv. Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, 
Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-18 Thread Ken Gaillot
On Thu, 2019-04-18 at 16:11 +0200, Lentes, Bernd wrote:
> Hi,
> 
> I have a two-node cluster; both servers are backed by a UPS.
> If power is lost, the UPS sends, after a configurable time, a signal
> over the network to shut down the servers.
> The UPS software (APC PowerChute Network Shutdown) lets me run scripts
> on the host before it shuts down.
> 
> What would be the right procedure to shut down the complete cluster
> cleanly?
> 
> Many Thanks.
> 
> 
> Bernd

Simply stopping pacemaker and corosync by whatever mechanism your
distribution uses (e.g. systemctl) should be sufficient.
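
For example, a minimal sketch of such a UPS shutdown hook (the script
name and path are assumptions; PowerChute just needs to be pointed at it):

    #!/bin/sh
    # /usr/local/sbin/cluster-shutdown.sh - run by PowerChute before the
    # OS goes down; stops the cluster stack cleanly on this node
    systemctl stop pacemaker
    systemctl stop corosync

Run on each node by its own PowerChute instance, this stops resources and
leaves the node ready to be powered off; with pcs you could instead run
"pcs cluster stop --all" once from a single node.
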
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-18 Thread Lentes, Bernd
Hi,

I have a two-node cluster; both servers are backed by a UPS.
If power is lost, the UPS sends, after a configurable time, a signal over
the network to shut down the servers.
The UPS software (APC PowerChute Network Shutdown) lets me run scripts on
the host before it shuts down.

What would be the right procedure to shut down the complete cluster cleanly?

Many Thanks.


Bernd

-- 

Bernd Lentes 
Systemadministration 
Institut für Entwicklungsgenetik 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum münchen 
bernd.len...@helmholtz-muenchen.de 
phone: +49 89 3187 1241 
phone: +49 89 3187 3827 
fax: +49 89 3187 2294 
http://www.helmholtz-muenchen.de/idg 

who makes mistakes can learn something
who does nothing can learn nothing
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Stellv. Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, 
Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/