Re: [ClusterLabs] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-25 Thread Ondrej

Hi Ken,

On 2/26/20 7:30 AM, Ken Gaillot wrote:

The use case is a large organization with few cluster experts and many
junior system administrators who reboot hosts for OS updates during
planned maintenance windows, without any knowledge of what the host
does. The cluster runs services that have a preferred node and take a
very long time to start.

In this scenario, pacemaker's default behavior of moving the service to
a failover node when the node shuts down, and moving it back when the
node comes back up, results in needless downtime compared to just
leaving the service down for the few minutes needed for a reboot.


1. Do I understand correctly that this covers both the case where the 
system reboots gracefully (the pacemaker service is stopped by the 
system shutting down) and the case where a user manually stops the 
cluster without rebooting the node, for example with `pcs cluster stop`?



If you decide while the node is down that you need the resource to be
recovered, you can manually clear a lock with "crm_resource --refresh"
specifying both --node and --resource.


2. I'm interested in how this will look in the 'crm_mon' output or in 
'crm_simulate'. Will there be some indication of why the resources are 
not moving, such as 'blocked-shutdown-lock', or will they just appear 
as not moving (Stopped)?


Will this look different from a situation where, for example, the resource 
is simply not allowed by a constraint to run on other nodes?


Thanks for the heads up

--
Ondrej Famera
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-25 Thread Strahil Nikolov
On February 26, 2020 12:30:24 AM GMT+02:00, Ken Gaillot  
wrote:
>Hi all,
>
>We are a couple of months away from starting the release cycle for
>Pacemaker 2.0.4. I'll highlight some new features between now and then.
>
>First we have shutdown locks. This is a narrow use case that I don't
>expect a lot of interest in, but it helps give pacemaker feature parity
>with proprietary HA systems, which can help users feel more comfortable
>switching to pacemaker and open source.
>
>The use case is a large organization with few cluster experts and many
>junior system administrators who reboot hosts for OS updates during
>planned maintenance windows, without any knowledge of what the host
>does. The cluster runs services that have a preferred node and take a
>very long time to start.
>
>In this scenario, pacemaker's default behavior of moving the service to
>a failover node when the node shuts down, and moving it back when the
>node comes back up, results in needless downtime compared to just
>leaving the service down for the few minutes needed for a reboot.
>
>The goal could be accomplished with existing pacemaker features.
>Maintenance mode wouldn't work because the node is being rebooted. But
>you could figure out what resources are active on the node, and use a
>location constraint with a rule to ban them on all other nodes before
>shutting down. That's a lot of work for something the cluster can
>figure out automatically.
>
>Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
>defaulting to false to keep the current behavior. If shutdown-lock is
>set to true, any resources active on a node when it is cleanly shut
>down will be "locked" to the node (kept down rather than recovered
>elsewhere). Once the node comes back up and rejoins the cluster, they
>will be "unlocked" (free to move again if circumstances warrant).
>
>An additional cluster property, shutdown-lock-limit, allows you to set
>a timeout for the locks so that if the node doesn't come back within
>that time, the resources are free to be recovered elsewhere. This
>defaults to no limit.
>
>If you decide while the node is down that you need the resource to be
>recovered, you can manually clear a lock with "crm_resource --refresh"
>specifying both --node and --resource.
>
>There are some limitations using shutdown locks with Pacemaker Remote
>nodes, so I'd avoid that with the upcoming release, though it is
>possible.

Hi Ken,

Can it be 'shutdown-lock-timeout' instead of 'shutdown-lock-limit'?
Also, I think the default value could be something more reasonable, like 
30 minutes. Usually 30 minutes is enough if you don't patch the firmware, and 
180 minutes is the maximum if you do.

The use case is odd. I have been in the same situation, and our solution was to 
train the team (internally) instead of using such a feature.
The interesting part will be the behaviour of the local cluster stack when 
updates happen. The risk of the node being fenced is high, either due to 
unresponsiveness during the update or because corosync/pacemaker use an old 
function that was changed in the libraries.

Best Regards,
Strahil Nikolov
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-25 Thread Ken Gaillot
Hi all,

We are a couple of months away from starting the release cycle for
Pacemaker 2.0.4. I'll highlight some new features between now and then.

First we have shutdown locks. This is a narrow use case that I don't
expect a lot of interest in, but it helps give pacemaker feature parity
with proprietary HA systems, which can help users feel more comfortable
switching to pacemaker and open source.

The use case is a large organization with few cluster experts and many
junior system administrators who reboot hosts for OS updates during
planned maintenance windows, without any knowledge of what the host
does. The cluster runs services that have a preferred node and take a
very long time to start.

In this scenario, pacemaker's default behavior of moving the service to
a failover node when the node shuts down, and moving it back when the
node comes back up, results in needless downtime compared to just
leaving the service down for the few minutes needed for a reboot.

The goal could be accomplished with existing pacemaker features.
Maintenance mode wouldn't work because the node is being rebooted. But
you could figure out what resources are active on the node, and use a
location constraint with a rule to ban them on all other nodes before
shutting down. That's a lot of work for something the cluster can
figure out automatically.
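
For illustration only, that manual workaround might look roughly like the 
following with pcs (a sketch; the resource name "bigdb" and node name "node1" 
are hypothetical, and the exact rule syntax may vary by pcs version):

  # Before shutting node1 down, ban the hypothetical resource "bigdb"
  # from every node other than node1:
  pcs constraint location bigdb rule score=-INFINITY '#uname' ne node1

  # After node1 is back and the resource has started, list the constraint
  # ids with "pcs constraint --full" and remove the rule again.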

Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
defaulting to false to keep the current behavior. If shutdown-lock is
set to true, any resources active on a node when it is cleanly shut
down will be "locked" to the node (kept down rather than recovered
elsewhere). Once the node comes back up and rejoins the cluster, they
will be "unlocked" (free to move again if circumstances warrant).

An additional cluster property, shutdown-lock-limit, allows you to set
a timeout for the locks so that if the node doesn't come back within
that time, the resources are free to be recovered elsewhere. This
defaults to no limit.
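
For example, once 2.0.4 is available, enabling this with pcs should look 
something like the sketch below (untested; only the property names come from 
this announcement, and depending on the pcs version --force may be needed for 
properties pcs does not yet recognize):

  # Keep resources locked to a node that was cleanly shut down:
  pcs property set shutdown-lock=true

  # Optionally give up the lock if the node is not back within 30 minutes:
  pcs property set shutdown-lock-limit=30min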

If you decide while the node is down that you need the resource to be
recovered, you can manually clear a lock with "crm_resource --refresh"
specifying both --node and --resource.
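
In other words, something along these lines (the resource and node names are 
hypothetical):

  # Clear the shutdown lock for resource "bigdb" held on the down node "node1":
  crm_resource --refresh --resource bigdb --node node1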

There are some limitations using shutdown locks with Pacemaker Remote
nodes, so I'd avoid that with the upcoming release, though it is
possible.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] fence-virt v1.0.0

2020-02-25 Thread Oyvind Albrigtsen

On 25/02/20 15:59 +0100, Oyvind Albrigtsen wrote:

ClusterLabs is happy to announce fence-agents v1.0.0.

Correction: fence-virt


The source code is available at:
https://github.com/ClusterLabs/fence-virt/releases/tag/v1.0.0

The most significant enhancements in this release are:
- bugfixes and enhancements:
- build: try to detect initscripts directory
- fence_virtd: accept SIGTERM while waiting for initialization
- fence_virtd: add manpages to service file

The full list of changes for fence-agents is available at:
https://github.com/ClusterLabs/fence-virt/compare/v0.9.0...v1.0.0

Everyone is encouraged to download and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all the contributors to this release.


Best,
The fence-agents maintainers

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





[ClusterLabs] fence-virt v1.0.0

2020-02-25 Thread Oyvind Albrigtsen

ClusterLabs is happy to announce fence-agents v1.0.0.

The source code is available at:
https://github.com/ClusterLabs/fence-virt/releases/tag/v1.0.0

The most significant enhancements in this release are:
- bugfixes and enhancements:
 - build: try to detect initscripts directory
 - fence_virtd: accept SIGTERM while waiting for initialization
 - fence_virtd: add manpages to service file

The full list of changes for fence-agents is available at:
https://github.com/ClusterLabs/fence-virt/compare/v0.9.0...v1.0.0

Everyone is encouraged to download and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all the contributors to this release.


Best,
The fence-agents maintainers

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pacemaker all the resource state are on stopped

2020-02-25 Thread Amit Nakum
Dear users,

I am facing a strange issue with an active-active HA configuration. When any
resource fails, pacemaker moves it to another server, and pcs status
shows the following:

Cluster_eno4   (ocf::heartbeat:IPaddr2):   Stopped testsrv1
 Cluster_eno1   (ocf::heartbeat:IPaddr2):   Stopped testsrv1
 Cluster_eno3   (ocf::heartbeat:IPaddr2):   Stopped testsrv1
 Clone Set: haproxy-clone [haproxy]
 haproxy(systemd:haproxy.service):  failed testsrv1
 haproxy(systemd:haproxy.service):  Stopped testsrv2
 Started: [ testsrv1 testsrv2 ]
 Clone Set: cluster_rtpengine-clone [cluster_rtpengine]
 cluster_rtpengine  (systemd:rtpengine.service):Stopped testsrv1
 Started: [ testsrv1 ]
 Clone Set: cluster_opensips-clone [cluster_opensips]
 cluster_opensips   (systemd:opensips.service): Stopped testsrv1
 cluster_opensips   (systemd:opensips.service): Stopped testsrv2
 Started: [ testsrv1 testsrv2 ]
 Clone Set: nginx-clone [nginx]
 nginx  (systemd:nginx.service):Stopped testsrv1
 nginx  (systemd:nginx.service):Stopped testsrv2
 Started: [ testsrv1 testsrv2 ]
 Clone Set: php-fpm-clone [php-fpm]
 php-fpm(systemd:php73-php-fpm.service):Stopped testsrv1
 php-fpm(systemd:php73-php-fpm.service):Stopped testsrv2
 Started: [ testsrv1 testsrv2 ]
 Clone Set: consumer-clone [consumer]
 consumer   (systemd:consumer.service): Stopped testsrv1
 consumer   (systemd:consumer.service): Stopped testsrv2
 Started: [ testsrv1 testsrv2 ]

Can anyone guide me in solving this problem?

-- 
Thanks & Regards,
Amit Nakum | Sr. Support Engineer
+91 982482283 | Hangout & Skype: amit.na...@ecosmob.com

Ecosmob Technologies Pvt. Ltd.
https://www.ecosmob.com

VoIP | Web | Mobile | IoT | Big Data



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/