Re: [ClusterLabs] Querying failed resource operations from the CIB

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 11:15 +0200, Ulrich Windl wrote:
> Hi!
> 
> Back in December 2011 I had written a script to retrieve all failed
> resource operations by using "cibadmin -Q -o lrm_resources" as the
> database. I was querying lrm_rsc_op for op-status != 0.
> In a newer release this does not seem to work anymore.
> 
> I see resource IDs ending with "_last_0", "_monitor_6", and
> "_last_failure_0", but even in the "_last_failure_0" the op-status is
> "0" (rc-code="7").
> Is this some bug, or is it a feature? That is: When will op-status be
> != 0?

rc-code is the result of the action itself (i.e. the resource agent),
whereas op-status is the result of pacemaker's attempt to execute the
agent.

If pacemaker was able to successfully initiate the resource agent and
get a reply back, then op-status will be 0, regardless of the rc-code
reported by the agent.

op-status will be nonzero when it couldn't get a result from the agent
-- the agent is not installed on the node, the agent timed out, the
connection to the local executor or Pacemaker Remote was lost, the
action was requested while the node was shutting down, etc.

There's also a special op-status (193) that indicates an action is
pending (i.e. it has been initiated and we're waiting for it to
complete). This is only seen when record-pending is true.
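So to find operations whose execution itself failed, filter on op-status
rather than rc-code. A rough, untested sketch (assumes xmllint is available;
attribute names as used in the CIB status section):

    # operations Pacemaker could not execute/complete (op-status != 0)
    cibadmin -Q -o status | xmllint --xpath '//lrm_rsc_op[@op-status!="0"]' -

    # agent-reported failures keep op-status 0, so filter on rc-code for those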

> crm_mon still reports a resource failure like this:
> Failed Resource Actions:
> * prm_nfs_server_monitor_6 on h11 'not running' (7): call=738,
> status=complete, exitreason='',
> last-rc-change='Mon Aug 12 04:52:23 2019', queued=0ms, exec=0ms
> 
> (it seems the nfs server monitor does this under load in SLES12 SP4,
> and I wonder where to look for the reason)
> BTW: "lrm_resources" is not documented, and the structure seems to
> change. Can I restrict the output to LRM data?

One possibility is to run crm_mon with --as-xml and parse the failed
actions from that output. The schema is distributed as crm_mon.rng.
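For example, something along these lines (untested sketch; check the exact
element names against crm_mon.rng for your version):

    # dump only the failed actions from the current cluster status
    crm_mon --as-xml | xmllint --xpath '//failures/failure' -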

> Regards,
> Ulrich
-- 
Ken Gaillot 



Re: [ClusterLabs] why is node fenced ?

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 18:09 +0200, Lentes, Bernd wrote:
> Hi,
> 
> last Friday (9th of August) i had to install patches on my two-node
> cluster.
> I put one of the nodes (ha-idg-2) into standby (crm node standby ha-
> idg-2), patched it, rebooted, 
> started the cluster (systemctl start pacemaker) again, put the node
> again online, everything fine.
> 
> Then i wanted to do the same procedure with the other node (ha-idg-
> 1).
> I put it in standby, patched it, rebooted, started pacemaker again.
> But then ha-idg-1 fenced ha-idg-2, it said the node is unclean.
> I know that nodes which are unclean need to be shutdown, that's
> logical.
> 
> But i don't know from where the conclusion comes that the node is
> unclean respectively why it is unclean,
> i searched in the logs and didn't find any hint.

The key messages are:

Aug 09 17:43:27 [6326] ha-idg-1   crmd: info: crm_timer_popped: 
Election Trigger (I_DC_TIMEOUT) just popped (2ms)
Aug 09 17:43:27 [6326] ha-idg-1   crmd:  warning: do_log:   Input 
I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped

That indicates the newly rebooted node didn't hear from the other node
within 20s, and so assumed it was dead.

The new node had quorum, but never saw the other node's corosync, so
I'm guessing you have two_node and/or wait_for_all disabled in
corosync.conf, and/or you have no-quorum-policy=ignore in pacemaker.

I'd recommend two_node: 1 in corosync.conf, with no explicit
wait_for_all or no-quorum-policy setting. That would ensure a
rebooted/restarted node doesn't get initial quorum until it has seen
the other node.
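Roughly, the quorum section of corosync.conf would then look like this
(minimal sketch; two_node: 1 implicitly enables wait_for_all unless you
override it):

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }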

> I put the syslog and the pacemaker log on a seafile share, i'd be
> very thankful if you'll have a look.
> https://hmgubox.helmholtz-muenchen.de/d/53a10960932445fb9cfe/
> 
> Here the cli history of the commands:
> 
> 17:03:04  crm node standby ha-idg-2
> 17:07:15  zypper up (install Updates on ha-idg-2)
> 17:17:30  systemctl reboot
> 17:25:21  systemctl start pacemaker.service
> 17:25:47  crm node online ha-idg-2
> 17:26:35  crm node standby ha-idg1-
> 17:30:21  zypper up (install Updates on ha-idg-1)
> 17:37:32  systemctl reboot
> 17:43:04  systemctl start pacemaker.service
> 17:44:00  ha-idg-1 is fenced
> 
> Thanks.
> 
> Bernd
> 
> OS is SLES 12 SP4, pacemaker 1.1.19, corosync 2.3.6-9.13.1
> 
> 
-- 
Ken Gaillot 



Re: [ClusterLabs] Q: "crmd[7281]: warning: new_event_notification (7281-97955-15): Broken pipe (32)" as response to resource cleanup

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 17:46 +0200, Ulrich Windl wrote:
> Hi!
> 
> I just noticed that a "crm resource cleanup " caused some
> unexpected behavior and the syslog message:
> crmd[7281]:  warning: new_event_notification (7281-97955-15): Broken
> pipe (32)
> 
> It's SLES14 SP4 last updated Sept. 2018 (up since then, pacemaker-
> 1.1.19+20180928.0d2680780-1.8.x86_64).
> 
> The cleanup was due to a failed monitor. As an unexpected consequence
> of this cleanup, CRM seemed to restart the complete resource (and
> dependencies), even though it was running.

I assume the monitor failure was old, and recovery had already
completed? If not, recovery might have been initiated before the clean-
up was recorded.

> I noticed that a manual "crm_resource -C -r  -N " command
> has the same effect (multiple resources are "Cleaned up", resources
> are restarted seemingly before the "probe" is done.).

Can you verify whether the probes were done? The DC should log a
message when each _monitor_0 result comes in.

> Actually the manual says when cleaning up a single primitive, the
> whole group is cleaned up, unless using --force. Well ,I don't like
> this default, as I expect any status change from probe would
> propagate to the group anyway...

In 1.1, clean-up always wipes the history of the affected resources,
regardless of whether the history is for success or failure. That means
all the cleaned resources will be reprobed. In 2.0, clean-up by default
wipes the history only if there's a failed action (--refresh/-R is
required to get the 1.1 behavior). That lessens the impact of the
"default to whole group" behavior.

I think the original idea was that a group indicates that the resources
are closely related, so changing the status of one member might affect
what status the others report.
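If you really do want to limit the clean-up to a single member of the group,
then going by the manual behavior you quoted it would be something like this
(resource and node names are placeholders):

    crm_resource --cleanup --resource <primitive> --node <node> --force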

> Regards,
> Ulrich
-- 
Ken Gaillot 



Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Ken Gaillot
On Mon, 2019-08-12 at 23:09 +0300, Andrei Borzenkov wrote:
> 
> 
> On Mon, Aug 12, 2019 at 4:12 PM Michael Powell <
> michael.pow...@harmonicinc.com> wrote:
> > At 07:44:49, the ss agent discovers that the master instance has
> > failed on node mgraid…-0 as a result of a failed ssadm request in
> > response to an ss_monitor() operation.  It issues a crm_master -Q
> > -D command with the intent of demoting the master and promoting the
> > slave, on the other node, to master.  The ss_demote() function
> > finds that the application is no longer running and returns
> > OCF_NOT_RUNNING (7).  In the older product, this was sufficient to
> > promote the other instance to master, but in the current product,
> > that does not happen.  Currently, the failed application is
> > restarted, as expected, and is promoted to master, but this takes
> > 10’s of seconds.
> >  
> > 
> 
> Did you try to disable resource stickiness for this ms?

Stickiness shouldn't affect where the master role is placed, just
whether the resource instances should stay on their current nodes
(independently of whether their role is staying the same or changing).

Are there any constraints that apply to the master role?

Another possibility is that you are mixing crm_master with and without
--lifetime=reboot (which controls whether the master attribute is
transient or permanent). Transient should really be the default but
isn't for historical reasons. It's a good idea to always use --
lifetime=reboot. You could double-check with "cibadmin -Q|grep master-" 
and see if there is more than one entry per node.
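For example, from the agent (illustrative only; the score value 100 is
arbitrary):

    # set/clear the promotion preference as a transient (reboot) attribute
    crm_master --lifetime=reboot -v 100
    crm_master --lifetime=reboot -Q -D

If the cibadmin output shows both a transient and a permanent master
attribute for the same node, the stale permanent one should be removed.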
-- 
Ken Gaillot 


Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-12 Thread Jan Pokorný
On 07/08/19 16:06 +0200, Momcilo Medic wrote:
> On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger  wrote:
> 
>> On 8/7/19 12:26 PM, Momcilo Medic wrote:
>> 
>>> We have three node cluster that is setup to stop resources on lost
>>> quorum.  Failure (network going down) handling is done properly,
>>> but recovery doesn't seem to work.
>> 
>> What do you mean by 'network going down'?
>> Loss of link? Does the IP persist on the interface
>> in that case?
> 
> Yes, we simulate faulty cable by turning switch ports down and up.
> In such a case, the IP does not persist on the interface.
> 
>> That there are issue reconnecting the CPG-API sounds strange to me.
>> Already the fact that something has to be reconnected. I got it
>> that your nodes were persistently up during the
>> network-disconnection. Although I would have expected fencing to
>> kick in at least on those which are part of the non-quorate
>> cluster-partition.  Maybe a few words more on your scenario
>> (fencing setup e.g.) would help to understand what is going on.
> 
> We don't use any fencing mechanisms, we rely on quorum to run the
> services.  In more detail, we run three node Linbit LINSTOR storage
> that is hyperconverged.  Meaning, we run clustered storage on the
> virtualization hypervisors.
> 
> We use pcs in order to have linstor-controller service in high
> availabilty mode.  Policy for no quorum is to stop the resources.
> 
> In such hyperconverged setup, we can't fence a node without impact.
> It may happen that network instability causes primary node to no
> longer be primary.  In that case, we don't want running VMs to go
> down with the ship, as there was no impact for them.
> 
> However, we would like to have high-availability of that service
> upon network restoration, without manual actions.

This spurred a train of thought that is admittedly not immediately
helpful in this case:

* * *

1. the word "converged" is a fitting word for how we'd like
   the cluster stack to appear (from the outside), but what we have
   is that some circumstances are not clearly articulated across the
   components meaning that there's no way for users to express the
   preferences in simple terms and in a non-conflicting and
   unambiguous way when 2+ components' realms combine together
   -- high level tools like pcs may attempt to rectify that to some
   extent, but they fall short when there are no surfaces to glue (at
   least unambiguously, see also parallel thread about shutting the
   cluster down in the presence of sbd)

   it seems to me that the very circumstance that was hit here is
   exactly where corosync authors decided that it's rare and obnoxious
   to indicate up the chain for a detached destiny reasoning (which
   pacemaker normally performs) enough that they rather stop right
   there (and in a well-behaved cluster configuration hence ask to be
   fenced)

   all is actually sound, until one starts to make compromises like
   here was done, with ditching of the fencing (think: sanity
   assurance) layer, relying fully on no-quorum-policy=stop, naively
   thinking that one is 100% covered, but with a purely pacemaker hat
   on, we -- the pacemaker dev -- can't really give you such
   a guarantee, because we have no visibility into said "bail out"
   shortcuts that corosync makes for such rare circumstances -- you
   shall refer to corosync documentation, but it's not covered there
   (man pages) AFAIK (if it was _all_ indicated to pacemaker, just
   standard response on quorum loss could be carried out, not
   resorting to anything more drastic like here)


2. based on said missing explicit and clear inter-component signalling
   (1.) and the logs provided, it's fair to bring an argument that
   pacemaker had an opportunity to see, barring said explicit API
   signalling, that corosync died, but then, the major assumed case is:

   - corosync crashed or was explicitly killed (perhaps to test the
 claimed HA resiliency towards the outer world)

   - broken pacemaker-corosync communication consistency
 (did some messages fall through the cracks?)

   i.e., cluster endangering scenarios, not something to keep alive
   at all costs, better to try to stabilize the environment first,
   not to speak of chances with a "miracles awaiting" strategy


3. despite 2., there was a decision with systemd-enabled systems to
   actually pursue said "at all costs" (although implicitly
   mitigated when the restart cycles would be happening at a rapid
   pace)

   - it's all then in the hands in slightly non-deterministic timing
 (token loss timeout window hit/miss, although perhaps not in
 this very case if the state within the protocol would be a clear
 indicator for other corosync peers)
   
   - I'd actually assume the pacemaker would be restarted in said
 scenario (unless one fiddled with the pacemaker service file,
 that is), and just prior to that, corosync would be forcibly
 started anew as well

   - is the 

Re: [ClusterLabs] Master/slave failover does not work as expected

2019-08-12 Thread Andrei Borzenkov
On Mon, Aug 12, 2019 at 4:12 PM Michael Powell <
michael.pow...@harmonicinc.com> wrote:

> At 07:44:49, the ss agent discovers that the master instance has failed on
> node *mgraid…-0* as a result of a failed *ssadm* request in response to
> an *ss_monitor()* operation.  It issues a *crm_master -Q -D* command with
> the intent of demoting the master and promoting the slave, on the other
> node, to master.  The *ss_demote()* function finds that the application
> is no longer running and returns *OCF_NOT_RUNNING* (7).  In the older
> product, this was sufficient to promote the other instance to master, but
> in the current product, that does not happen.  Currently, the failed
> application is restarted, as expected, and is promoted to master, but this
> takes 10’s of seconds.
>
>
>

Did you try to disable resource stickiness for this ms?
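For example, something like this clears stickiness on the master/slave
resource (sketch only; "ms-ss" stands in for the actual ms resource name):

    crm_resource --resource ms-ss --meta \
        --set-parameter resource-stickiness --parameter-value 0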

Re: [ClusterLabs] why is node fenced ?

2019-08-12 Thread Chris Walker
When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2, for 
example,

Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: 
Quorum retained | membership=1320 members=1

after ~20s (dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed 
as part of startup fencing.
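If the peer regularly needs longer than that to be seen again after a reboot,
dc-deadtime can be raised, e.g. (crmsh syntax; the value is just an example):

    crm configure property dc-deadtime=60s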

There is nothing in ha-idg-2's HA logs around 17:43 indicating that it saw 
ha-idg-1 either, so it appears that there was no communication at all between 
the two nodes.

I'm not sure exactly why the nodes did not see one another, but there are 
indications of network issues around this time

2019-08-09T17:42:16.427947+02:00 ha-idg-2 kernel: [ 1229.245533] bond1: now 
running without any active interface!

so perhaps that's related.

HTH,
Chris


On 8/12/19, 12:09 PM, "Users on behalf of Lentes, Bernd" wrote:

Hi,

last Friday (9th of August) i had to install patches on my two-node cluster.
I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2), 
patched it, rebooted, 
started the cluster (systemctl start pacemaker) again, put the node again 
online, everything fine.

Then i wanted to do the same procedure with the other node (ha-idg-1).
I put it in standby, patched it, rebooted, started pacemaker again.
But then ha-idg-1 fenced ha-idg-2, it said the node is unclean.
I know that nodes which are unclean need to be shutdown, that's logical.

But i don't know from where the conclusion comes that the node is unclean 
respectively why it is unclean,
i searched in the logs and didn't find any hint.

I put the syslog and the pacemaker log on a seafile share, i'd be very 
thankful if you'll have a look.
https://hmgubox.helmholtz-muenchen.de/d/53a10960932445fb9cfe/

Here the cli history of the commands:

17:03:04  crm node standby ha-idg-2
17:07:15  zypper up (install Updates on ha-idg-2)
17:17:30  systemctl reboot
17:25:21  systemctl start pacemaker.service
17:25:47  crm node online ha-idg-2
17:26:35  crm node standby ha-idg1-
17:30:21  zypper up (install Updates on ha-idg-1)
17:37:32  systemctl reboot
17:43:04  systemctl start pacemaker.service
17:44:00  ha-idg-1 is fenced

Thanks.

Bernd

OS is SLES 12 SP4, pacemaker 1.1.19, corosync 2.3.6-9.13.1



[ClusterLabs] Postgres HA - pacemaker RA do not support auto failback

2019-08-12 Thread Shital A
Hello,

Postgres version : 9.6
OS: RHEL 7.6

We are working on an HA setup for a two-node Postgres cluster in
active-passive mode.

Installed:
Pacemaker 1.1.19
Corosync 2.4.3

The pacemaker resource agent in this installation doesn't support automatic
failback. What I mean by that is explained below:
1. The cluster is set up as A - B, with A as master.
2. Kill services on A; node B will come up as master.
3. When node A is ready to rejoin the cluster, we have to delete the lock file
it creates on one of the nodes and execute the cleanup command to get the
node back as standby.

Step 3 is manual, so HA is not achieved in the real sense.

Please help to check:
1. Is there any version of the resource agent that supports automatic
failback, avoiding the generation and deletion of the lock file?

2. If there is no such support and we need such functionality, do we have
to modify the existing code?

How can this be achieved? Please suggest.

Thanks.

[ClusterLabs] why is node fenced ?

2019-08-12 Thread Lentes, Bernd
Hi,

Last Friday (9th of August) I had to install patches on my two-node cluster.
I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2),
patched it, rebooted,
started the cluster (systemctl start pacemaker) again, and put the node
online again; everything was fine.

Then I wanted to do the same procedure with the other node (ha-idg-1).
I put it in standby, patched it, rebooted, and started pacemaker again.
But then ha-idg-1 fenced ha-idg-2, saying the node is unclean.
I know that nodes which are unclean need to be shut down, that's logical.

But I don't know where the conclusion that the node is unclean comes from,
or why it is unclean;
I searched the logs and didn't find any hint.

I put the syslog and the pacemaker log on a Seafile share; I'd be very thankful
if you'd have a look.
https://hmgubox.helmholtz-muenchen.de/d/53a10960932445fb9cfe/

Here is the CLI history of the commands:

17:03:04  crm node standby ha-idg-2
17:07:15  zypper up (install Updates on ha-idg-2)
17:17:30  systemctl reboot
17:25:21  systemctl start pacemaker.service
17:25:47  crm node online ha-idg-2
17:26:35  crm node standby ha-idg1-
17:30:21  zypper up (install Updates on ha-idg-1)
17:37:32  systemctl reboot
17:43:04  systemctl start pacemaker.service
17:44:00  ha-idg-1 is fenced

Thanks.

Bernd

OS is SLES 12 SP4, pacemaker 1.1.19, corosync 2.3.6-9.13.1


-- 

Bernd Lentes 
Systemadministration 
Institut für Entwicklungsgenetik 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum münchen 
bernd.len...@helmholtz-muenchen.de 
phone: +49 89 3187 1241 
phone: +49 89 3187 3827 
fax: +49 89 3187 2294 
http://www.helmholtz-muenchen.de/idg 

Perfekt ist wer keine Fehler macht 
Also sind Tote perfekt
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, 
Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671


[ClusterLabs] Q: "crmd[7281]: warning: new_event_notification (7281-97955-15): Broken pipe (32)" as response to resource cleanup

2019-08-12 Thread Ulrich Windl
Hi!

I just noticed that a "crm resource cleanup " caused some unexpected 
behavior and the syslog message:
crmd[7281]:  warning: new_event_notification (7281-97955-15): Broken pipe (32)

It's SLES14 SP4 last updated Sept. 2018 (up since then, 
pacemaker-1.1.19+20180928.0d2680780-1.8.x86_64).

The cleanup was due to a failed monitor. As an unexpected consequence of this 
cleanup, CRM seemed to restart the complete resource (and dependencies), even 
though it was running.

I noticed that a manual "crm_resource -C -r  -N " command has the 
same effect (multiple resources are "Cleaned up", resources are restarted 
seemingly before the "probe" is done.).
Actually the manual says when cleaning up a single primitive, the whole group 
is cleaned up, unless using --force. Well, I don't like this default, as I
expect any status change from probe would propagate to the group anyway...

Regards,
Ulrich





Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Yan Gao


On 8/12/19 3:24 PM, Klaus Wenninger wrote:
> On 8/12/19 2:30 PM, Yan Gao wrote:
>> Hi Klaus,
>>
>> On 8/12/19 1:39 PM, Klaus Wenninger wrote:
>>> On 8/9/19 9:06 PM, Yan Gao wrote:
 On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
> 09.08.2019 16:34, Yan Gao wrote:
>> Hi,
>>
>> With disk-less sbd,  it's fine to stop cluster service from the cluster
>> nodes all at the same time.
>>
>> But if to stop the nodes one by one, for example with a 3-node cluster,
>> after stopping the 2nd node, the only remaining node resets itself with:
>>
> That is sort of documented in SBD manual page:
>
> --><--
> However, while the cluster is in such a degraded state, it can
> neither successfully fence nor be shutdown cleanly (as taking the
> cluster below the quorum threshold will immediately cause all remaining
> nodes to self-fence).
> --><--
>
> SBD in shared-nothing mode is basically always in such degraded state
> and cannot tolerate loss of quorum.
 Well, the context here is it loses quorum *expectedly* since the other
 nodes gracefully shut down.

>> Aug 09 14:30:20 opensuse150-1 sbd[1079]:   pcmk:debug:
>> notify_parent: Not notifying parent: state transient (2)
>> Aug 09 14:30:20 opensuse150-1 sbd[1080]:cluster:debug:
>> notify_parent: Notifying parent: healthy
>> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child:
>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 
>> 0)
>>
>> I can think of the way to manipulate quorum with last_man_standing and
>> potentially also auto_tie_breaker, not to mention
>> last_man_standing_window would also be a factor... But is there a better
>> solution?
>>
> Lack of cluster wide shutdown mode was mentioned more than once on this
> list. I guess the only workaround is to use higher level tools which
> basically simply try to stop cluster on all nodes at once. It is still
> susceptible to race condition.
 Gracefully stopping nodes one by one on purpose is still a reasonable
 need though ...
>>> If you do the teardown as e.g. pcs is doing it - first tear down
>>> pacemaker-instances and then corosync/sbd - it is at
>>> least possible to tear down the pacemaker-instances one-by one
>>> without risking a reboot due to quorum-loss.
>>> With kind of current sbd having in
>>> -
>>> https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945
>>> -
>>> https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70
>>> -
>>> https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68
>>> this should be pretty robust although we are still thinking
>>> (probably together with some heartbeat to pacemakerd
>>> that assures pacemakerd is checking liveness of sub-daemons
>>> properly) of having a cleaner way to detect graceful
>>> pacemaker-shutdown.
>> These are all good improvements, thanks!
>>
>> But in this case the remaining node is not shutting down yet, or it's
>> intentionally not being shut down :-) Loss of quorum is as expected, so
>> is following no-quorum-policy, but self-reset is probably too much?
> Hmm ... not sure if I can follow ...
> If you shutdown solely pacemaker one-by-one on all nodes
> and these shutdowns are considered graceful then you are
> not gonna experience any reboots (e.g. 3 node cluster).
> Afterwards you can shutdown corosync one-by-one as well
> without experiencing reboots as without the cib-connection
> sbd isn't gonna check for quorum anymore (all resources
> down so no need to reboot in case of quorum-loss - extra
> care has to be taken care of with unmanaged resources but
> that isn't particular with sbd).
I meant if users would like shut down only 2 out of 3 nodes in the 
cluster and keep the last one online and alive, it's simply not possible 
for now, although the loss of quorum is expected.

Regards,
   Yan

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Klaus Wenninger
On 8/12/19 2:30 PM, Yan Gao wrote:
> Hi Klaus,
>
> On 8/12/19 1:39 PM, Klaus Wenninger wrote:
>> On 8/9/19 9:06 PM, Yan Gao wrote:
>>> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
 09.08.2019 16:34, Yan Gao wrote:
> Hi,
>
> With disk-less sbd,  it's fine to stop cluster service from the cluster
> nodes all at the same time.
>
> But if to stop the nodes one by one, for example with a 3-node cluster,
> after stopping the 2nd node, the only remaining node resets itself with:
>
 That is sort of documented in SBD manual page:

 --><--
 However, while the cluster is in such a degraded state, it can
 neither successfully fence nor be shutdown cleanly (as taking the
 cluster below the quorum threshold will immediately cause all remaining
 nodes to self-fence).
 --><--

 SBD in shared-nothing mode is basically always in such degraded state
 and cannot tolerate loss of quorum.
>>> Well, the context here is it loses quorum *expectedly* since the other
>>> nodes gracefully shut down.
>>>
> Aug 09 14:30:20 opensuse150-1 sbd[1079]:   pcmk:debug:
> notify_parent: Not notifying parent: state transient (2)
> Aug 09 14:30:20 opensuse150-1 sbd[1080]:cluster:debug:
> notify_parent: Notifying parent: healthy
> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child:
> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 
> 0)
>
> I can think of the way to manipulate quorum with last_man_standing and
> potentially also auto_tie_breaker, not to mention
> last_man_standing_window would also be a factor... But is there a better
> solution?
>
 Lack of cluster wide shutdown mode was mentioned more than once on this
 list. I guess the only workaround is to use higher level tools which
 basically simply try to stop cluster on all nodes at once. It is still
 susceptible to race condition.
>>> Gracefully stopping nodes one by one on purpose is still a reasonable
>>> need though ...
>> If you do the teardown as e.g. pcs is doing it - first tear down
>> pacemaker-instances and then corosync/sbd - it is at
>> least possible to tear down the pacemaker-instances one-by one
>> without risking a reboot due to quorum-loss.
>> With kind of current sbd having in
>> - 
>> https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945
>> - 
>> https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70
>> - 
>> https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68
>> this should be pretty robust although we are still thinking
>> (probably together with some heartbeat to pacemakerd
>> that assures pacemakerd is checking liveness of sub-daemons
>> properly) of having a cleaner way to detect graceful
>> pacemaker-shutdown.
> These are all good improvements, thanks!
>
> But in this case the remaining node is not shutting down yet, or it's 
> intentionally not being shut down :-) Loss of quorum is as expected, so 
> is following no-quorum-policy, but self-reset is probably too much?
Hmm ... not sure if I can follow ...
If you shutdown solely pacemaker one-by-one on all nodes
and these shutdowns are considered graceful then you are
not gonna experience any reboots (e.g. 3 node cluster).
Afterwards you can shutdown corosync one-by-one as well
without experiencing reboots as without the cib-connection
sbd isn't gonna check for quorum anymore (all resources
down so no need to reboot in case of quorum-loss - extra
care has to be taken care of with unmanaged resources but
that isn't particular with sbd).
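In other words, roughly (an illustrative sketch only; node names and the use
of ssh are placeholders for whatever pcs/crmsh would do for you):

    # 1) stop pacemaker everywhere first (graceful, no quorum-based self-fence)
    for n in node1 node2 node3; do ssh "$n" systemctl stop pacemaker; done
    # 2) only then take down corosync (and with it sbd)
    for n in node1 node2 node3; do ssh "$n" systemctl stop corosync; done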

Klaus
>
> Regards,
>Yan


Re: [ClusterLabs] Antw: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Yan Gao
On 8/12/19 8:42 AM,  Ulrich Windl  wrote:
> Hi!
> 
> One motivation to stop all nodes at the same time is to avoid needless moving
> of resources, like the following:
> You stop node A, then resources are stopped on A and started elsewhere
> You stop node B, and resources are stopped and moved to remaining nodes
> ...until the last node stops, or quorum prevents cluster operation (effect
> depends on further settings)
This could potentially be achieved by first putting all the nodes into
standby mode with an atomic request. crmsh doesn't support this from the
"crm node standby" interface so far, but "crm configure edit" definitely
can do this.
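Roughly, after the edit the node sections would look like this, so that all
standby attributes land in one CIB update (illustrative; node names are
placeholders):

    node node-1 \
        attributes standby=on
    node node-2 \
        attributes standby=on
    node node-3 \
        attributes standby=on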

Regards,
   Yan

> 
> Unfortunately (AFAIK) there's not command to "stop the cluster" yet.
> A "stop cluster" command would stop all resources on all nodes, then stop the
> nodes (and lower layers) in a way that there is no "quorum lost" or fencing
> going on. >
> Regards,
> Ulrich
> 
 Yan Gao wrote on 09.08.2019 at 15:34 in message
> :
>> Hi,
>>
>> With disk‑less sbd,  it's fine to stop cluster service from the cluster
>> nodes all at the same time.
>>
>> But if to stop the nodes one by one, for example with a 3‑node cluster,
>> after stopping the 2nd node, the only remaining node resets itself with:
>>
>> Aug 09 14:30:20 opensuse150‑1 sbd[1079]:   pcmk:debug:
>> notify_parent: Not notifying parent: state transient (2)
>> Aug 09 14:30:20 opensuse150‑1 sbd[1080]:cluster:debug:
>> notify_parent: Notifying parent: healthy
>> Aug 09 14:30:20 opensuse150‑1 sbd[1078]:  warning: inquisitor_child:
>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>
>> I can think of the way to manipulate quorum with last_man_standing and
>> potentially also auto_tie_breaker, not to mention
>> last_man_standing_window would also be a factor... But is there a better
>> solution?
>>
>> Thanks,
>> Yan

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Yan Gao
Hi Klaus,

On 8/12/19 1:39 PM, Klaus Wenninger wrote:
> On 8/9/19 9:06 PM, Yan Gao wrote:
>> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
>>> 09.08.2019 16:34, Yan Gao wrote:
 Hi,

 With disk-less sbd,  it's fine to stop cluster service from the cluster
 nodes all at the same time.

 But if to stop the nodes one by one, for example with a 3-node cluster,
 after stopping the 2nd node, the only remaining node resets itself with:

>>> That is sort of documented in SBD manual page:
>>>
>>> --><--
>>> However, while the cluster is in such a degraded state, it can
>>> neither successfully fence nor be shutdown cleanly (as taking the
>>> cluster below the quorum threshold will immediately cause all remaining
>>> nodes to self-fence).
>>> --><--
>>>
>>> SBD in shared-nothing mode is basically always in such degraded state
>>> and cannot tolerate loss of quorum.
>> Well, the context here is it loses quorum *expectedly* since the other
>> nodes gracefully shut down.
>>
>>>
 Aug 09 14:30:20 opensuse150-1 sbd[1079]:   pcmk:debug:
 notify_parent: Not notifying parent: state transient (2)
 Aug 09 14:30:20 opensuse150-1 sbd[1080]:cluster:debug:
 notify_parent: Notifying parent: healthy
 Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child:
 Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)

 I can think of the way to manipulate quorum with last_man_standing and
 potentially also auto_tie_breaker, not to mention
 last_man_standing_window would also be a factor... But is there a better
 solution?

>>> Lack of cluster wide shutdown mode was mentioned more than once on this
>>> list. I guess the only workaround is to use higher level tools which
>>> basically simply try to stop cluster on all nodes at once. It is still
>>> susceptible to race condition.
>> Gracefully stopping nodes one by one on purpose is still a reasonable
>> need though ...
> If you do the teardown as e.g. pcs is doing it - first tear down
> pacemaker-instances and then corosync/sbd - it is at
> least possible to tear down the pacemaker-instances one-by-one
> without risking a reboot due to quorum-loss.
> With a reasonably current sbd that includes
> - 
> https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945
> - 
> https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70
> - 
> https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68
> this should be pretty robust although we are still thinking
> (probably together with some heartbeat to pacemakerd
> that assures pacemakerd is checking liveness of sub-daemons
> properly) of having a cleaner way to detect graceful
> pacemaker-shutdown.
These are all good improvements, thanks!

But in this case the remaining node is not shutting down yet, or it's 
intentionally not being shut down :-) Loss of quorum is expected, and so 
is following no-quorum-policy, but a self-reset is probably too much?

Regards,
   Yan
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Klaus Wenninger
On 8/9/19 9:06 PM, Yan Gao wrote:
> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
>> 09.08.2019 16:34, Yan Gao wrote:
>>> Hi,
>>>
>>> With disk-less sbd,  it's fine to stop cluster service from the cluster
>>> nodes all at the same time.
>>>
>>> But if to stop the nodes one by one, for example with a 3-node cluster,
>>> after stopping the 2nd node, the only remaining node resets itself with:
>>>
>> That is sort of documented in SBD manual page:
>>
>> --><--
>> However, while the cluster is in such a degraded state, it can
>> neither successfully fence nor be shutdown cleanly (as taking the
>> cluster below the quorum threshold will immediately cause all remaining
>> nodes to self-fence).
>> --><--
>>
>> SBD in shared-nothing mode is basically always in such degraded state
>> and cannot tolerate loss of quorum.
> Well, the context here is it loses quorum *expectedly* since the other 
> nodes gracefully shut down.
>
>>
>>
>>> Aug 09 14:30:20 opensuse150-1 sbd[1079]:   pcmk:debug:
>>> notify_parent: Not notifying parent: state transient (2)
>>> Aug 09 14:30:20 opensuse150-1 sbd[1080]:cluster:debug:
>>> notify_parent: Notifying parent: healthy
>>> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child:
>>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>>
>>> I can think of the way to manipulate quorum with last_man_standing and
>>> potentially also auto_tie_breaker, not to mention
>>> last_man_standing_window would also be a factor... But is there a better
>>> solution?
>>>
>> Lack of cluster wide shutdown mode was mentioned more than once on this
>> list. I guess the only workaround is to use higher level tools which
>> basically simply try to stop cluster on all nodes at once. It is still
>> susceptible to race condition.
> Gracefully stopping nodes one by one on purpose is still a reasonable 
> need though ...
If you do the teardown as e.g. pcs is doing it - first tear down
pacemaker-instances and then corosync/sbd - it is at
least possible to tear down the pacemaker-instances one-by-one
without risking a reboot due to quorum-loss.
With a reasonably current sbd that includes
-
https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945
-
https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70
-
https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68
this should be pretty robust although we are still thinking
(probably together with some heartbeat to pacemakerd
that assures pacemakerd is checking liveness of sub-daemons
properly) of having a cleaner way to detect graceful
pacemaker-shutdown.

Klaus
>
> Regards,
>Yan
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Antw: Re: Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Ulrich Windl
>>> Roger Zhou  wrote on 12.08.2019 at 10:55 in message
<7249e013-1256-675a-3cea-3572f4615...@suse.com>:

> On 8/12/19 2:48 PM,  Ulrich Windl  wrote:
> Andrei Borzenkov  wrote on 09.08.2019 at 18:40
in
>> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>>> 09.08.2019 16:34, Yan Gao wrote:
> 
> [...]
> 
>>>
>>> Lack of cluster wide shutdown mode was mentioned more than once on this
>>> list. I guess the only workaround is to use higher level tools which
>>> basically simply try to stop cluster on all nodes at once. 
> 
> I was thinking of ssh/pssh to the involved nodes to stop the diskless SBD
> daemons.  However, SBD cannot be torn down on its own. It is
> deeply tied to pacemaker and corosync and has to be stopped all
> together, unless the SBD dependencies are hacked around.
> 
>>> It is still
>>> susceptible to race condition.
>> 
>> Are there any concrete plans to implement a clean solution?
>> 
> 
> I can think of Yet Another Feature to disable diskless SBD on purpose,
> e.g. to let SBD understand "stonith-enabled=false" cluster-wide.

Hi!

I imagine that some new mechanism would be needed to have non-persistent or
self-resetting attribute changes in the CIB:
For example, if you do a "resource restart" and the node where the command runs
is fenced during the "stop" phase, the resource remains stopped until started
manually. This is because the "restart" is implemented as a sequential, non-atomic
"stop, then start".
Similarly for a "cluster stop": there is an attribute "stop-all-resources"
(AFAIR). A "cluster stop" could temporarily set this to get all resources on
all nodes stopped. Then the pacemakers, corosyncs and sbds should stop. On
restart each node should start up normally...
BTW: HP-UX ServiceGuard had not only a command to stop the cluster, but also
one to start the cluster. I imagine that it could play nicely with pacemaker as
well: the command would first start all the SBDs, corosyncs, and pacemakers,
and once the DC is elected, resources would start without needless shuffling
(migration) of resources between the nodes joining the cluster.
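
For what it's worth, a rough sketch of the "stop-all-resources" idea above
with today's tools (the cluster property does exist in Pacemaker; the
temporary, self-resetting part is exactly what is missing, so resetting it
here is manual):

  # stop every resource on every node in one step
  crm configure property stop-all-resources=true
  # ... wait until crm_mon shows all resources stopped ...
  # then stop the stack on each node (nothing left to move around)
  systemctl stop pacemaker
  systemctl stop corosync
  # after the next full cluster start, reset the property manually
  crm configure property stop-all-resources=false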

Regards,
Ulrich

> 
> 
> Cheers,
> Roger



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Querying failed rersource operations from the CIB

2019-08-12 Thread Ulrich Windl
Hi!

Back in December 2011 I wrote a script to retrieve all failed resource 
operations, using "cibadmin -Q -o lrm_resources" as the data source. I was 
querying lrm_rsc_op for op-status != 0.
In a newer release this does not seem to work anymore.

I see resource IDs ending with "_last_0", "_monitor_6", and 
"_last_failure_0", but even in the "_last_failure_0" the op-status is "0" 
(rc-code="7").
Is this some bug, or is it a feature? That is: When will op-status be != 0?
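
In case it helps, a hedged sketch of a query keyed on rc-code instead,
assuming op-status only reflects whether the operation executed while
rc-code carries the agent's exit status (which would explain op-status=0
together with rc-code=7):

  # operation history entries whose agent return code is non-zero;
  # expected "not running" probes (rc-code=7) show up here too and
  # may need further filtering
  cibadmin --query --xpath "//lrm_rsc_op[@rc-code!='0']"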

crm_mon still reports a resource failure like this:
Failed Resource Actions:
* prm_nfs_server_monitor_6 on h11 'not running' (7): call=738, 
status=complete, exitreason='',
last-rc-change='Mon Aug 12 04:52:23 2019', queued=0ms, exec=0ms

(it seems the nfs server monitor does this under load in SLES12 SP4, and I 
wonder where to look for the reason)
BTW: "lrm_resources" is not documented, and the structure seemes to change. Can 
I restrict the output to LRM data?

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Roger Zhou

On 8/12/19 2:48 PM,  Ulrich Windl  wrote:
 Andrei Borzenkov  wrote on 09.08.2019 at 18:40 in
> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>> 09.08.2019 16:34, Yan Gao wrote:

[...]

>>
>> Lack of cluster wide shutdown mode was mentioned more than once on this
>> list. I guess the only workaround is to use higher level tools which
>> basically simply try to stop cluster on all nodes at once. 

I was thinking of ssh/pssh to the involved nodes to stop the diskless SBD
daemons.  However, SBD cannot be torn down on its own. It is
deeply tied to pacemaker and corosync and has to be stopped all
together, unless the SBD dependencies are hacked around.
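
One way to see that coupling on a node (output varies by distribution;
elsewhere in this digest sbd.service is reported as PartOf=corosync.service):

  systemctl show -p PartOf -p Requires -p WantedBy sbd.service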

>> It is still
>> susceptible to race condition.
> 
> Are there any concrete plans to implement a clean solution?
> 

I can think of Yet Another Feature to disable diskless SBD on purpose,
e.g. to let SBD understand "stonith-enabled=false" cluster-wide.


Cheers,
Roger
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-12 Thread Jan Friesse

Andrei Borzenkov wrote:



Sent from my iPhone


12 Aug 2019, at 8:46, Jan Friesse  wrote:

Олег Самойлов wrote:

9 Aug 2019, at 9:25, Jan Friesse  wrote:
Please do not set dpd_interval that high. dpd_interval on the qnetd side is not 
about how often the ping is sent. Could you please retry your test with 
dpd_interval=1000? I'm pretty sure it will work then.

Honza

Yep. As far as I understand, dpd_interval of qnetd and timeout and sync_timeout of 
qdevice are somehow linked. By default they are dpd_interval=10, timeout=10, 
sync_timeout=30. And you advised to change them proportionally.


Yes, timeout and sync_timeout should be changed proportionally. dpd_interval is 
a different story.


https://github.com/ClusterLabs/sbd/pull/76#issuecomment-486952369
But the mechanism of how they depend on each other is mysterious and not 
documented.


Let me try to bring some light in there:

- dpd_interval is a qnetd variable controlling how often qnetd walks through the list of all 
clients (qdevices) and checks the timestamp of the last sent message. If the diff between 
the current timestamp and the last sent message's timestamp is larger than 2 * the timeout 
sent by the client, then the client is considered dead.

- interval - affects how often qdevice sends a heartbeat to corosync (this is 
half of the interval) about its liveness, and also how often it sends a heartbeat 
to qnetd (0.8 * interval). On the corosync side this is used as a timeout after 
which the qdevice daemon is considered dead and its votes are no longer valid.

- sync_timeout - Not used by qdevice/qnetd. Used by corosync during the sync phase. 
If corosync doesn't get a reply from qdevice within this timeout, it considers the qdevice 
daemon dead and continues the sync process.
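
For reference, a sketch of where the qdevice-side knobs live in corosync.conf
(the values are the 10 s / 30 s defaults discussed above, expressed in
milliseconds; the qnetd host is a placeholder, and dpd_interval is configured
on the qnetd side, not here):

  quorum {
      provider: corosync_votequorum
      device {
          model: net
          timeout: 10000         # the "timeout" above
          sync_timeout: 30000    # used by corosync during the sync phase
          net {
              host: <qnetd-host>
          }
      }
  }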



Looking at the logs at the beginning of this thread as well as the logs in the linked 
github issue, it appears that corosync does not do anything during 
sync_timeout; in particular it does *not* ask qdevice, and qdevice does not ask 
qnetd.


corosync is waiting for qdevice to call votequorum_qdevice_poll 
function. qdevice asks qnetd (how else could it get the vote?).






I rechecked the test with the 20-60 combination. I got the same problem on the 16th failure 
simulation. The qnetd returned the vote in exactly the second qdevice expected it, but 
slightly later. So the node lost quorum, got the vote slightly later, but didn't get 
quorum, maybe due to the 'wait for all' option.


That matches above observation. As soon as corosync is unfrozen, it asks qnetd 
which returns its vote.


Actually, if you take a look at the log it is evident that qdevice 
asked qnetd for a vote, but the result was to wait for a reply, because qnetd 
couldn't give a proper answer until it got complete information from all 
nodes.




So I still do not understand what is supposed to happen during sync_timeout and 
whether observed behavior is intentional. So far it looks just like artificial 
delay.


Not at all. It's just a bug in timeouts. I have a proper fix in my mind, 
but it's not just about lowering limits.


Honza




I retried the default 10-30 combination. I got the same problem on the first 
failure simulation. Qnetd sent the vote 1 second later than expected.
With the combination 1-3 (dpd_interval=1, timeout=1, sync_timeout=3), the same 
problem on the 11th failure simulation. The qnetd returned the vote in exactly the same 
second qdevice expected it, but slightly later. So the node lost quorum, got the 
vote slightly later, but didn't get quorum, maybe due to the 'wait for all' option. 
And the node was watchdogged later due to lack of quorum.


It was probably not evident from my reply, but what I meant was to change just 
dpd_interval. Could you please recheck with dpd_interval=1, timeout=20, 
sync_timeout=60?

Honza



So, my conclusions:
1. IMHO this bug may depend not on the absolute value of dpd_interval, but on the 
proportion between qnetd's dpd_interval and qdevice's timeout and sync_timeout. 
Because of these options, I cannot predict how to change them to work around this 
behaviour.
2. IMHO "wait for all" is also buggy. According to the documentation it must fire 
only at cluster start, but it looks like it fires every time quorum (or all 
votes) is lost.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Increasing fence timeout

2019-08-12 Thread Oyvind Albrigtsen

You should be able to increase this timeout by running:
pcs stonith update  shell_timeout=10
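
To double-check the result afterwards (a sketch; the device name is a
placeholder, as it was elided in the command above):
  pcs stonith show <fence-device>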

Oyvind

On 08/08/19 12:13 -0600, Casey & Gina wrote:

Hi, I'm currently running into periodic premature killing of nodes due to the 
fence monitor timeout being set to 5 seconds.  Here is an example message from 
the logs:

fence_vmware_rest[22334] stderr: [ Exception: Operation timed out after 5001 
milliseconds with 0 bytes received ]

How can I increase this timeout using PCS?

Thank you,
--
Casey
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Andrei Borzenkov


Sent from my iPhone

12 Aug 2019, at 9:48, Ulrich Windl  
wrote:

 Andrei Borzenkov  wrote on 09.08.2019 at 18:40 in
> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>> 09.08.2019 16:34, Yan Gao wrote:
>>> Hi,
>>> 
>>> With disk-less sbd,  it's fine to stop cluster service from the cluster 
>>> nodes all at the same time.
>>> 
>>> But if to stop the nodes one by one, for example with a 3-node cluster, 
>>> after stopping the 2nd node, the only remaining node resets itself with:
>>> 
>> 
>> That is sort of documented in SBD manual page:
>> 
>> --><--
>> However, while the cluster is in such a degraded state, it can
>> neither successfully fence nor be shutdown cleanly (as taking the
>> cluster below the quorum threshold will immediately cause all remaining
>> nodes to self-fence).
>> --><--
>> 
>> SBD in shared-nothing mode is basically always in such degraded state
>> and cannot tolerate loss of quorum.
> 
> So with a shared device it's different?

Yes, as long as shared device is accessible.


> I was wondering whether
> "no-quorum-policy=freeze" would still work with the recent sbd...
> 

It will with shared device.

>> 
>> 
>> 
>>> Aug 09 14:30:20 opensuse150-1 sbd[1079]:   pcmk:debug: 
>>> notify_parent: Not notifying parent: state transient (2)
>>> Aug 09 14:30:20 opensuse150-1 sbd[1080]:cluster:debug: 
>>> notify_parent: Notifying parent: healthy
>>> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child: 
>>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants:
> 0)
>>> 
>>> I can think of the way to manipulate quorum with last_man_standing and 
>>> potentially also auto_tie_breaker, not to mention 
>>> last_man_standing_window would also be a factor... But is there a better 
>>> solution?
>>> 
>> 
>> Lack of cluster wide shutdown mode was mentioned more than once on this
>> list. I guess the only workaround is to use higher level tools which
>> basically simply try to stop cluster on all nodes at once. It is still
>> susceptible to race condition.
> 
> Are there any concrete plans to implement a clean solution?
> 
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Ulrich Windl
>>> Andrei Borzenkov  wrote on 09.08.2019 at 18:40 in
message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
> 09.08.2019 16:34, Yan Gao wrote:
>> Hi,
>> 
>> With disk-less sbd,  it's fine to stop cluster service from the cluster 
>> nodes all at the same time.
>> 
>> But if to stop the nodes one by one, for example with a 3-node cluster, 
>> after stopping the 2nd node, the only remaining node resets itself with:
>> 
> 
> That is sort of documented in SBD manual page:
> 
> --><--
> However, while the cluster is in such a degraded state, it can
> neither successfully fence nor be shutdown cleanly (as taking the
> cluster below the quorum threshold will immediately cause all remaining
> nodes to self-fence).
> --><--
> 
> SBD in shared-nothing mode is basically always in such degraded state
> and cannot tolerate loss of quorum.

So with a shared device it's different? I was wondering whether
"no-quorum-policy=freeze" would still work with the recent sbd...

> 
> 
> 
>> Aug 09 14:30:20 opensuse150-1 sbd[1079]:   pcmk:debug: 
>> notify_parent: Not notifying parent: state transient (2)
>> Aug 09 14:30:20 opensuse150-1 sbd[1080]:cluster:debug: 
>> notify_parent: Notifying parent: healthy
>> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child: 
>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants:
0)
>> 
>> I can think of the way to manipulate quorum with last_man_standing and 
>> potentially also auto_tie_breaker, not to mention 
>> last_man_standing_window would also be a factor... But is there a better 
>> solution?
>> 
> 
> Lack of cluster wide shutdown mode was mentioned more than once on this
> list. I guess the only workaround is to use higher level tools which
> basically simply try to stop cluster on all nodes at once. It is still
> susceptible to race condition.

Are there any concrete plans to implement a clean solution?

> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Antw: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Ulrich Windl
Hi!

One motivation to stop all nodes at the same time is to avoid needless moving
of resources, like the following:
You stop node A, then resources are stopped on A and started elsewhere
You stop node B, and resources are stopped and moved to remaining nodes
...until the last node stops, or quorum prevents cluster operation (effect
depends on further settings)

Unfortunately (AFAIK) there's no command to "stop the cluster" yet.
A "stop cluster" command would stop all resources on all nodes, then stop the
nodes (and lower layers) in a way that there is no "quorum lost" or fencing
going on.
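
(For what it's worth, the higher-level tools mentioned elsewhere in this
thread come close, e.g.

  pcs cluster stop --all      # pcs
  crm cluster stop            # crmsh

but, as discussed, they essentially just stop the stack on all nodes at
roughly the same time and are still racy.)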

Regards,
Ulrich

>>> Yan Gao  wrote on 09.08.2019 at 15:34 in message
:
> Hi,
> 
> With disk‑less sbd,  it's fine to stop cluster service from the cluster 
> nodes all at the same time.
> 
> But if to stop the nodes one by one, for example with a 3‑node cluster, 
> after stopping the 2nd node, the only remaining node resets itself with:
> 
> Aug 09 14:30:20 opensuse150‑1 sbd[1079]:   pcmk:debug: 
> notify_parent: Not notifying parent: state transient (2)
> Aug 09 14:30:20 opensuse150‑1 sbd[1080]:cluster:debug: 
> notify_parent: Notifying parent: healthy
> Aug 09 14:30:20 opensuse150‑1 sbd[1078]:  warning: inquisitor_child: 
> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> 
> I can think of the way to manipulate quorum with last_man_standing and 
> potentially also auto_tie_breaker, not to mention 
> last_man_standing_window would also be a factor... But is there a better 
> solution?
> 
> Thanks,
>Yan
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-12 Thread Andrei Borzenkov


Sent from my iPhone

> 12 Aug 2019, at 8:46, Jan Friesse  wrote:
> 
> Олег Самойлов wrote:
>>> 9 Aug 2019, at 9:25, Jan Friesse  wrote:
>>> Please do not set dpd_interval that high. dpd_interval on the qnetd side is not 
>>> about how often the ping is sent. Could you please retry your test with 
>>> dpd_interval=1000? I'm pretty sure it will work then.
>>> 
>>> Honza
>> Yep. As far as I understand, dpd_interval of qnetd and timeout and sync_timeout 
>> of qdevice are somehow linked. By default they are dpd_interval=10, 
>> timeout=10, sync_timeout=30. And you advised to change them proportionally.
> 
> Yes, timeout and sync_timeout should be changed proportionally. dpd_interval 
> is a different story.
> 
>> https://github.com/ClusterLabs/sbd/pull/76#issuecomment-486952369
>> But the mechanism of how they depend on each other is mysterious and not 
>> documented.
> 
> Let me try to bring some light in there:
> 
> - dpd_interval is a qnetd variable controlling how often qnetd walks through the list of all 
> clients (qdevices) and checks the timestamp of the last sent message. If the diff between 
> the current timestamp and the last sent message's timestamp is larger than 2 * the timeout 
> sent by the client, then the client is considered dead.
> 
> - interval - affects how often qdevice sends a heartbeat to corosync (this is 
> half of the interval) about its liveness, and also how often it sends a 
> heartbeat to qnetd (0.8 * interval). On the corosync side this is used as a 
> timeout after which the qdevice daemon is considered dead and its votes are no 
> longer valid.
> 
> - sync_timeout - Not used by qdevice/qnetd. Used by corosync during the sync 
> phase. If corosync doesn't get a reply from qdevice within this timeout, it 
> considers the qdevice daemon dead and continues the sync process.
> 

Looking at the logs at the beginning of this thread as well as the logs in the linked 
github issue, it appears that corosync does not do anything during 
sync_timeout; in particular it does *not* ask qdevice, and qdevice does not ask 
qnetd.


>> I rechecked the test with the 20-60 combination. I got the same problem on the 16th 
>> failure simulation. The 
> qnetd returned the vote in exactly the second qdevice expected it, but 
> slightly later. So the node lost quorum, got the vote slightly later, but didn't 
> get quorum, maybe due to the 'wait for all' option.

That matches above observation. As soon as corosync is unfrozen, it asks qnetd 
which returns its vote.

So I still do not understand what is supposed to happen during sync_timeout and 
whether observed behavior is intentional. So far it looks just like artificial 
delay.

>> I retried the default 10-30 combination. I got the same problem on the first 
>> failure simulation. Qnetd sent the vote 1 second later than expected.
>> With the combination 1-3 (dpd_interval=1, timeout=1, sync_timeout=3), the same 
>> problem on the 11th failure simulation. The qnetd returned the vote in exactly the 
>> same second qdevice expected it, but slightly later. So the node lost 
>> quorum, got the vote slightly later, but didn't get quorum, maybe due to the 'wait 
>> for all' option. And the node was watchdogged later due to lack of quorum.
> 
> It was probably not evident from my reply, but what I meant was to change 
> just dpd_interval. Could you please recheck with dpd_interval=1, timeout=20, 
> sync_timeout=60?
> 
> Honza
> 
> 
>> So, my conclusions:
>> 1. IMHO this bug may depend not on the absolute value of dpd_interval, but on the 
>> proportion between qnetd's dpd_interval and qdevice's timeout and sync_timeout. 
>> Because of these options, I cannot predict how to change them to work 
>> around this behaviour.
>> 2. IMHO "wait for all" is also buggy. According to the documentation it must fire 
>> only at cluster start, but it looks like it fires every time quorum 
>> (or all votes) is lost.
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] Antw: Re: corosync.service (and sbd.service) are not stopper on pacemaker shutdown when corosync-qdevice is used

2019-08-12 Thread Ulrich Windl
>>> Roger Zhou  wrote on 09.08.2019 at 10:19 in message
<06f700cb-d941-2f53-aee5-2d64c499c...@suse.com>:

> 
> On 8/9/19 3:39 PM, Jan Friesse wrote:
>> Roger Zhou wrote:
>>>
>>> On 8/9/19 2:27 PM, Roger Zhou wrote:

 On 7/29/19 12:24 AM, Andrei Borzenkov wrote:
> corosync.service sets StopWhenUnneded=yes which normally stops it when
> pacemaker is shut down.
>>>
>>> One more thought,
>>>
>>> Make sense to add "RefuseManualStop=true" to pacemaker.service?
>>> The same for corosync-qdevice.service?
>>>
>>> And "RefuseManualStart=true" to corosync.service?
>> 
>> I would say short answer is no, but I would like to hear what is the 
>> main idea for this proposal.
> 
> It's more about the out-of-the-box user experience: guiding users in the 
> most common use cases in the field to manage the whole cluster stack in the 
> appropriate steps, namely:
> 
> - To start stack: systemctl start pacemaker corosync-qdevice
> - To stop stack: systemctl stop corosync.service

As a user who has been using "systemctl start/stop pacemaker.service" up to now, I 
wonder whether there shouldn't be a target like "cluster-node.target" that 
orchestrates all the services. Then the recommendation would be "systemctl start/stop 
cluster-node".

> 
> and fewer error-prone assumptions:
> 
> With "RefuseManualStop=true" on pacemaker.service, sometimes (if not often),
> 
> - it prevents the wrong assumption/wish/impression that stopping it also
>   stops the whole cluster together with corosync
> 
> - it prevents users from forgetting the extra step needed to actually stop corosync
> 
> - it prevents some ISVs from creating disruptive scripts that only stop pacemaker 
> and forget the others.
> 
> - being rejected in the first place naturally guides users to run 
> `systemctl stop corosync.service`
> 
> 
> And extending the same idea a little further:
> 
> - "RefuseManualStop=true" for corosync-qdevice.service
> - and "RefuseManualStart=true" for corosync.service
> 
> Well, I do feel corosync* is less error-prone than pacemaker in this regard.
> 
> Thanks,
> Roger
> 
> 
>> 
>> Regards,
>>Honza
>> 
>>>
>>> @Jan, @Ken
>>>
>>> What do you think?
>>>
>>> Cheers,
>>> Roger
>>>
>>>

 `systemctl stop corosync.service` is the right command to stop the whole
 cluster stack.

 It stops pacemaker and corosync-qdevice first, and stops SBD too.

 pacemaker.service: After=corosync.service
 corosync-qdevice.service: After=corosync.service
 sbd.service: PartOf=corosync.service

 On the reverse side, to start the cluster stack, use

 systemctl start pacemaker.service corosync-qdevice

 It is slightly confusing at first impression. So, openSUSE provides the
 consistent commands below:

 crm cluster start
 crm cluster stop

 Cheers,
 Roger

> Unfortunately, corosync-qdevice.service declares
> Requires=corosync.service and corosync-qdevice.service itself is *not*
> stopped when pacemaker.service is stopped. Which means corosync.service
> remains "needed" and is never stopped.
>
> Also sbd.service (which is PartOf=corosync.service) remains running 
> as well.
>
> The latter is really bad, as it means sbd watchdog can kick in at any
> time when user believes cluster stack is safely stopped. In particular
> if qnetd is not accessible (think network reconfiguration).
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
>
> ClusterLabs home: https://www.clusterlabs.org/ 
>
 ___
 Manage your subscription:
 https://lists.clusterlabs.org/mailman/listinfo/users 

 ClusterLabs home: https://www.clusterlabs.org/ 

>> 
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
>> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Ubuntu 18.04 and corosync-qdevice

2019-08-12 Thread Jan Friesse

Nickle, Richard wrote:

I've built a two-node DRBD cluster with SBD and STONITH, following advice
from ClusterLabs, LinBit, Beekhof's blog on SBD.

I still cannot get automated failover when I down one of the nodes.  I
thought that perhaps I needed to have an odd-numbered quorum so I attempted
to follow the corosync-qdevice instructions here:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-quorumdev-haar

Ubuntu's init.d scripts don't work right out of the box, but I was able to
fix that.  corosync-qdevice starts but immediately terminates with an
error, so I don't see the qdevice.

$ sudo pcs property

Cluster Properties:
  cluster-infrastructure: corosync
  cluster-name: hanfsweb
  dc-version: 1.1.18-2b07d5c5a9
  have-watchdog: true
  no-quorum-policy: stop
  stonith-enabled: true
  stonith-timeout: 120s
  stonith-watchdog-timeout: 10



$ sudo pcs quorum status

Quorum information
--
Date: Fri Aug  9 11:34:55 2019
Quorum provider:  corosync_votequorum
Nodes:2
Node ID:  1
Ring ID:  1/464
Quorate:  Yes
Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  2
Quorum:   2 Activity blocked
Flags:WaitForAll

Membership information
--
 Nodeid  VotesQdevice Name
  1  1 NR hanfsweb2.holycross.edu (local)
  2  1 NR hanfsweb4.holycross.edu





'corosync-qdevice' does not generate *ANY* debug output:


qdevice follows the corosync logging configuration, so for console 
output just set


logging {
to_stderr: no
}

into corosync.conf and restart corosync. Also it looks like the message was 
sent to syslog, so more information should be wherever Ubuntu stores the 
logs (/var/log/messages, or maybe journalctl may help).
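
For example (the syslog identifier is taken from the strace output quoted
further down; Ubuntu typically writes syslog to /var/log/syslog rather than
/var/log/messages):

  journalctl -t corosync-qdevice
  # or
  grep corosync-qdevice /var/log/syslog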


Honza



$ sudo corosync-qdevice -f -d


  But it is trying to use IPC and send messages:

$ sudo strace corosync-qdevice -f -d 2>&1 | tail -15

openat(AT_FDCWD, "/dev/shm/qb-votequorum-event-12248-24916-30-header",
O_RDWR) = 9
ftruncate(9, 8248)  = 0
mmap(NULL, 8248, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7fbf6df67000
openat(AT_FDCWD, "/dev/shm/qb-votequorum-event-12248-24916-30-data",
O_RDWR) = 10
ftruncate(10, 1052672)  = 0
getpid()= 24916
sendto(11, "<30>Aug  9 11:44:56 corosync-qde"..., 102, MSG_NOSIGNAL, NULL,
0) = 102
mmap(NULL, 2105344, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7fbf6a4c7000
mmap(0x7fbf6a4c7000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED,
10, 0) = 0x7fbf6a4c7000
mmap(0x7fbf6a5c8000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED,
10, 0) = 0x7fbf6a5c8000
close(10)   = 0
close(9)= 0
sendto(8, "\20", 1, MSG_NOSIGNAL, NULL, 0) = 1
exit_group(1)   = ?
+++ exited with 1 +++



I can't tell the version of corosync-qdevice that Ubuntu 18.04 has, but my
Corosync is 2.4.3.

Thanks,

Rick



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/