Re: [ClusterLabs] Antw: [EXT] Cannot add a node with pcs

2022-07-13 Thread Piotr Szafarczyk

Hi Ulrich,

Thank you. I am perfectly aware that operating without stonith is not a 
good idea :). I am sure I will add it. But first I need to understand 
the current state. I am afraid of introducing something new before I fix 
the current problem.


Best regards,
Piotr

On 13.07.2022 08:00, Ulrich Windl wrote:

Piotr Szafarczyk wrote on 12.07.2022 at 12:34 in
message <38ccc24a-7b01-561c-20f8-ec2273a18...@netexpert.pl>:

Hi,

I used to have a working cluster with 3 nodes (and stonith disabled).

The SLES guide says:
Important: No Support Without STONITH
You must have a node fencing mechanism for your cluster.
The global cluster options stonith-enabled and startup-fencing must be
set to true. When you change them, you lose support.

Maybe that helps.


After an unexpected restart of one node, the cluster split. Node #2
started to see the others as unclean. Nodes 1 and 3 were cooperating
with each other, showing #2 as offline. There were no network connection
problems.

I removed #2 (operating from #1) with
pcs cluster node remove n2

I verified that it had removed all configuration from #2, both for
corosync and for pacemaker. The cluster appears to be working correctly
with two nodes (and no traces of #2).

Now I am trying to add the third node back.
pcs cluster node add n2
Disabling SBD service...
n2: sbd disabled
Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
n2: successful distribution of the file 'corosync authkey'
n2: successful distribution of the file 'pacemaker authkey'
Sending updated corosync.conf to nodes...
n3: Succeeded
n2: Succeeded
n1: Succeeded
n3: Corosync configuration reloaded

I am able to start #2, operating from #1:

pcs cluster pcsd-status
n2: Online
n3: Online
n1: Online

pcs cluster enable n2
pcs cluster start n2

I can see that corosync's configuration has been updated, but
pacemaker's has not.

_Checking from #1:_

pcs config
Cluster Name: n
Corosync Nodes:
   n1 n3 n2
Pacemaker Nodes:
   n1 n3
[...]
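
The mismatch between the two node lists above can be checked mechanically. The following is only a minimal sketch, assuming that `pcs config` prints the node names on indented lines directly below the "Corosync Nodes:" and "Pacemaker Nodes:" headings, as in the output quoted here. The function name missing_pacemaker_nodes is hypothetical and is not part of pcs.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): reads `pcs config` text on stdin and
# prints every node listed under "Corosync Nodes:" that does not appear under
# "Pacemaker Nodes:". Assumes node names are on indented lines below each heading.
missing_pacemaker_nodes() {
    awk '
        /^Corosync Nodes:/  { section = "corosync";  next }
        /^Pacemaker Nodes:/ { section = "pacemaker"; next }
        # Any other non-empty, unindented line ends the current node list.
        NF > 0 && substr($0, 1, 1) != " " && substr($0, 1, 1) != "\t" {
            section = ""; next
        }
        section == "corosync"  { for (i = 1; i <= NF; i++) coro[$i] = 1 }
        section == "pacemaker" { for (i = 1; i <= NF; i++) pace[$i] = 1 }
        END {
            for (n in coro) if (!(n in pace)) print n
        }
    ' | sort
}
```

It could be run on each node as `pcs config | missing_pacemaker_nodes`. An empty result would mean that every corosync node is also known to pacemaker on that node.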

pcs status
* 2 nodes configured
Node List:
* Online: [ n1 n3 ]
[...]

pcs cluster cib scope=nodes





_#2 is seeing the state differently:_

pcs config
Cluster Name: n
Corosync Nodes:
   n1 n3 n2
Pacemaker Nodes:
   n1 n2 n3

pcs status
* 3 nodes configured
Node List:
* Online: [ n2 ]
* OFFLINE: [ n1 n3 ]
Full List of Resources:
* No resources
[...]
(there are resources configured on #1 and #3)

pcs cluster cib scope=nodes






Please help me diagnose this. Where should I look for the problem? (I have
already tried a few more things: I see nothing helpful in the log files,
pcs --debug shows nothing suspicious, and I even tried editing the CIB manually.)

Best regards,

Piotr Szafarczyk




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



[ClusterLabs] Antw: [EXT] Cannot add a node with pcs

2022-07-12 Thread Ulrich Windl
>>> Piotr Szafarczyk wrote on 12.07.2022 at 12:34 in
message <38ccc24a-7b01-561c-20f8-ec2273a18...@netexpert.pl>:
> Hi,
> 
> I used to have a working cluster with 3 nodes (and stonith disabled). 

The SLES guide says:
Important: No Support Without STONITH
You must have a node fencing mechanism for your cluster.
The global cluster options stonith-enabled and startup-fencing must be
set to true. When you change them, you lose support.

Maybe that helps.

> After an unexpected restart of one node, the cluster split. Node #2
> started to see the others as unclean. Nodes 1 and 3 were cooperating
> with each other, showing #2 as offline. There were no network connection
> problems.
> 
> I removed #2 (operating from #1) with
> pcs cluster node remove n2
> 
> I verified that it had removed all configuration from #2, both for
> corosync and for pacemaker. The cluster appears to be working correctly
> with two nodes (and no traces of #2).
> 
> Now I am trying to add the third node back.
> pcs cluster node add n2
> Disabling SBD service...
> n2: sbd disabled
> Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
> n2: successful distribution of the file 'corosync authkey'
> n2: successful distribution of the file 'pacemaker authkey'
> Sending updated corosync.conf to nodes...
> n3: Succeeded
> n2: Succeeded
> n1: Succeeded
> n3: Corosync configuration reloaded
> 
> I am able to start #2, operating from #1:
> 
> pcs cluster pcsd-status
> n2: Online
> n3: Online
> n1: Online
> 
> pcs cluster enable n2
> pcs cluster start n2
> 
> I can see that corosync's configuration has been updated, but
> pacemaker's has not.
> 
> _Checking from #1:_
> 
> pcs config
> Cluster Name: n
> Corosync Nodes:
>   n1 n3 n2
> Pacemaker Nodes:
>   n1 n3
> [...]
> 
> pcs status
>* 2 nodes configured
> Node List:
>* Online: [ n1 n3 ]
> [...]
> 
> pcs cluster cib scope=nodes
> 
>
>
> 
> 
> _#2 is seeing the state differently:_
> 
> pcs config
> Cluster Name: n
> Corosync Nodes:
>   n1 n3 n2
> Pacemaker Nodes:
>   n1 n2 n3
> 
> pcs status
>* 3 nodes configured
> Node List:
>* Online: [ n2 ]
>* OFFLINE: [ n1 n3 ]
> Full List of Resources:
>* No resources
> [...]
> (there are resources configured on #1 and #3)
> 
> pcs cluster cib scope=nodes
> 
>
>
>
> 
> 
> Please help me diagnose this. Where should I look for the problem? (I have
> already tried a few more things: I see nothing helpful in the log files,
> pcs --debug shows nothing suspicious, and I even tried editing the CIB manually.)
> 
> Best regards,
> 
> Piotr Szafarczyk



