Re: [ClusterLabs] question about re-adding a node

2019-04-18 Thread Brian Reichert
On Thu, Apr 18, 2019 at 02:57:58PM -0400, Brian Reichert wrote:
> I'm exploring some simple cluster management, and was exploring the
> workflow of removing, then re-adding a node to a cluster.
> I thought I understood the steps, but things are not working out
> for me as I hoped.

*sigh* I just found this out in another forum.

Instead of this:

  crm_node --force -R

Do this:

  pcs cluster node remove

That allows the re-add to work.
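
For anyone scripting this, the remove-then-re-add sequence can be sketched as below. This is a minimal sketch, not a tested procedure: the hostname node2.example.com, the function name readd_node, and the DRY_RUN switch are all hypothetical, and the pcs subcommands are simply the ones quoted in this thread.

```shell
# Sketch only: remove a node with pcs (rather than crm_node -R), then re-add it.
# "node2.example.com" is a hypothetical hostname; substitute your own.
# With DRY_RUN set, the commands are printed instead of executed.
readd_node() {
  node=$1
  run=${DRY_RUN:+echo}
  # 'pcs cluster node remove' also cleans up the removed node's own
  # cluster configuration, which is what lets the later add succeed.
  $run pcs cluster node remove "$node" &&
  $run pcs cluster node add "$node" --start
}

DRY_RUN=1 readd_node node2.example.com
```

Run it with DRY_RUN unset only on a cluster node where pcs is installed and authenticated.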

Sorry about the noise...

Brian Reichert  
BSD admin/developer at large
Manage your subscription:

ClusterLabs home:

[ClusterLabs] question about re-adding a node

2019-04-18 Thread Brian Reichert
I'm exploring some simple cluster management, and was exploring the
workflow of removing, then re-adding a node to a cluster.

I thought I understood the steps, but things are not working out
for me as I hoped.

Hopefully, someone here can provide some insight into which steps I
may have skipped...

First, add a node:

  [root@node1 ~]# pcs cluster node add --start
  Disabling SBD service...
  sbd disabled
  Sending remote node configuration files to ''
  successful distribution of the file 'pacemaker_remote authkey'
  Corosync updated
  Corosync updated
  Setting up corosync...
  Succeeded
  Starting Cluster (corosync)...
  Starting Cluster (pacemaker)...
  Synchronizing pcsd certificates on nodes
  Success
  Restarting pcsd on the nodes in order to reload the certificates...
  Success

Remove that same node, seemingly without error:

  [root@node1 ~]# pcs --force cluster stop
  Stopping Cluster (pacemaker)...
  Stopping Cluster (corosync)...
  [root@node1 ~]# crm_node --force -R
  [root@node1 ~]# echo $?

On a lark, try to re-add it:

  [root@node1 ~]# pcs cluster node add --start
  Error: Unable to add '' to cluster: node is already in a cluster

Not what I was expecting; what should I be doing differently?

Brian Reichert  
BSD admin/developer at large

Re: [ClusterLabs] hacluster password stored in the clear

2019-03-28 Thread Brian Reichert
On Wed, Mar 27, 2019 at 09:15:21PM +0100, Jan Pokorný wrote:
> If you intend to stick with pcs for managerial tasks around the
> cluster, handling the corosync messaging in isolation wouldn't
> win you anything.  It works in the opposite direction, once you
> have undertaken the authentication step for pcs peers network,
> it can then be leveraged as a springboard also to get your initial
> cluster (comprising also configuring the corosync layer for you,
> authkey generation and distribution being part of it) set up.

Thanks for the feedback, and confirming what I've found.

> -- 
> Hope this helps,
> Jan (Poki)


Brian Reichert  
BSD admin/developer at large

[ClusterLabs] hacluster password stored in the clear

2019-03-27 Thread Brian Reichert
I'm putting together some tooling to automate provisioning a cluster.

All examples I've seen so far call out using 'pcs auth' as a precursor
step, but to use this programmatically, I need to pass around the
password in the clear.

As I already have the means of using SSH identities to move data
across the hosts in question, is there perhaps some means of
pre-populating /etc/corosync/authkey, and copying that around?

I'm looking at the manpage for corosync-keygen(8), but it's not
clear to me that I'm on the right path here...

Brian Reichert  
BSD admin/developer at large

[ClusterLabs] Why do clusters have a name?

2019-03-26 Thread Brian Reichert
This will sound like a dumb question:

The manpage for pcs(8) implies that to set up a cluster, one needs
to provide a name.

Why do clusters have names?

Is there a use case wherein there would be multiple clusters visible
in an administrative UI, such that they'd need to be differentiated?

Brian Reichert  
BSD admin/developer at large

Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Brian Reichert
On Fri, Mar 22, 2019 at 11:07:55AM +0100, Jan Pokorný wrote:
> On 21/03/19 12:21 -0400, Brian Reichert wrote:
> > I've followed several tutorials about setting up a simple three-node
> > cluster, with no resources (yet), under CentOS 7.
> > 
> > I've discovered the cluster won't restart upon rebooting a node.
> > 
> > The other two nodes, however, do claim the cluster is up, as shown
> > with 'pcs status cluster'.
> Please excuse the lack of understanding perhaps owing to the Friday
> mental power phenomena, but:
> 1. what do you mean with "cluster restart"?
>    local instance of cluster services being started anew once
>    the node at hand finishes booting?

I mean that when I reboot node1, node1 reports the cluster is up,
via 'pcs cluster status'.

> 2. why shall a single malfunctioned node (out of three) irrefutably
>    result in dismounting of otherwise healthy cluster?
>    (if that's indeed what you presume)

I don't presume that.

> -- 
> Jan (Poki)

Brian Reichert  
BSD admin/developer at large

Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Brian Reichert
On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
> >- If I manually set 'totem.token' to a higher value, am I responsible
> >   for tracking the number of nodes in the cluster, to keep in
> >   alignment with what Red Hat's page says?
> Nope. I've tried to explain what is really happening in the manpage 
> corosync.conf(5). totem.token and totem.token_coefficient are used in 
> the following formula:

I do see this under token_coefficient, thanks.

> Corosync used runtime.config.token.

Cool; thanks.  Bumping up totem.token to 2000 got me over this hump.
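
As an illustration of why no manual per-node tracking is needed: as I read corosync.conf(5), when the nodelist has at least three nodes the effective token timeout is token + (number_of_nodes - 2) * token_coefficient. The sketch below just does that arithmetic; the function name effective_token is made up for this example, and the defaults in the comments (token 1000 ms, token_coefficient 650 ms) are my understanding of the manpage, so check them against your installed version.

```shell
# Sketch: compute the effective totem token timeout in milliseconds.
# Arguments: totem.token, totem.token_coefficient, number of nodes.
# Assumed manpage defaults: token=1000, token_coefficient=650.
effective_token() {
  token=$1
  coefficient=$2
  nodes=$3
  if [ "$nodes" -ge 3 ]; then
    # The coefficient only applies once the nodelist has 3 or more nodes.
    echo $(( token + (nodes - 2) * coefficient ))
  else
    echo "$token"
  fi
}

# Three-node cluster with totem.token raised to 2000:
effective_token 2000 650 3   # prints 2650
```

On a live node the value corosync is actually using should be visible via corosync-cmapctl (look for the runtime.config keys); I have not included that call here since it needs a running cluster.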

> >- Under these conditions, when corosync exits, why does it do so
> >   with a zero status? It seems to me that if it exited at all,
> That's a good question. How reproducible is the issue? Corosync 
> shouldn't "exit" with zero status.

If I leave totem.token set to default, 100% in my case.

I stand corrected; yesterday, it was 100%.  Today, I cannot reproduce
this at all, even after reverting to the defaults.

Here is a snippet of output from yesterday's experiments; this is
based on a typescript capture file, so I apologize for the ANSI
screen codes.
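
For what it's worth, a typescript capture can be cleaned up before pasting. The following is a rough sketch, assuming the file contains the raw escape bytes (not the literal text "ESC" as rendered in this message); strip_ansi is a made-up helper name, and it only handles the color/cursor (CSI) sequences and the terminal-title (OSC) sequences seen in this kind of output.

```shell
# Sketch: strip common ANSI escape sequences from text on stdin.
# Handles CSI sequences (ESC [ ... letter) and BEL-terminated OSC
# sequences (ESC ] ... BEL), e.g. the xterm title set by the prompt.
strip_ansi() {
  esc=$(printf '\033')
  bel=$(printf '\007')
  sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
      -e "s/${esc}][^${bel}]*${bel}//g"
}

printf '\033[1;32mactive (running)\033[0m\n' | strip_ansi
```

Usage would be along the lines of `strip_ansi < typescript`, where typescript is the file written by script(1).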

- by default, systemd doesn't report full log lines.

- by default, CentOS's config of systemd doesn't persist journaled
  logs, so I can't directly review yesterday's efforts.

- and, it looks like I misinterpreted the 'exited' message; corosync
  was enabled and running, but the 'Process' line doesn't report
  on the 'corosync' process, but some systemd utility.

(Let me count the ways I'm coming to dislike systemd...)

I was able to recover logs from /var/log/messages, but other than
the 'Consider token timeout increase' message, it looks hunky-dory.

With what I've since learned:

- I cannot explain why I can't reproduce the symptoms, even after
  reverting to the defaults.

- And without being able to reproduce, I can't pursue why 'pcs
  status cluster' was actually failing for me. :/

So, I appreciate your attention to this message, and I guess I'm
off to further explore all of this.

  ESC]0;root@node1:~^G[root@node1 ~]# systemctl status corosync.service
  ESC[1;32m●ESC[0m corosync.service - Corosync Cluster Engine
     Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
     Active: ESC[1;32mactive (running)ESC[0m since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago
       Docs: man:corosync
    Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited,
   Main PID: 5490 (corosync)
     CGroup: /system.slice/corosync.service
             └─5490 corosync

>   Honza

Brian Reichert  
BSD admin/developer at large