Re: [ClusterLabs] question about re-adding a node

2019-04-18 Thread Brian Reichert
On Thu, Apr 18, 2019 at 02:57:58PM -0400, Brian Reichert wrote:
> I'm exploring some simple cluster management, and was exploring the
> workflow of removing, then re-adding a node to a cluster.
> 
> I thought I understood the steps, but things are not working out
> for me as I hoped.

*sigh* I just found this out in another forum.

Instead of this:

  crm_node --force -R node3.example.com

Do this:

  pcs cluster node remove node3.example.com

That allows the re-add to work.
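For anyone hitting the same thing, the full cycle is roughly this (a
sketch pieced together from the commands in this thread, not a
definitive recipe):

  # remove the node via pcs, so corosync.conf gets updated on every member
  pcs cluster node remove node3.example.com
  # after which the re-add goes through
  pcs cluster node add --start node3.example.com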

Sorry about the noise...

-- 
Brian Reichert  
BSD admin/developer at large


[ClusterLabs] question about re-adding a node

2019-04-18 Thread Brian Reichert
I'm exploring some simple cluster management, and was exploring the
workflow of removing, then re-adding a node to a cluster.

I thought I understood the steps, but things are not working out
for me as I hoped.

Hopefully, someone here can provide some insight into which steps I
might have skipped...

First, add a node:

  [root@node1 ~]# pcs cluster node add --start node3.example.com
  Disabling SBD service...
  node3.example.com: sbd disabled
  Sending remote node configuration files to 'node3.example.com'
  node3.example.com: successful distribution of the file 'pacemaker_remote authkey'
  node1.example.com: Corosync updated
  node2.example.com: Corosync updated
  Setting up corosync...
  node3.example.com: Succeeded
  node3.example.com: Starting Cluster (corosync)...
  Starting Cluster (pacemaker)...
  Synchronizing pcsd certificates on nodes node3.example.com...
  node3.example.com: Success
  Restarting pcsd on the nodes in order to reload the certificates...
  node3.example.com: Success

Remove that same node, seemingly without error:

  [root@node1 ~]# pcs --force cluster stop node3.example.com
  node3.example.com: Stopping Cluster (pacemaker)...
  node3.example.com: Stopping Cluster (corosync)...
  [root@node1 ~]# crm_node --force -R node3.example.com
  [root@node1 ~]# echo $?
  0

On a lark, try to re-add it:

  [root@node1 ~]# pcs cluster node add --start node3.example.com
  Error: Unable to add 'node3.example.com' to cluster: node is already in a cluster

Not what I was expecting; what should I be doing differently?
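
In case it helps with diagnosis, some checks I can think of (this is
my guess about where pcs finds the "already in a cluster" state; not
verified):

  # does the removed node still carry cluster configuration?
  ssh node3.example.com 'ls -l /etc/corosync/corosync.conf'
  # do the remaining nodes still list it?
  grep -n node3 /etc/corosync/corosync.conf
  pcs status nodes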

-- 
Brian Reichert  
BSD admin/developer at large


Re: [ClusterLabs] hacluster password stored in the clear

2019-03-28 Thread Brian Reichert
On Wed, Mar 27, 2019 at 09:15:21PM +0100, Jan Pokorný wrote:
> If you intend to stick with pcs for managerial tasks around the
> cluster, handling the corosync messaging in isolation wouldn't
> win you anything.  It works in the opposite direction: once you
> have undertaken the authentication step for the pcs peer network,
> it can then be leveraged as a springboard also to get your initial
> cluster (comprising also configuring the corosync layer for you,
> authkey generation and distribution being part of it) set up.

Thanks for the feedback, and confirming what I've found.
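
For the archive, my rough understanding of the pcs-centric flow being
described (pcs 0.9 syntax on CentOS 7; the cluster name and hostnames
are placeholders):

  # authenticate pcsd on all peers once (this is the step that wants the
  # hacluster password), then let pcs generate and distribute the corosync
  # config and authkey as part of setup
  pcs cluster auth node1.example.com node2.example.com node3.example.com -u hacluster
  pcs cluster setup --name mycluster node1.example.com node2.example.com node3.example.com
  pcs cluster start --all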

> -- 
> Hope this helps,
> Jan (Poki)





-- 
Brian Reichert  
BSD admin/developer at large


[ClusterLabs] hacluster password stored in the clear

2019-03-27 Thread Brian Reichert
I'm putting together some tooling to automate provisioning a cluster.

All examples I've seen so far call out using 'pcs auth' as a precursor
step, but to use this pragmatically, I need to pass around the
password in the clear.

As I already have the means, via SSH identities, of moving data
across the hosts in question, is there perhaps some way of
pre-populating /etc/corosync/authkey and copying that around?

I'm looking at the manpage for corosync-keygen(8), but it's not
clear to me that I'm on the right path here...
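
Something along these lines is what I had in mind (untested; the
hostnames are placeholders, and the ownership/permissions are my
guess at what corosync expects):

  # generate /etc/corosync/authkey on one node (-l reads /dev/urandom)
  corosync-keygen -l
  # push it over the SSH trust I already have
  for h in node2.example.com node3.example.com; do
    scp -p /etc/corosync/authkey "$h":/etc/corosync/authkey
    ssh "$h" 'chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey'
  done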

-- 
Brian Reichert  
BSD admin/developer at large


[ClusterLabs] Why do clusters have a name?

2019-03-26 Thread Brian Reichert
This will sound like a dumb question:

The manpage for pcs(8) implies that to set up a cluster, one needs
to provide a name.
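
(For reference, the invocation I'm looking at is something like the
following, in pcs 0.9 syntax; the name 'mycluster' is arbitrary:)

  pcs cluster setup --name mycluster node1.example.com node2.example.com node3.example.com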

Why do clusters have names?

Is there a use case wherein there would be multiple clusters visible
in an administrative UI, such that they'd need to be differentiated?

-- 
Brian Reichert  
BSD admin/developer at large


Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Brian Reichert
On Fri, Mar 22, 2019 at 11:07:55AM +0100, Jan Pokorný wrote:
> On 21/03/19 12:21 -0400, Brian Reichert wrote:
> > I've followed several tutorials about setting up a simple three-node
> > cluster, with no resources (yet), under CentOS 7.
> > 
> > I've discovered the cluster won't restart upon rebooting a node.
> > 
> > The other two nodes, however, do claim the cluster is up, as shown
> > with 'pcs status cluster'.
> 
> Please excuse any lack of understanding, perhaps owing to the Friday
> mental-power phenomenon, but:
> 
> 1. what do you mean by "cluster restart"?
>    local instance of cluster services being started anew once
>    the node at hand finishes booting?

I mean that when I reboot node1, node1 reports the cluster is up,
via 'pcs cluster status'.

> 2. why should a single malfunctioning node (out of three) irrefutably
>    result in the dismantling of an otherwise healthy cluster?
>    (if that's indeed what you presume)

I don't presume that.

> -- 
> Jan (Poki)

-- 
Brian Reichert  
BSD admin/developer at large


Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Brian Reichert
On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
> >- If I manually set 'totem.token' to a higher value, am I responsible
> >   for tracking the number of nodes in the cluster, to keep in
> >   alignment with what Red Hat's page says?
> 
> Nope. I've tried to explain what is really happening in the manpage 
> corosync.conf(5). totem.token and totem.token_coefficient are used in 
> the following formula:
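
(Filling that in for the archive, as I read corosync.conf(5) -- so
treat this as my paraphrase rather than a quote:

  real token timeout = token + (number_of_nodes - 2) * token_coefficient

where token_coefficient defaults to 650 ms and only takes effect when
the nodelist contains at least three nodes.)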

I do see this under token_coefficient, thanks.

> Corosync used runtime.config.token.

Cool; thanks.  Bumping up totem.token to 2000 got me over this hump.
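
For anyone following along, a sketch of one way to apply that (I'm
not claiming this is verbatim what I ran):

  # add "token: 2000" inside the totem { } section of /etc/corosync/corosync.conf,
  # then push the file to the other nodes and restart the stack
  pcs cluster sync
  pcs cluster stop --all && pcs cluster start --all
  # confirm the value corosync actually uses at runtime
  corosync-cmapctl | grep -i token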

> >- Under these conditions, when corosync exits, why does it do so
> >   with a zero status? It seems to me that if it exited at all,
> 
> That's a good question. How reproducible is the issue? Corosync 
> shouldn't "exit" with zero status.

If I leave totem.token set to the default, it's 100% in my case.

I stand corrected; yesterday, it was 100%.  Today, I cannot reproduce
this at all, even after reverting to the defaults.

Here is a snippet of output from yesterday's experiments; it's based
on a typescript capture file, so I've stripped the ANSI screen codes
it picked up.

- by default, systemd doesn't report full log lines.

- by default, CentOS's config of systemd doesn't persist journaled
  logs, so I can't directly review yesterday's efforts (possible
  workarounds for these first two points are sketched after this list).

- and it looks like I misinterpreted the 'exited' message; corosync
  was enabled and running, and the 'Process' line doesn't report on
  the 'corosync' daemon itself, but on the ExecStart wrapper that
  systemd invoked.

(Let me count the ways I'm coming to dislike systemd...)
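
For completeness, workarounds that appear to address the first two
points (untested assumptions on my part; CentOS 7 defaults assumed):

  # show full, unwrapped log lines for the corosync unit
  journalctl -u corosync -l --no-pager
  # make the journal persistent across reboots, so yesterday's logs would survive
  mkdir -p /var/log/journal
  systemctl restart systemd-journald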

I was able to recover logs from /var/log/messages, but other than
the 'Consider token timeout increase' message, it looks hunky-dory.

With what I've since learned:

- I cannot explain why I can't reproduce the symptoms, even after
  reverting to the defaults.

- And without being able to reproduce it, I can't pursue why 'pcs
  status cluster' was actually failing for me. :/

So, I appreciate your attention to this message, and I guess I'm
off to further explore all of this.

  [root@node1 ~]# systemctl status corosync.service
  ● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
   Main PID: 5490 (corosync)
     CGroup: /system.slice/corosync.service
             └─5490 corosync


>   Honza

-- 
Brian Reichert  
BSD admin/developer at large