Re: [ClusterLabs] question about re-adding a node
On Thu, Apr 18, 2019 at 02:57:58PM -0400, Brian Reichert wrote:
> I'm exploring some simple cluster management, and was exploring the
> workflow of removing, then re-adding a node to a cluster.
>
> I thought I understood the steps, but things are not working out
> for me as I hoped.

*sigh* I just found this out in another forum. Instead of this:

  crm_node --force -R node3.example.com

Do this:

  pcs cluster node remove node3.example.com

That allows the re-add to work. Sorry about the noise...

--
Brian Reichert
BSD admin/developer at large
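(For the archives, here is a minimal sketch of the cycle that should work, reusing the node names from the original post; unlike 'crm_node -R', 'pcs cluster node remove' also stops the node and updates corosync.conf on the remaining nodes:)

  # Remove the node; pcs stops its services and cleans up the
  # corosync/pacemaker membership on the surviving nodes:
  [root@node1 ~]# pcs cluster node remove node3.example.com

  # Re-adding the node now succeeds:
  [root@node1 ~]# pcs cluster node add --start node3.example.com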
[ClusterLabs] question about re-adding a node
I'm exploring some simple cluster management, and was exploring the
workflow of removing, then re-adding a node to a cluster.

I thought I understood the steps, but things are not working out for
me as I hoped. Hopefully, someone here can provide some insight into
what steps I might have skipped...

First, add a node:

  [root@node1 ~]# pcs cluster node add --start node3.example.com
  Disabling SBD service...
  node3.example.com: sbd disabled
  Sending remote node configuration files to 'node3.example.com'
  node3.example.com: successful distribution of the file 'pacemaker_remote authkey'
  node1.example.com: Corosync updated
  node2.example.com: Corosync updated
  Setting up corosync...
  node3.example.com: Succeeded
  node3.example.com: Starting Cluster (corosync)...
  node3.example.com: Starting Cluster (pacemaker)...
  Synchronizing pcsd certificates on nodes node3.example.com...
  node3.example.com: Success
  Restarting pcsd on the nodes in order to reload the certificates...
  node3.example.com: Success

Remove that same node, seemingly without error:

  [root@node1 ~]# pcs --force cluster stop node3.example.com
  node3.example.com: Stopping Cluster (pacemaker)...
  node3.example.com: Stopping Cluster (corosync)...
  [root@node1 ~]# crm_node --force -R node3.example.com
  [root@node1 ~]# echo $?
  0

On a lark, try to re-add it:

  [root@node1 ~]# pcs cluster node add --start node3.example.com
  Error: Unable to add 'node3.example.com' to cluster: node is already in a cluster

Not what I was expecting; what should I be doing differently?

--
Brian Reichert
BSD admin/developer at large
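(A hedged note for anyone finding this in the archives: as far as I can tell, the "node is already in a cluster" error comes from pcsd on the target node seeing leftover cluster configuration, such as /etc/corosync/corosync.conf. One way to clear that stale state is to wipe the configuration on the removed node itself -- destructive on that node, so use with care:)

  # On the node that was removed (node3), discard all local cluster
  # configuration so pcsd no longer treats it as a cluster member:
  [root@node3 ~]# pcs cluster destroy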
Re: [ClusterLabs] hacluster password stored in the clear
On Wed, Mar 27, 2019 at 09:15:21PM +0100, Jan Pokorný wrote:
> If you intend to stick with pcs for managerial tasks around the
> cluster, handling the corosync messaging in isolation wouldn't
> win you anything. It works in the opposite direction, once you
> have undertaken the authentication step for pcs peers network,
> it can then be leveraged as a springboard also to get your initial
> cluster (comprising also configuring the corosync layer for you,
> authkey generation and distribution being part of it) set up.

Thanks for the feedback, and confirming what I've found.

> --
> Hope this helps,
> Jan (Poki)

--
Brian Reichert
BSD admin/developer at large
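(To make the "springboard" concrete, the flow Jan describes looks roughly like this on CentOS 7 with pcs 0.9; the host and cluster names below are placeholders:)

  # One-time authentication of the pcsd peers (prompts for the
  # hacluster password unless -p is also given):
  [root@node1 ~]# pcs cluster auth node1.example.com node2.example.com \
        node3.example.com -u hacluster

  # pcs then writes corosync.conf for you and starts the stack:
  [root@node1 ~]# pcs cluster setup --name mycluster \
        node1.example.com node2.example.com node3.example.com --start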
[ClusterLabs] hacluster password stored in the clear
I'm putting together some tooling to automate provisioning a cluster.

All examples I've seen so far call out using 'pcs auth' as a precursor
step, but to use this programmatically, I need to pass around the
password in the clear.

As I already have the means of using SSH identities for moving data
across the hosts in question, is there perhaps some means of
pre-populating /etc/corosync/authkey, and copying that around?

I'm looking at the manpage for corosync-keygen(8), but it's not clear
to me that I'm on the right path here...

--
Brian Reichert
BSD admin/developer at large
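(For the record, the manual approach I had in mind would look something like the untested sketch below; corosync-keygen writes /etc/corosync/authkey by default, and the nodes just need identical, root-only copies of it. Per Jan's reply elsewhere in the thread, this covers only the corosync layer, not pcsd's own authentication.)

  # On one node, generate the key (-l pulls from /dev/urandom on older
  # corosync-keygen versions, avoiding a block on /dev/random):
  [root@node1 ~]# corosync-keygen -l

  # Push it to the other nodes over the existing SSH trust:
  [root@node1 ~]# for h in node2.example.com node3.example.com; do
  >   scp -p /etc/corosync/authkey root@$h:/etc/corosync/authkey
  >   ssh root@$h 'chown root:root /etc/corosync/authkey && chmod 0400 /etc/corosync/authkey'
  > done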
[ClusterLabs] Why do clusters have a name?
This will sound like a dumb question:

The manpage for pcs(8) implies that to set up a cluster, one needs to
provide a name.

Why do clusters have names? Is there a use case wherein there would be
multiple clusters visible in an administrative UI, such that they'd
need to be differentiated?

--
Brian Reichert
BSD admin/developer at large
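(Partially answering my own question: as far as I can tell, the name given to 'pcs cluster setup --name' simply lands in the totem stanza of /etc/corosync/corosync.conf, so at a minimum it distinguishes clusters whose nodes can reach each other on the same network:)

  # /etc/corosync/corosync.conf (excerpt; "mycluster" is whatever name
  # was passed to 'pcs cluster setup --name'):
  totem {
      version: 2
      cluster_name: mycluster
  }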
Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?
On Fri, Mar 22, 2019 at 11:07:55AM +0100, Jan Pokorný wrote:
> On 21/03/19 12:21 -0400, Brian Reichert wrote:
> > I've followed several tutorials about setting up a simple three-node
> > cluster, with no resources (yet), under CentOS 7.
> >
> > I've discovered the cluster won't restart upon rebooting a node.
> >
> > The other two nodes, however, do claim the cluster is up, as shown
> > with 'pcs status cluster'.
>
> Please excuse the lack of understanding perhaps owing to the Friday
> mental power phenomena, but:
>
> 1. what do you mean with "cluster restart"?
>    local instance of cluster services being started anew once
>    the node at hand finishes booting?

I mean that when I reboot node1, node1 reports the cluster is up, via
'pcs cluster status'.

> 2. why shall a single malfunctioned node (out of three) irrefutably
>    result in dismounting of otherwise healthy cluster?
>    (if that's indeed what you presume)

I don't presume that.

> --
> Jan (Poki)

--
Brian Reichert
BSD admin/developer at large
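(A possibly relevant detail for anyone with the same symptom: if I understand the defaults correctly, pcs on CentOS 7 does not enable corosync/pacemaker at boot unless asked to, so a rebooted node stays out of the cluster until started by hand. Enabling them looks like:)

  # Enable corosync and pacemaker at boot on every node:
  [root@node1 ~]# pcs cluster enable --all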
Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?
On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
> > - If I manually set 'totem.token' to a higher value, am I responsible
> >   for tracking the number of nodes in the cluster, to keep in
> >   alignment with what Red Hat's page says?
>
> Nope. I've tried to explain what is really happening in the manpage
> corosync.conf(5). totem.token and totem.token_coefficient are used in
> the following formula:
>
>   token + (number_of_nodes - 2) * token_coefficient

I do see this under token_coefficient, thanks.

> Corosync used runtime.config.token.

Cool; thanks. Bumping up totem.token to 2000 got me over this hump.

> > - Under these conditions, when corosync exits, why does it do so
> >   with a zero status? It seems to me that if it exited at all,
>
> That's a good question. How reproducible is the issue? Corosync
> shouldn't "exit" with zero status.

If I leave totem.token set to the default, 100% in my case.

I stand corrected; yesterday, it was 100%. Today, I cannot reproduce
this at all, even with reverting to the defaults.

Here is a snippet of output from yesterday's experiments; this is
based on a typescript capture file, with the ANSI screen codes
stripped. Some things I've learned along the way:

- By default, systemd doesn't report full log lines.

- By default, CentOS's config of systemd doesn't persist journaled
  logs, so I can't directly review yesterday's efforts.

- And, it looks like I misinterpreted the 'exited' message; corosync
  was enabled and running, but the 'Process' line doesn't report on
  the 'corosync' process, but on a systemd helper. (Let me count the
  ways I'm coming to dislike systemd...)

I was able to recover logs from /var/log/messages, but other than the
'Consider token timeout increase' message, it looks hunky-dory.

With what I've since learned:

- I cannot explain why I can't reproduce the symptoms, even with
  reverting to the defaults.

- And without being able to reproduce, I can't pursue why 'pcs status
  cluster' was actually failing for me. :/

So, I appreciate your attention to this message, and I guess I'm off
to further explore all of this.

  [root@node1 ~]# systemctl status corosync.service
  * corosync.service - Corosync Cluster Engine
     Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
   Main PID: 5490 (corosync)
     CGroup: /system.slice/corosync.service
             `-5490 corosync

> Honza

--
Brian Reichert
BSD admin/developer at large
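(For the archives, here is what the fix looked like on my end, as a sketch; the token line goes inside the existing totem stanza of /etc/corosync/corosync.conf, and on my corosync 2.x build the runtime value shows up under the key runtime.config.totem.token:)

  # /etc/corosync/corosync.conf -- inside the existing totem {} stanza:
  token: 2000

  # Propagate the config, restart the stack, and confirm the value
  # corosync actually computed at runtime:
  [root@node1 ~]# pcs cluster sync
  [root@node1 ~]# pcs cluster stop --all && pcs cluster start --all
  [root@node1 ~]# corosync-cmapctl | grep runtime.config.totem.token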