[ClusterLabs] PCS cluster auth fails
Hi, I am using a two-node corosync cluster (node1 and node2). When I run "pcs cluster auth" against the qdevice node (qnode) during corosync-qdevice configuration, I get the error below:

$ sudo pcs cluster auth qnode -u hacluster -p --debug
Running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth
Environment:
  GEM_HOME=/usr/lib/pcsd/vendor/bundle/ruby
  HISTSIZE=1000
  HOME=/root
  HOSTNAME=node1
  LANG=en_US.UTF-8
  LC_ALL=C
  LOGNAME=root
  LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
  MAIL=/var/spool/mail/admin
  PATH=/sbin:/bin:/usr/sbin:/usr/bin
  PCSD_DEBUG=true
  PCSD_NETWORK_TIMEOUT=60
  PS1=[\u@\h-$node_name \W]\$
  SHELL=/sbin/nologin
  SUDO_COMMAND=/sbin/pcs cluster auth qnode -u hacluster -p *** --debug
  SUDO_GID=5007
  SUDO_UID=5008
  SUDO_USER=admin
  TERM=xterm
  USER=root
  USERNAME=root
--Debug Input Start--
{"username": "hacluster", "local": false, "nodes": {"qnode": null}, "password": "***", "force": false}
--Debug Input End--
Finished running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth
Return value: 0
--Debug Stdout Start--
{
  "status": "ok",
  "data": {
    "auth_responses": {
      "qnode": {
        "status": "ok",
        "token": "66c1020d-1089-4d8b-beab-a8f76e2c8b89"
      }
    },
    "sync_successful": true,
    "sync_nodes_err": [
      "node2"
    ],
    "sync_responses": {
      "node2": {
        "status": "error"
      },
      "node1": {
        "status": "ok",
        "result": {
          "tokens": "accepted"
        }
      }
    }
  },
  "log": [
    "I, [2020-02-07T17:11:34.011890 #30323] INFO -- : PCSD Debugging enabled\n",
    "D, [2020-02-07T17:11:34.012401 #30323] DEBUG -- : Did not detect RHEL 6\n",
    "D, [2020-02-07T17:11:34.012446 #30323] DEBUG -- : Detected systemd is in use\n",
    "I, [2020-02-07T17:11:34.160540 #30323] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name\n",
    "I, [2020-02-07T17:11:34.160707 #30323] INFO -- : CIB USER: hacluster, groups: \n",
    "D, [2020-02-07T17:11:34.176123 #30323] DEBUG -- : [\"totem.cluster_name (str) = HBASE\\n\"]\n",
    "D, [2020-02-07T17:11:34.176307 #30323] DEBUG -- : []\n",
    "D, [2020-02-07T17:11:34.176357 #30323] DEBUG -- : Duration: 0.015370265s\n",
    "I, [2020-02-07T17:11:34.176443 #30323] INFO -- : Return Value: 0\n",
    "I, [2020-02-07T17:11:34.407886 #30323] INFO -- : Running: /usr/sbin/pcs status nodes corosync\n",
    "I, [2020-02-07T17:11:34.407989 #30323] INFO -- : CIB USER: hacluster, groups: \n",
    "D, [2020-02-07T17:11:34.775751 #30323] DEBUG -- : [\"Corosync Nodes:\\n\", \" Online: node2 node1\\n\", \" Offline:\\n\"]\n",
    "D, [2020-02-07T17:11:34.775909 #30323] DEBUG -- : []\n",
    "D, [2020-02-07T17:11:34.775960 #30323] DEBUG -- : Duration: 0.367740571s\n",
    "I, [2020-02-07T17:11:34.776056 #30323] INFO -- : Return Value: 0\n",
    "I, [2020-02-07T17:11:34.776517 #30323] INFO -- : Sending config 'tokens' version 5 40712570bb7e5718afce943a151b259f04e7c080 to nodes: node2, node1\n",
    "I, [2020-02-07T17:11:34.777079 #30323] INFO -- : SRWT Node: node2 Request: set_configs\n",
    "I, [2020-02-07T17:11:34.777695 #30323] INFO -- : SRWT Node: node1 Request: set_configs\n",
    "I, [2020-02-07T17:11:34.865021 #30323] INFO -- : Sending config response from node2: {\"status\"=>\"error\"}\n",
    "I, [2020-02-07T17:11:34.865141 #30323] INFO -- : Sending config response from node1: {\"status\"=>\"ok\", \"result\"=>{\"tokens\"=>\"accepted\"}}\n"
  ]
}
--Debug Stdout End--
--Debug Stderr Start--
--Debug Stderr End--
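[Editor's note] The telling part of the --debug output above is "sync_nodes_err": qnode itself authenticated fine (it returned a token), but pcsd could not push the updated tokens file to node2. A small sketch that pulls the failing nodes out of an abbreviated copy of that JSON, to show which field to look at:

```python
import json

# Abbreviated form of the pcsd-cli.rb "auth" response shown above.
debug_stdout = '''
{
  "status": "ok",
  "data": {
    "auth_responses": {
      "qnode": {"status": "ok", "token": "66c1020d-1089-4d8b-beab-a8f76e2c8b89"}
    },
    "sync_successful": true,
    "sync_nodes_err": ["node2"],
    "sync_responses": {
      "node2": {"status": "error"},
      "node1": {"status": "ok", "result": {"tokens": "accepted"}}
    }
  }
}
'''

data = json.loads(debug_stdout)["data"]

# Nodes that failed to accept the synced tokens file.
failed = data["sync_nodes_err"]
print(failed)  # -> ['node2']
```

In other words, the auth against qnode succeeded; the error is pcsd on node1 failing to distribute the new token to pcsd on node2, so that is where to look (pcsd running and reachable on node2, and its response to the set_configs request).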
Re: [ClusterLabs] Why Do Nodes Leave the Cluster?
On February 6, 2020 7:35:53 PM GMT+02:00, Eric Robinson wrote:
>Hi Nikolov --
>
>> Defaults are 1s token, 1.2s consensus which is too small.
>> In Suse, token is 10s, while consensus is 1.2 * token -> 12s.
>> With these settings, the cluster will not react for 22s.
>>
>> I think it's a good start for your cluster.
>> Don't forget to put the cluster in maintenance (pcs property set
>> maintenance-mode=true) before restarting the stack, or even better - get
>> some downtime.
>>
>> You can use the following article to run a simulation before removing the
>> maintenance:
>> https://www.suse.com/support/kb/doc/?id=7022764
>
>Thanks for the suggestions. Any thoughts on timeouts for DRBD?
>
>--Eric
>
>Disclaimer : This email and any files transmitted with it are
>confidential and intended solely for intended recipients. If you are
>not the named addressee you should not disseminate, distribute, copy or
>alter this email. Any views or opinions presented in this email are
>solely those of the author and might not represent those of Physician
>Select Management. Warning: Although Physician Select Management has
>taken reasonable precautions to ensure no viruses are present in this
>email, the company cannot accept responsibility for any loss or damage
>arising from the use of this email or attachments.

Hi Eric,

The timeouts can be treated as 'how much time to wait before taking any action'. The workload is not very important (HANA is something different). You can try with 10s (token) and 12s (consensus), and adjust if needed.

Warning: use a 3-node cluster, or at least 2 DRBD nodes + qdisk. A 2-node cluster is vulnerable to split brain, especially when one of the nodes is syncing (for example after patching) and the sync source is fenced/lost/disconnected. It's very hard to extract data from a half-synced DRBD.

Also, if you need guidance for SELinux, I can point you to my guide on the CentOS forum.
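[Editor's note] To make the arithmetic in the quoted advice explicit: with a 10s token and consensus = 1.2 * token, a lost node is only acted on after both the token timeout and the consensus timeout have expired, i.e. after 22s. A trivial sketch of that calculation (values in milliseconds, as corosync.conf expects them):

```python
# Suggested corosync totem timeouts, per the SUSE defaults quoted above.
token_ms = 10000                      # token timeout: 10s
consensus_ms = int(1.2 * token_ms)    # consensus = 1.2 * token -> 12s

# Worst case before the cluster reacts to a lost node: the token must
# time out, then consensus must be reached on the new membership.
reaction_ms = token_ms + consensus_ms
print(reaction_ms / 1000)  # -> 22.0 (seconds)
```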
Best Regards,
Strahil Nikolov

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Why Do Nodes Leave the Cluster?
Hi Nikolov --

> Defaults are 1s token, 1.2s consensus which is too small.
> In Suse, token is 10s, while consensus is 1.2 * token -> 12s.
> With these settings, cluster will not react for 22s.
>
> I think it's a good start for your cluster.
> Don't forget to put the cluster in maintenance (pcs property set
> maintenance-mode=true) before restarting the stack, or even better - get
> some downtime.
>
> You can use the following article to run a simulation before removing the
> maintenance:
> https://www.suse.com/support/kb/doc/?id=7022764

Thanks for the suggestions. Any thoughts on timeouts for DRBD?

--Eric
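[Editor's note] For reference, the settings being discussed would land in the totem section of /etc/corosync/corosync.conf roughly as below. This is a sketch, not taken from either poster's configuration; the cluster name is a placeholder, and both values are in milliseconds:

```
totem {
    version: 2
    cluster_name: mycluster
    # 10s token, consensus = 1.2 * token, per the SUSE recommendation above
    token: 10000
    consensus: 12000
}
```

After editing, the stack has to be restarted on all nodes for the new timeouts to take effect, which is why the advice above is to set maintenance-mode first.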
Re: [ClusterLabs] Antw: [EXT] Re: Why Do Nodes Leave the Cluster?
> > I've done that with all my other clusters, but these two servers are
> > in Azure, so the network is out of our control.
>
> Is a normal cluster supported to use corosync over Internet? I'm not sure
> (because of the delays and possible packet losses).

As with most things, the main concern is latency and loss. The latency between these two nodes is < 1ms, and loss is always 0%.

--Eric