[ClusterLabs] PCS cluster auth fails
Hi, I am using a two-node corosync cluster (node1 and node2). When I run "pcs cluster auth" against the qdevice node (qnode) during corosync-qdevice configuration, I get the error below:

$ sudo pcs cluster auth qnode -u hacluster -p --debug
Running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth
Environment:
  GEM_HOME=/usr/lib/pcsd/vendor/bundle/ruby
  HISTSIZE=1000
  HOME=/root
  HOSTNAME=node1
  LANG=en_US.UTF-8
  LC_ALL=C
  LOGNAME=root
  LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
  MAIL=/var/spool/mail/admin
  PATH=/sbin:/bin:/usr/sbin:/usr/bin
  PCSD_DEBUG=true
  PCSD_NETWORK_TIMEOUT=60
  PS1=[\u@\h-$node_name \W]\$
  SHELL=/sbin/nologin
  SUDO_COMMAND=/sbin/pcs cluster auth qnode -u hacluster -p *** --debug
  SUDO_GID=5007
  SUDO_UID=5008
  SUDO_USER=admin
  TERM=xterm
  USER=root
  USERNAME=root
--Debug Input Start--
{"username": "hacluster", "local": false, "nodes": {"qnode": null}, "password": "***", "force": false}
--Debug Input End--
Finished running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth
Return value: 0
--Debug Stdout Start--
{
  "status": "ok",
  "data": {
    "auth_responses": {
      "qnode": {
        "status": "ok",
        "token": "66c1020d-1089-4d8b-beab-a8f76e2c8b89"
      }
    },
    "sync_successful": true,
    "sync_nodes_err": [
      "node2"
    ],
    "sync_responses": {
      "node2": {
        "status": "error"
      },
      "node1": {
        "status": "ok",
        "result": {
          "tokens": "accepted"
        }
      }
    }
  },
  "log": [
    "I, [2020-02-07T17:11:34.011890 #30323] INFO -- : PCSD Debugging enabled\n",
    "D, [2020-02-07T17:11:34.012401 #30323] DEBUG -- : Did not detect RHEL 6\n",
    "D, [2020-02-07T17:11:34.012446 #30323] DEBUG -- : Detected systemd is in use\n",
    "I, [2020-02-07T17:11:34.160540 #30323] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name\n",
    "I, [2020-02-07T17:11:34.160707 #30323] INFO -- : CIB USER: hacluster, groups: \n",
    "D, [2020-02-07T17:11:34.176123 #30323] DEBUG -- : [\"totem.cluster_name (str) = HBASE\\n\"]\n",
    "D, [2020-02-07T17:11:34.176307 #30323] DEBUG -- : []\n",
    "D, [2020-02-07T17:11:34.176357 #30323] DEBUG -- : Duration: 0.015370265s\n",
    "I, [2020-02-07T17:11:34.176443 #30323] INFO -- : Return Value: 0\n",
    "I, [2020-02-07T17:11:34.407886 #30323] INFO -- : Running: /usr/sbin/pcs status nodes corosync\n",
    "I, [2020-02-07T17:11:34.407989 #30323] INFO -- : CIB USER: hacluster, groups: \n",
    "D, [2020-02-07T17:11:34.775751 #30323] DEBUG -- : [\"Corosync Nodes:\\n\", \" Online: node2 node1\\n\", \" Offline:\\n\"]\n",
    "D, [2020-02-07T17:11:34.775909 #30323] DEBUG -- : []\n",
    "D, [2020-02-07T17:11:34.775960 #30323] DEBUG -- : Duration: 0.367740571s\n",
    "I, [2020-02-07T17:11:34.776056 #30323] INFO -- : Return Value: 0\n",
    "I, [2020-02-07T17:11:34.776517 #30323] INFO -- : Sending config 'tokens' version 5 40712570bb7e5718afce943a151b259f04e7c080 to nodes: node2, node1\n",
    "I, [2020-02-07T17:11:34.777079 #30323] INFO -- : SRWT Node: node2 Request: set_configs\n",
    "I, [2020-02-07T17:11:34.777695 #30323] INFO -- : SRWT Node: node1 Request: set_configs\n",
    "I, [2020-02-07T17:11:34.865021 #30323] INFO -- : Sending config response from node2: {\"status\"=>\"error\"}\n",
    "I, [2020-02-07T17:11:34.865141 #30323] INFO -- : Sending config response from node1: {\"status\"=>\"ok\", \"result\"=>{\"tokens\"=>\"accepted\"}}\n"
  ]
}
--Debug Stdout End--
--Debug Stderr Start--
--Debug Stderr End--
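[Editor's note] The telling part of the --debug output above is "sync_nodes_err": qnode itself authenticated fine (it returned a token), but pcsd could not push the updated tokens file to node2. A small sketch that pulls the failing nodes out of an abbreviated copy of that JSON, to show which field to look at:

```python
import json

# Abbreviated form of the pcsd-cli.rb "auth" response shown above.
debug_stdout = '''
{
  "status": "ok",
  "data": {
    "auth_responses": {
      "qnode": {"status": "ok", "token": "66c1020d-1089-4d8b-beab-a8f76e2c8b89"}
    },
    "sync_successful": true,
    "sync_nodes_err": ["node2"],
    "sync_responses": {
      "node2": {"status": "error"},
      "node1": {"status": "ok", "result": {"tokens": "accepted"}}
    }
  }
}
'''

data = json.loads(debug_stdout)["data"]

# Nodes that failed to accept the synced tokens file.
failed = data["sync_nodes_err"]
print(failed)  # -> ['node2']
```

In other words, the auth against qnode succeeded; the error is pcsd on node1 failing to distribute the new token to pcsd on node2, so that is where to look (pcsd running and reachable on node2, and its response to the set_configs request).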
Re: [ClusterLabs] Why Do Nodes Leave the Cluster?
On February 6, 2020 7:35:53 PM GMT+02:00, Eric Robinson wrote:
>Hi Nikolov --
>
>> Defaults are 1s token, 1.2s consensus which is too small.
>> In Suse, token is 10s, while consensus is 1.2 * token -> 12s.
>> With these settings, the cluster will not react for 22s.
>>
>> I think it's a good start for your cluster.
>> Don't forget to put the cluster in maintenance (pcs property set
>> maintenance-mode=true) before restarting the stack, or even better - get
>> some downtime.
>>
>> You can use the following article to run a simulation before removing the
>> maintenance:
>> https://www.suse.com/support/kb/doc/?id=7022764
>
>Thanks for the suggestions. Any thoughts on timeouts for DRBD?
>
>--Eric
>
>Disclaimer : This email and any files transmitted with it are
>confidential and intended solely for intended recipients. If you are
>not the named addressee you should not disseminate, distribute, copy or
>alter this email. Any views or opinions presented in this email are
>solely those of the author and might not represent those of Physician
>Select Management. Warning: Although Physician Select Management has
>taken reasonable precautions to ensure no viruses are present in this
>email, the company cannot accept responsibility for any loss or damage
>arising from the use of this email or attachments.

Hi Eric,

The timeouts can be treated as 'how much time to wait before taking any action'. The workload is not very important (HANA is something different). You can try with 10s (token) and 12s (consensus), and adjust if needed.

Warning: use a 3-node cluster, or at least 2 DRBD nodes + qdisk. A 2-node cluster is vulnerable to split brain, especially when one of the nodes is syncing (for example after patching) and the sync source is fenced/lost/disconnected. It's very hard to extract data from a half-synced DRBD.

Also, if you need guidance for SELinux, I can point you to my guide on the CentOS forum.
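[Editor's note] To make the arithmetic in the quoted advice explicit: with a 10s token and consensus = 1.2 * token, a lost node is only acted on after both the token timeout and the consensus timeout have expired, i.e. after 22s. A trivial sketch of that calculation (values in milliseconds, as corosync.conf expects them):

```python
# Suggested corosync totem timeouts, per the SUSE defaults quoted above.
token_ms = 10000                      # token timeout: 10s
consensus_ms = int(1.2 * token_ms)    # consensus = 1.2 * token -> 12s

# Worst case before the cluster reacts to a lost node: the token must
# time out, then consensus must be reached on the new membership.
reaction_ms = token_ms + consensus_ms
print(reaction_ms / 1000)  # -> 22.0 (seconds)
```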
Best Regards,
Strahil Nikolov

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Why Do Nodes Leave the Cluster?
Hi Nikolov --

> Defaults are 1s token, 1.2s consensus which is too small.
> In Suse, token is 10s, while consensus is 1.2 * token -> 12s.
> With these settings, cluster will not react for 22s.
>
> I think it's a good start for your cluster.
> Don't forget to put the cluster in maintenance (pcs property set
> maintenance-mode=true) before restarting the stack, or even better - get
> some downtime.
>
> You can use the following article to run a simulation before removing the
> maintenance:
> https://www.suse.com/support/kb/doc/?id=7022764

Thanks for the suggestions. Any thoughts on timeouts for DRBD?

--Eric
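[Editor's note] For reference, the settings being discussed would land in the totem section of /etc/corosync/corosync.conf roughly as below. This is a sketch, not taken from either poster's configuration; the cluster name is a placeholder, and both values are in milliseconds:

```
totem {
    version: 2
    cluster_name: mycluster
    # 10s token, consensus = 1.2 * token, per the SUSE recommendation above
    token: 10000
    consensus: 12000
}
```

After editing, the stack has to be restarted on all nodes for the new timeouts to take effect, which is why the advice above is to set maintenance-mode first.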
Re: [ClusterLabs] Antw: [EXT] Re: Why Do Nodes Leave the Cluster?
> > I've done that with all my other clusters, but these two servers are
> > in Azure, so the network is out of our control.
>
> Is a normal cluster supported to use corosync over Internet? I'm not sure
> (because of the delays and possible packet losses).

As with most things, the main concern is latency and loss. The latency between these two nodes is < 1ms, and loss is always 0%.

--Eric