[ClusterLabs] FYI to anyone backporting the recent security fixes

2019-05-24 Thread Ken Gaillot
In case anyone is planning to backport only the recent security fixes
to an older pacemaker version, here is a list of all commits that are
relevant.

2.0 branch:

32ded3e0172e0fae89cf70965e1c0406c1db883b High: libservices: fix use-after-free wrt. alert handling
912f5d9ce983339e939e4cc55f27791f8c9baa18 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (0/4)
1148f45da977113dff588cdd1cfebb7a47760b32 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (1/4)
970736b1c7ad5c78cc5295a4231e546104d55893 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (2/4)
052e6045eea77685aabeed12c519c7c9eb9b5287 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (3/4)
d324e407c0e2695f405974d567d79eb91d0ee69a High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (4/4)
3ad7b2509d78f95b5dfc8fffc4d9a91be1da5113 Med: controld: fix possible NULL pointer dereference
bccf845261c6e69fc4e6bdb8cf4e630a4a4ec7a8 Log: libcrmcluster: improve CPG membership messages
7dda20dac25f07eae959ca25cc974ef2fa6daf02 Fix: libcrmcommon: avoid use-of-NULL when checking whether process is active
d9b0269d59a00329feb19b6e65b10a233a3dd414 Low: libcrmcommon: return proper code if testing pid is denied


1.1 branch:

f91a961112ec9796181b42aa52f9c36dfa3c6a99 High: libservices: fix use-after-free wrt. alert handling
ab44422fa955c2dff1ac1822521e7ad335d4aab7 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (0/4)
6888aaf3ad365ef772f8189c9958f58b85ec62d4 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (1/4)
904c53ea311fd6fae945a55202b0a7ccf3783465 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (2/4)
07a82c5c8f9d60989ea88c5a3cc316ee290ea784 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (3/4)
4d6f6e01b309cda7b3f8fe791247566d247d8028 High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (4/4)
9dc38d81cb6e1967c368faed78de1927cabf06b3 Med: controld: fix possible NULL pointer dereference
83811e2115f5516a7faec2e653b1be3d58b35a79 Log: libcrmcluster: improve CPG membership messages
d0c12d98e01bc6228fc254456927d79a46554448 Fix: libcrmcommon: avoid use-of-NULL when checking whether process is active
c0e1cf579f57922cbe872d23edf144dd2206156b Low: libcrmcommon: return proper code if testing pid is denied
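As a minimal sketch of how such a backport might be driven with git (the branch
name and base tag below are only examples, and older bases may need manual
conflict resolution), the commits for your branch can be cherry-picked in the
order listed above:

git clone https://github.com/ClusterLabs/pacemaker.git
cd pacemaker
git checkout -b security-backport Pacemaker-1.1.19      # example base; use your own tag or branch
git cherry-pick -x f91a961112ec9796181b42aa52f9c36dfa3c6a99   # libservices use-after-free
git cherry-pick -x ab44422fa955c2dff1ac1822521e7ad335d4aab7   # confused deputy (0/4)
# ...and so on through the rest of the 1.1-branch list above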
-- 
Ken Gaillot 


Re: [ClusterLabs] drbd could not start by pacemaker. strange limited root privileges?

2019-05-24 Thread George Melikov
Looks like selinux restrictions.

23.05.2019, 14:22, "László Neduki":

Hi,

(I sent a similar question from another account 3 days ago, but:
- I do not see it on the list. Maybe I should not see my own email? So I created a new account.
- I have additional information (but no solution), so I am rewriting the question.)

Pacemaker cannot start drbd9 resources. As far as I can see, root has very limited
privileges inside the drbd resource agent when it is run by pacemaker. I downloaded
the latest pacemaker this week, and I also compiled the drbd9 rpms. I hope you can
help me; I cannot find the cause of this behaviour. Please see the test cases below:

1. When I create the Pacemaker DRBD resource, I get errors:

# pcs resource create DrbdDB ocf:linbit:drbd drbd_resource=drbd_db op monitor interval=60s meta notify=true
# pcs resource master DrbdDBClone DrbdDB master-max=1 master-node-max=1 clone-node-max=1 notify=true
# pcs constraint location DrbdDBClone prefers node1=INFINITY
# pcs cluster stop --all; pcs cluster start --all; pcs status
Failed Actions:
* DrbdDB_monitor_0 on node1 'not installed' (5): call=6, status=complete, exitreason='DRBD kernel (module) not available?',
    last-rc-change='Thu May 23 09:54:09 2019', queued=0ms, exec=58ms
* DrbdDB_monitor_0 on node2 'not installed' (5): call=6, status=complete, exitreason='DRBD kernel (module) not available?',
    last-rc-change='Thu May 23 10:00:22 2019', queued=0ms, exec=71ms

2. When I try to start drbd_db with drbdadm directly, it works well:

# modprobe drbd            # on each node
# drbdadm up drbd_db       # on each node
# drbdadm primary drbd_db
# drbdadm status

It shows drbd_db is UpToDate on each node. I can also promote it and mount the
filesystem without problems.

3. When I use debug-start, it works fine (so the resource syntax should be correct):

# drbdadm status
No currently configured DRBD found.
# pcs resource debug-start DrbdDBMaster
Error: unable to debug-start a master, try the master's resource: DrbdDB
# pcs resource debug-start DrbdDB   # on each node
Operation start for DrbdDB:0 (ocf:linbit:drbd) returned: 'ok' (0)
# drbdadm status

It shows drbd_db is UpToDate on each node.

4. Pacemaker handles other resources well. If I set auto_promote=yes and start (but
do not promote) drbd_db with drbdadm, then pacemaker can create the filesystem on
it without problems, and also the appserver and database resources.

5. The strangest behaviour for me: root has very limited privileges within the drbd
resource agent. If I add this line to the drbd_start() method of
/usr/lib/ocf/resource.d/linbit/drbd:

ocf_log err "lados " $(whoami) $( ls -l /home/opc/tmp/modprobe2.trace ) $( do_cmd touch /home/opc/tmp/modprobe2.trace )

I get these messages in the log when I start the cluster:

# tail -f /var/log/cluster/corosync.log | grep -A 8 -B 3 -i lados
...
May 21 15:35:12  drbd(DrbdDB)[31649]:    ERROR: lados  root
May 21 15:35:12 [31309] node1   lrmd:   notice: operation_finished:    DrbdDB_start_0:31649:stderr [ ls: cannot access /home/opc/tmp/modprobe2.trace: Permission denied ]
May 21 15:35:12 [31309] node1   lrmd:   notice: operation_finished:    DrbdFra_start_0:31649:stderr [ touch: cannot touch '/home/opc/tmp/modprobe2.trace': Permission denied ]
...

Also, when I try to strace the "modprobe -s drbd `$DRBDADM sh-mod-parms`" call in
the drbd resource agent, I only see 1 line in /root/modprobe2.trace. This means to me:
- root cannot trace the calls in drbdadm (even though root can strace drbdadm outside of pacemaker without problems)
- root can write into files in his own directory (/root/modprobe2.trace)

6. The opposite of the previous test: root has these privileges outside of pacemaker:

# sudo su -
# touch /home/opc/tmp/modprobe2.trace
# ls -l /home/opc/tmp/modprobe2.trace
-rw-r--r--. 1 root root 0 May 21 15:44 /home/opc/tmp/modprobe2.trace

Thanks: lados.

Sincerely,
George Melikov
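
A minimal way to check the SELinux theory on the nodes, assuming the standard
audit and policycoreutils tools are installed:

# getenforce
# ausearch -m avc -ts recent | audit2why
# setenforce 0

getenforce shows whether SELinux is enforcing, ausearch/audit2why list recent
denials and their cause (look for modprobe, drbdadm, or the touched file), and
setenforce 0 makes SELinux temporarily permissive to confirm the theory (switch
back afterwards with setenforce 1).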

Re: [ClusterLabs] Re: Fencing errors

2019-05-24 Thread Lopez, Francisco Javier [Global IT]
Hello guys.

Please forget about this issue; I set up a process that asks for the status
every 10 secs and realized the call takes around 25 secs when it fails. In case
this helps anyone else, this is what I ran in a loop:

# time fence_vmware_soap --ip  --username "x" -p "x" --ssl --ssl-insecure --action status --plug ao-pg02-p.axadmin.net,ao-pg01-p.axadmin.net
Status: ON

real    0m21.999s  <<<---
user    0m15.190s
sys     0m0.294s
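
A loop of roughly this shape reproduces that polling (a sketch only; the vCenter
address, credentials and log path below are placeholders, not the real values):

while true; do
    # append a timestamp plus the timed status call to a log file
    { date; time fence_vmware_soap --ip <vcenter> --username "<user>" -p "<password>" --ssl \
        --ssl-insecure --action status --plug ao-pg01-p.axadmin.net ; } >> /tmp/fence_status.log 2>&1
    sleep 10
done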

A normal execution takes around 14 secs, in which case it does not fail.
Since I updated pcmk_monitor_timeout to 30, the process has been running as
expected.
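
For reference, that kind of change can be applied to an existing fence device
with pcs; a minimal sketch, using the device names from the configuration
quoted later in this thread:

# pcs stonith update fence_ao_pg01 pcmk_monitor_timeout=30s
# pcs stonith update fence_ao_pg02 pcmk_monitor_timeout=30s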

Now it's my turn to review why that difference shows up on the VMware side.

Thx.
Javier

Francisco Javier Lopez
IT System Engineer  |  Global IT
O: +34 619 728 249  |  M: +34 619 728 249
franciscojavier.lo...@solera.com  |  Solera.com
Audatex Datos, S.A.  |  Avda. de Bruselas, 36, Salida 16, A-1 (Diversia), Alcobendas, Madrid, 28108, Spain

On 5/23/2019 8:29 PM, Lopez, Francisco Javier [Global IT] wrote:
Hello again Ken et al.

I learned quite a few things while investigating this issue, but I feel I need
a bit more help from you guys.

It's clear the monitoring process is reporting a timeout. Although I've
increased this timeout to 30s using pcmk_monitor_timeout, and during the last
2 hours the process did not fail, I'd like to understand in more detail how
this process works; if I'm getting a timeout after 20 secs, it looks to me like
something else could be happening on my systems.

I tried enabling debug again and, as before, the 'debug' option creates the
file but does not write anything to it unless I also enable 'verbose'.
Funny thing: when I enable verbose, I hit a bug and the fencing does not
start:

https://bugzilla.redhat.com/show_bug.cgi?id=1549366

I enabled debug at the corosync layer and got some more information that helped
me understand this issue better, but still not enough to narrow down where it
comes from.

That said, I'd like to know whether there is a way to review in more detail
what the monitoring process is doing (ping, status, etc.) and whether all of
that time is spent on the same action.
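
A rough way to see what the periodic monitor does is to run the fence agent by
hand with verbose output and time it (a sketch; the vCenter address and
credentials are placeholders, and 'monitor' is the action pacemaker runs
periodically):

# time fence_vmware_soap --ip <vcenter> --username "<user>" -p "<password>" --ssl --ssl-insecure -v --action monitor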

Any idea will be more than welcome.

As always, appreciate your help.

Regards
Javier




On 5/21/2019 6:19 PM, Ken Gaillot wrote:

On Tue, 2019-05-21 at 11:10 +, Lopez, Francisco Javier [Global IT]
wrote:


Hello guys !

I need your help to understand and debug what I'm facing in one of my clusters.

I set up fencing as follows:

# pcs -f stonith_cfg stonith create fence_ao_pg01 fence_vmware_soap ipaddr= ssl_insecure=1 login="" passwd="" pcmk_reboot_action=reboot pcmk_host_list="ao-pg01-p.axadmin.net" power_wait=3 op monitor interval=60s
# pcs -f stonith_cfg stonith create fence_ao_pg02 fence_vmware_soap ipaddr= ssl_insecure=1 login="" passwd="" pcmk_reboot_action=reboot pcmk_host_list="ao-pg02-p.axadmin.net" power_wait=3 op monitor interval=60s

# pcs -f stonith_cfg constraint location fence_ao_pg01 avoids ao-pg01-p.axadmin.net=INFINITY
# pcs -f stonith_cfg constraint location fence_ao_pg02 avoids ao-pg02-p.axadmin.net=INFINITY

# pcs cluster cib-push stonith_cfg

pcs status shows everything OK for some time, and then it turns into:

[root@ao-pg01-p ~]# pcs status --full
Cluster name: ao_cl_p_01
Stack: corosync
Current DC: ao-pg01-p.axadmin.net (1) (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue May 21 12:18:46 2019
Last change: Fri May 17 18:54:32 2019 by hacluster via crmd on ao-pg01-p.axadmin.net

2 nodes configured
3 resources configured

Online: [ ao-pg01-p.axadmin.net (1) ao-pg02-p.axadmin.net (2) ]

Full list of resources:

 ao-cl-p-01-vip01   (ocf::heartbeat:IPaddr2):       Started ao-pg01-p.axadmin.net
 fence_ao_pg01      (stonith:fence_vmware_soap):    Stopped
 fence_ao_pg02      (stonith:fence_vmware_soap):    Stopped

Node Attributes:
* Node