Re: [ClusterLabs] [Announce] libqb 1.0.5 release
We are pleased to announce the release of libqb 1.0.5.

Source code is available at:
https://github.com/ClusterLabs/libqb/releases/download/v1.0.5/libqb-1.0.5.tar.xz

Please use the signed .tar.gz or .tar.xz files with the version number in their names rather than the GitHub-generated "Source Code" ones.

This release is an update to fix a regression in 1.0.4. Huge thanks to wferi for all the help with this.

Chrissie
___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure
Hi,

Here are the logs from when pacemaker fails to start the postgres service on the master. It only manages to start the postgres slave. I tried different configurations with the pgsqlms and pgsql resource agents. These errors occur when I use the pgsqlms agent, whose configuration I sent in my first mail:

Apr 25 16:40:23 [4213] master lrmd: info: log_execute: executing - rsc:PGSQL action:start call_id:51 launching as "postgres" command "/usr/lib/postgresql/9.5/bin/pg_ctl --pgdata /var/lib/postgresql/9.5/main -w --timeout 120 start -o -c config_file=/etc/postgresql/9.5/main/postgresql.conf"
Apr 25 16:40:24 [4211] master cib: info: cib_perform_op: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='PGSQL']/lrm_rsc_op[@id='PGSQL_last_0']: @operation_key=PGSQL_start_0, @operation=start, @transition-key=12:30:0:078c2b66-b095-49c4-947b-2427dd7852bf, @transition-magic=0:0;12:30:0:078c2b66-b095-49c4-947b-2427dd7852bf, @call-id=176, @rc-code=0, @exec-time=1146, @queue-time=0
Apr 25 16:40:53 [4216] master crmd: debug: crm_timer_start: Started Shutdown Escalation (I_STOP:120ms), src=53
Apr 25 16:41:23 [4213] master lrmd: warning: child_timeout_callback: PGSQL_start_0 process (PID 5986) timed out

Part of the log is attached.

On Tue, 23 Apr 2019 at 17:28, Danka Ivanović wrote:
> Hi,
> It seems that an LDAP timeout caused the cluster failure. The cluster checks
> status every 15s on the master and every 16s on the slave. The cluster needs
> the postgres user to authenticate, but LDAP first queries the user on the
> LDAP server and only then locally on the host. When the connection to the
> LDAP server was interrupted, the cluster couldn't find the postgres user and
> authenticate against the DB to check its state. The problem is solved by
> reconfiguring /etc/ldap.conf and /etc/nslcd.conf. The following variable is
> added: nss_initgroups_ignoreusers, listing the local users that should be
> ignored when querying the LDAP server. Thanks for your help. :)
> Another problem is that I cannot start the postgres master with pacemaker.
> When I start postgres manually (with systemd) and then start pacemaker on
> the slave, pacemaker is able to recognize the master and start the slave,
> and failover works.
> That is another problem which I didn't manage to solve. Should I send a
> new mail for that issue, or can we continue in this thread?
>
> On Fri, 19 Apr 2019 at 19:19, Jehan-Guillaume de Rorthais
> wrote:
>
>> On Fri, 19 Apr 2019 17:26:14 +0200
>> Danka Ivanović wrote:
>> ...
>> > Should I change any of those timeout parameters in order to avoid
>> > timeout?
>>
>> You can try to raise the timeout, indeed. But as long as we don't know
>> **why** your VMs froze for some time, it is difficult to guess how high
>> these timeouts should be.
>>
>> Not to mention that it will raise your RTO.
>
> --
> Regards
> Danka Ivanovic

--
Regards
Danka Ivanovic

Attached log excerpt:
Apr 25 16:39:50 [4211] master cib: debug: crm_client_new: Connecting 0x55d8444e8e80 for uid=0 gid=0 pid=5791 id=c93d535d-77d8-4556-9a63-d9a1c2b45de9
Apr 25 16:39:50 [4211] master cib: debug: handle_new_connection: IPC credentials authenticated (4211-5791-13)
Apr 25 16:39:50 [4211] master cib: debug: qb_ipcs_shm_connect: connecting to client [5791]
Apr 25 16:39:50 [4211] master cib: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Apr 25 16:39:50 [4211] master cib: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Apr 25 16:39:50 [4211] master cib: debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Apr 25 16:39:50 [4211] master cib: debug: cib_acl_enabled: CIB ACL is disabled
Apr 25 16:39:50 [4211] master cib: debug: qb_ipcs_dispatch_connection_request: HUP conn (4211-5791-13)
Apr 25 16:39:50 [4211] master cib: debug: qb_ipcs_disconnect: qb_ipcs_disconnect(4211-5791-13) state:2
Apr 25 16:39:50 [4211] master cib: debug: crm_client_destroy: Destroying 0 events
Apr 25 16:39:50 [4211] master cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_rw-response-4211-5791-13-header
Apr 25 16:39:50 [4211] master cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_rw-event-4211-5791-13-header
Apr 25 16:39:50 [4211] master cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_rw-request-4211-5791-13-header
Apr 25 16:39:50 [15544] master corosync debug [QB] IPC credentials authenticated (15544-5837-24)
Apr 25 16:39:50 [15544] master corosync debug [QB] connecting to client [5837]
Apr 25 16:39:50 [15544] master corosync debug [QB] shm size:1048589; real_size:1052672; rb->word_size:263168
Apr 25 16:39:50 [15544] master corosync debug [QB] shm size:1048589; real_size:1052672; rb->word_size:263168
Apr 25 16:39:50 [15544] master corosync debug [QB] shm size:1048589; real_size:1052672; rb->word_si
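The LDAP fix described in the quoted 23 April message can be sanity-checked with a small script. This is an illustrative sketch only. It assumes that `nss_initgroups_ignoreusers` takes a comma-separated user list, as in nslcd.conf(5). The helper names `ignored_users` and `missing_users` are hypothetical. The set of local users the cluster needs (`root`, `hacluster`, `postgres`) is an assumed example.

```python
from typing import Iterable, Set


def ignored_users(config_text: str) -> Set[str]:
    """Collect users listed on nss_initgroups_ignoreusers lines (hypothetical helper)."""
    users: Set[str] = set()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if parts[0] == "nss_initgroups_ignoreusers" and len(parts) == 2:
            users.update(u.strip() for u in parts[1].split(",") if u.strip())
    return users


def missing_users(config_text: str, required: Iterable[str]) -> Set[str]:
    """Return required local users that would still trigger an LDAP group lookup."""
    return set(required) - ignored_users(config_text)


if __name__ == "__main__":
    # Assumed example config: postgres is not yet in the ignore list.
    sample = """
uri ldap://ldap.example.com/
base dc=example,dc=com
nss_initgroups_ignoreusers root,hacluster
"""
    print(sorted(missing_users(sample, ["root", "hacluster", "postgres"])))
```

Run against the sample, it reports `postgres` as still missing. In that case the monitor's `postgres` lookups would keep depending on the LDAP server being reachable.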
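How long did the start action run before lrmd gave up? One way to find out is to subtract the syslog timestamps of the `log_execute` and `child_timeout_callback` lines quoted above. This is a minimal sketch with two caveats. Syslog stamps carry no year, so the sketch assumes both lines fall in the same year (2019 here). The helper `syslog_time` is hypothetical.

```python
from datetime import datetime


def syslog_time(line: str, year: int = 2019) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line (year is assumed)."""
    stamp = " ".join(line.split()[:3])  # e.g. "Apr 25 16:40:23"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


launch = ("Apr 25 16:40:23 [4213] master lrmd: info: log_execute: "
          "executing - rsc:PGSQL action:start call_id:51")
timeout = ("Apr 25 16:41:23 [4213] master lrmd: warning: child_timeout_callback: "
           "PGSQL_start_0 process (PID 5986) timed out")

# Seconds between launching the start action and lrmd reporting the timeout.
elapsed = (syslog_time(timeout) - syslog_time(launch)).total_seconds()
print(f"start ran {elapsed:.0f}s before timing out")
```

Compare this figure with the `--timeout 120` passed to pg_ctl in the same log. That comparison may help decide which timeout is worth raising.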
Re: [ClusterLabs] Pacemaker detail log directory permissions
On 24/04/19 09:32 -0500, Ken Gaillot wrote:
> On Wed, 2019-04-24 at 16:08 +0200, wf...@niif.hu wrote:
>> Make install creates /var/log/pacemaker with mode 0770, owned by
>> hacluster:haclient. However, if I create the directory as root:root
>> instead, pacemaker.log appears as hacluster:haclient all the
>> same. What breaks in this setup besides log rotation (which can be
>> fixed by removing the su directive)? Why is it a good idea to let
>> the haclient group write the logs?
>
> Cluster administrators are added to the haclient group. It's a minor
> use case, but the group write permission allows such users to run
> commands that log to the detail log. An example would be running
> "crm_resource --force-start" for a resource agent that writes debug
> information to the log.

I think the first and foremost use case is that half of the actual pacemaker daemons run as hacluster:haclient themselves. It's preferable that they not be completely muted about what they do, correct? :-)

Indeed, users can configure whatever log routing they desire. (I was actually toying with the idea of making it a lot more flexible: a log per type of daemon, perhaps even distinguished by PID, configurable log formats, etc. Currently it's arguably heavy overkill to state the hostname over and over without ever bothering to recheck it from time to time.)

Also note that relying on almighty root privileges (as with the pristine deployment) is a silent misconception that cannot be taken fully for granted. So, again arguably, even the root daemons should also run with the haclient group in addition to their own, just in case [*].

> If ACLs are not in use, such users already have full read/write
> access to the CIB, so being able to read and write the log is not an
> additional concern.
>
> With ACLs, I could see wanting to change the permissions, and that idea
> has come up already. One approach might be to add a PCMK_log_mode
> option that would default to 0660, and users could make it more strict
> if desired.

Preventing read-backs by anyone but root looks reasonable. That could be applied without any further toggles, assuming the pacemaker code won't automatically and unconditionally clear group read bits that were purposefully granted.

[*] For instance, when SELinux hits hard (which is currently not the case for the Fedora/EL family). Even then, the executor(s) would need to be exempted if process inheritance taints the tree once and for all:
https://danwalsh.livejournal.com/69478.html

--
Jan (Poki)
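The octal modes in this thread can be decoded with Python's standard `stat` module. This is an illustrative sketch, not Pacemaker code. `PCMK_log_mode` is only a proposal in the thread, and `create_log` is a hypothetical helper. The mode `0620` is an assumed example of a stricter setting in which the group may append but not read back. Because `os.open` applies the process umask to its mode argument, the helper sets the bits explicitly with `os.fchmod`.

```python
import os
import stat
import tempfile


def describe(mode: int, is_dir: bool = False) -> str:
    """Render permission bits the way `ls -l` does."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


def create_log(path: str, mode: int) -> int:
    """Create or open a log file and force its permission bits to `mode` (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, mode)
    try:
        os.fchmod(fd, mode)  # os.open's mode is masked by the umask; fchmod is not
    finally:
        os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    print(describe(0o770, is_dir=True))  # /var/log/pacemaker as created by make install
    print(describe(0o660))               # proposed default for a PCMK_log_mode option
    print(describe(0o620))               # assumed stricter example: group write-only
    with tempfile.TemporaryDirectory() as d:
        print(oct(create_log(os.path.join(d, "pacemaker.log"), 0o620)))
```

With 0620 the owning user (hacluster) can still read the file. Only the group read bit is dropped, so members of haclient can append to the detail log without reading it back.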
Re: [ClusterLabs] Warning (SLES 12 SP4): ocf:heartbeat:CTDB does not work any more
On 25/04/2019 14:35, Ulrich Windl wrote:
> Hi!
>
> I managed to get my cluster up again after upgrading from SLES11 SP4 to SLES12 SP4, but my CTDB Samba won't start any more. The problem is:
>
> CTDB(prm_s02_ctdb)[30904]: ERROR: Failed to execute /usr/sbin/ctdbd.
> lrmd[27341]: notice: prm_s02_ctdb_start_0:30857:stderr [ Invalid option --logfile=/var/log/ctdb/log.ctdb: unknown option ]
>
> That option comes from /usr/lib/ocf/resource.d/heartbeat/CTDB:
>
> : ${OCF_RESKEY_ctdb_logfile:=/var/log/ctdb/log.ctdb}
> log_option="--logging=file:$OCF_RESKEY_ctdb_logfile"
>
> So the RA and the binary don't match! The binary seems to lack a --logging option.
>
> ctdb-4.6.16+git.133.479a9537a28-3.35.4.x86_64
> resource-agents-4.1.9+git24.9b664917-3.3.3.x86_64
>
> Regards,
> Ulrich

Ulrich,

There have been some IMHO silly changes in CTDB lately. My least favourite is the change of the ctdb scriptstatus output to fixed-width columns instead of machine-readable colon-separated data. They left in place an option to use a user-provided delimiter that does not work at all, which broke the nagios monitoring plugin completely. I complained and was simply told it was "as designed". I felt like replying, "so, stupidly designed then".

Cheers
Alex
[ClusterLabs] Warning (SLES 12 SP4): ocf:heartbeat:CTDB does not work any more
Hi!

I managed to get my cluster up again after upgrading from SLES11 SP4 to SLES12 SP4, but my CTDB Samba won't start any more. The problem is:

CTDB(prm_s02_ctdb)[30904]: ERROR: Failed to execute /usr/sbin/ctdbd.
lrmd[27341]: notice: prm_s02_ctdb_start_0:30857:stderr [ Invalid option --logfile=/var/log/ctdb/log.ctdb: unknown option ]

That option comes from /usr/lib/ocf/resource.d/heartbeat/CTDB:

: ${OCF_RESKEY_ctdb_logfile:=/var/log/ctdb/log.ctdb}
log_option="--logging=file:$OCF_RESKEY_ctdb_logfile"

So the RA and the binary don't match! The binary seems to lack a --logging option.

ctdb-4.6.16+git.133.479a9537a28-3.35.4.x86_64
resource-agents-4.1.9+git24.9b664917-3.3.3.x86_64

Regards,
Ulrich
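A resource agent can keep its logging argument in step with the daemon by choosing it from the installed CTDB version. The sketch below is hypothetical, not the logic of the heartbeat CTDB agent. The cut-over version at which `--logging=file:` replaced `--logfile=` (4.3.0 here) is an assumption and should be checked against the CTDB release notes. The point it illustrates is that vendor version strings such as the package name quoted above carry suffixes like `+git.133...`. Those suffixes must be stripped before a numeric comparison.

```python
import re
from typing import Tuple


def parse_ctdb_version(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted number, ignoring suffixes such as '+git.133...'."""
    m = re.search(r"\d+(?:\.\d+)*", text)
    if m is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def ctdb_log_option(version_text: str,
                    logfile: str = "/var/log/ctdb/log.ctdb",
                    logging_since: Tuple[int, ...] = (4, 3, 0)) -> str:
    """Pick the ctdbd logging argument; the 4.3.0 threshold is an assumption."""
    if parse_ctdb_version(version_text) >= logging_since:
        return f"--logging=file:{logfile}"
    return f"--logfile={logfile}"


if __name__ == "__main__":
    pkg = "ctdb-4.6.16+git.133.479a9537a28-3.35.4.x86_64"
    print(parse_ctdb_version(pkg), ctdb_log_option(pkg))
```

Tuples compare element by element, so (4, 6, 16) >= (4, 3, 0) holds without any string tricks. An agent that compared the raw strings, or choked on the `+git` suffix, could fall back to the wrong option.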