Re: [ClusterLabs] Postgres clone resource does not get "notice" events
OK, Thank you very much for your help!
_Vitaly

> On 07/05/2022 8:47 PM Reid Wahl wrote:
> [...]
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
On Tue, Jul 5, 2022 at 3:03 PM vitaly wrote:
> [...]

Strange. If we reach "Failed to receive meta-data", that means services_action_sync() returned true... and if services_action_sync() returned true, then we should hit a crm_trace() line no matter what.

``` lrmd_api_
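The cross-check implied above — every second in which controld logged "Failed to receive meta-data" should also show a services_action_sync trace line once tracing is on — can be sketched as a small script. This is illustrative only: the here-documents hold sample lines taken from the thread, and the two variables stand in for the real /var/log/messages and /var/log/pacemaker.log on a cluster node.

```shell
# Sample stand-ins for the real log files; point these at
# /var/log/messages and /var/log/pacemaker.log on a live node.
MESSAGES=$(mktemp)
PCMK_LOG=$(mktemp)
cat > "$MESSAGES" <<'EOF'
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
EOF
cat > "$PCMK_LOG" <<'EOF'
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced[47287] (services_action_sync@services.c:901) trace: > (null)_(null)_0: /usr/sbin/fence_ipmilan = 0
EOF

# For each failure timestamp (HH:MM:SS), ask whether a trace line from
# services_action_sync exists at the same second in the pacemaker log.
grep 'Failed to receive meta-data' "$MESSAGES" \
  | awk '{print $3}' \
  | while read -r ts; do
      if grep -q "$ts .*services_action_sync@" "$PCMK_LOG"; then
        echo "$ts: trace line present"
      else
        echo "$ts: NO trace line"
      fi
    done
```

With the sample data above, the failure at 21:20:43 has no matching trace line, which mirrors the symptom reported in the thread.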
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
Hello,
Yes, the snippet has everything there was for the full second of Jul 05 11:54:34. I did not cut anything between the last line of 11:54:33 and the first line of 11:54:35.

Here is a grep from the pacemaker config:

d19-25-left.lab.archivas.com ~ # egrep -v '^($|#)' /etc/sysconfig/pacemaker
PCMK_logfile=/var/log/pacemaker.log
SBD_SYNC_RESOURCE_STARTUP="no"
PCMK_trace_functions=services_action_sync,svc_read_output
d19-25-left.lab.archivas.com ~ #

I also grepped the CURRENT pacemaker.log for services_action_sync and got just 4 records, at a time that does not seem to match the failures:

d19-25-left.lab.archivas.com ~ # grep services_action_sync /var/log/pacemaker.log
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced[47287] (services_action_sync@services.c:901) trace: > (null)_(null)_0: /usr/sbin/fence_ipmilan = 0
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced[47287] (services_action_sync@services.c:903) trace: > stdout: <?xml version="1.0" ?>
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced[47287] (services_action_sync@services.c:901) trace: > (null)_(null)_0: /usr/sbin/fence_sbd = 0
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced[47287] (services_action_sync@services.c:903) trace: > stdout: <?xml version="1.0" ?>

This is a grep of messages for the failures:

d19-25-left.lab.archivas.com ~ # grep " 5 21:[23].*Failed to .*pgsql-rhino" /var/log/messages
Jul 5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:43 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:43 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:43 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:44 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:44 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:47 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:47 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:48 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:48 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:48 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:48 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:20:49 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:20:49 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
d19-25-left.lab.archivas.com ~ #

Sorry, these logs are not from the same time as this morning, as I reinstalled the cluster a couple of times today.

Thanks,
_Vitaly

> On 07/05/2022 3:19 PM Reid Wahl wrote:
> [...]
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
On Tue, Jul 5, 2022 at 5:17 AM vitaly wrote:
> [...]
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
Hello,
Thanks for looking at this issue!
Snippets from /var/log/messages and /var/log/pacemaker.log are below.
_Vitaly

Here is a /var/log/pacemaker.log snippet around the failure:

Jul 05 11:54:33 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:277) trace: Reading M_IP_monitor_1 stdout into offset 177
Jul 05 11:54:34 tomcat-rhino(tomcat-instance)[2295103]:INFO: [tomcat] Leave tomcat start 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading tomcat-instance_start_0 stderr into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:277) trace: Reading tomcat-instance_start_0 stdout into offset 10505
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_finished@execd_commands.c:214) info: tomcat-instance start (call 59, PID 2295103) exited with status 0 (execution time 110997ms, queue time 0ms)
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_execute@execd_commands.c:232) info: executing - rsc:N1F1 action:start call_id:66
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_execute@execd_commands.c:232) info: executing - rsc:fs_monitor action:start call_id:67
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading fs_monitor_start_0 stdout into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:287) trace: Got 54 chars: 2298369 (process ID) old priority 0, new priority -10
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading N1F1_start_0 stdout into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:287) trace: Got 175 chars: 8: bond0: mtu 1500 qdisc noqueue state
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading tomcat-instance_monitor_1 stderr into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading tomcat-instance_monitor_1 stdout into offset 0
Jul 05 11:54:34 fs_monitor-rhino(fs_monitor)[2298359]:INFO: Started fs_monitor.sh, pid=2298369
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading fs_monitor_start_0 stderr into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:277) trace: Reading fs_monitor_start_0 stdout into offset 54
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_finished@execd_commands.c:214) info: fs_monitor start (call 67, PID 2298359) exited with status 0 (execution time 31ms, queue time 0ms)
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_execute@execd_commands.c:232) info: executing - rsc:ClusterMonitor action:start call_id:69
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading fs_monitor_monitor_1 stderr into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading fs_monitor_monitor_1 stdout into offset 0
Jul 05 11:54:34 IPaddr2-rhino(N1F1)[2298357]:INFO: Adding inet address 172.18.51.93/23 with broadcast address 172.18.51.255 to device bond0 (with label bond0:N1F1)
Jul 05 11:54:34 IPaddr2-rhino(N1F1)[2298357]:INFO: Bringing device bond0 up
Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Jul 05 11:54:34 IPaddr2-rhino(N1F1)[2298357]:INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-172.18.51.93 bond0 172.18.51.93 auto not_used not_used
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:280) trace: Reading N1F1_start_0 stderr into offset 0
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (svc_read_output@services_linux.c:277) trace: Reading N1F1_start_0 stdout into offset 175
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_finished@execd_commands.c:214) info: N1F1 start (call 66, PID 2298357) exited with status 0 (execution time 68ms, queue time 0ms)
Jul 05 11:54:34 cluster_monitor-rhino(ClusterMonitor)[2298481]:INFO: Started cluster_monitor.sh, pid=2298549
Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd [2294543] (log_fi
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
On Mon, Jul 4, 2022 at 7:19 AM vitaly wrote:
> [...]

Hmm, seems reasonable. No permissions issues, and it looks like we should only print the "Failed to receive" message if we don't receive any stdout at all from the meta-data action.

Can you add the following to /etc/sysconfig/pacemaker and restart pacemaker? Then monitor /var/log/pacemaker/pacemaker.log for relevant trace-level messages around the same time as the "Failed to receive meta-data" messages.

PCMK_trace_functions=services_action_sync,svc_read_output

This will get fairly verbose if you have more than a couple of resources, so after you've grabbed any relevant logs, comment that line out and restart pacemaker again.

> On 07/04/2022 5:39 AM Reid Wahl wrote:
> [...]
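The tracing workflow described above can be sketched as a short script. The config and restart steps are shown as comments (the restart command is an assumption about a systemd host, not part of the instructions); the filter itself is demonstrated against a small sample log so it can run anywhere — point LOG at /var/log/pacemaker.log on a real node.

```shell
# On the cluster node, as root:
#   1. echo 'PCMK_trace_functions=services_action_sync,svc_read_output' \
#        >> /etc/sysconfig/pacemaker
#   2. restart pacemaker (e.g. 'systemctl restart pacemaker' on systemd hosts)
#   3. reproduce the failure, then filter for the traced functions:
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced[47287] (services_action_sync@services.c:903) trace: > stdout: <?xml version="1.0" ?>
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-execd [2294543] (log_execute@execd_commands.c:232) info: executing - rsc:N1F1 action:start call_id:66
EOF
# Keep only lines emitted by the two traced functions (one line here):
grep -E '\((services_action_sync|svc_read_output)@' "$LOG"
```

The `(function@file:line)` prefix that Pacemaker puts on trace messages is what makes this grep selective, as seen in the log excerpts earlier in the thread.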
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
I get printout of metadata as follows: d19-25-left.lab.archivas.com ~ # OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data 1.0 Resource script for PostgreSQL. It manages a PostgreSQL as an HA resource. Manages a PostgreSQL database instance Path to pg_ctl command. pgctl Start options (-o start_opt in pg_ctl). "-i -p 5432" for example. start_opt Additional pg_ctl options (-w, -W etc..). ctl_opt Path to psql command. psql Path to PostgreSQL data directory. pgdata User that owns PostgreSQL. pgdba Hostname/IP address where PostgreSQL is listening pghost Port where PostgreSQL is listening pgport PostgreSQL user that pgsql RA will user for monitor operations. If it's not set pgdba user will be used. monitor_user Password for monitor user. monitor_password SQL script that will be used for monitor operations. monitor_sql Path to the PostgreSQL configuration file for the instance. Configuration file Database that will be used for monitoring. pgdb Path to PostgreSQL server log output file. logfile Unix socket directory for PostgeSQL socketdir Number of shutdown retries (using -m fast) before resorting to -m immediate stop escalation Replication mode(none(default)/async/sync). "async" and "sync" require PostgreSQL 9.1 or later. If you use async or sync, it requires node_list, master_ip, restore_command parameters, and needs setting postgresql.conf, pg_hba.conf up for replication. Please delete "include /../../rep_mode.conf" line in postgresql.conf when you switch from sync to async. rep_mode All node names. Please separate each node name with a space. This is required for replication. node list restore_command for recovery.conf. This is required for replication. restore_command Master's floating IP address to be connected from hot standby. This parameter is used for "primary_conninfo" in recovery.conf. This is required for replication. master ip User used to connect to the master server. 
This parameter is used for "primary_conninfo" in recovery.conf. This is required for replication. repuser Location of WALS archived by the other node remote_wals_dir Location of WALS on current node in Rhino before 2.2.0 xlogs_dir Location of WALS on current node in Rhino 2.2.0 and later wals_dir User used to connect to the master server. This parameter is used for "primary_conninfo" in recovery.conf. This is required for replication. reppassword primary_conninfo options of recovery.conf except host, port, user and application_name. This is optional for replication. primary_conninfo_opt Path to temporary directory. This is optional for replication. tmpdir Number of checking xlog on monitor before promote. This is optional for replication. xlog check count The timeout of crm_attribute forever update command. Default value is 5 seconds. This is optional for replication. The timeout of crm_attribute forever update command. Number of shutdown retries (using -m fast) before resorting to -m immediate in Slave state. This is optional for replication. stop escalation_in_slave Number of seconds to wait for a postgreSQL process to be running but not necessarilly usable Seconds to wait for a process to be running Number of failed starts before the system forces a recovery from the master database Start failures before recovery Configuration file with overrides for pgsql-rhino. Rhino configuration file > On 07/04/2022 5:39 AM Reid Wahl wrote: > > > On Mon, Jul 4, 2022 at 1:06 AM Reid Wahl wrote: > > > > On Sat, Jul 2, 2022 at 1:12 PM vitaly wrote: > > > > > > Sorry, I noticed that I am missing meta "notice=true" and after adding it > > > to postgres-ms configuration "notice" events started to come through. > > > Item 1 still needs explanation. As pacemaker-controld keeps complaining. > > > > What happens when you run `OCF_ROOT=/usr/lib/ocf > > /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data`? 
> > This may also be relevant: > https://lists.clusterlabs.org/pipermail/users/2022-June/030391.html > > > > > > Thanks! > > > _Vitaly > > > > > > > On 07/02/2022 2:04 PM vitaly wrote: > > > > > > > > > > > > Hello Everybody. > > > > I have a 2 node cluster with clone resource “postgres-ms”. We are > > > > running following versions of pacemaker/corosync: > > > > d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync" > > > > pacemaker-cluster-libs-2.0.5-9.el8.x86_64 > > > > pacemaker-libs-2.0.5-9.el8.x86_64 > > > > pacemaker-cli-2.0.5-9.el8.x86_64 > > > > corosynclib-3.1.0-5.el8.x86_64 > > > > pacemaker-schemas-2.0.5-9.el8.noarch > > > > corosync-3.1.0-5.el8.x86_64 > > > > pacemaker-2.0.5-9.el8.x86_64 > > > > > > > > There are couple of issues that could be related. > > > > 1. There are following messages in the logs coming from > > > > pacemaker-controld: > > > > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed > > > > to receive meta-data for ocf:heartbeat:pgsql-rhino
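One hedged way to chase the "Failed to receive meta-data" error in item 1: pacemaker-controld has to parse the agent's meta-data output as XML, so stray output on stdout or a malformed document can break it. A minimal well-formedness check, assuming xmllint (from libxml2) is installed on the node:

```shell
# Sketch: verify that a resource agent's meta-data output parses as XML,
# since pacemaker-controld must parse it. A parse failure is one possible
# (not the only) cause of "Failed to receive meta-data".
# Assumption: xmllint (libxml2) is available.
check_metadata_xml() {
    # Reads meta-data XML on stdin and prints a verdict.
    if xmllint --noout - 2>/dev/null; then
        echo "meta-data parses as XML"
    else
        echo "meta-data is not well-formed"
    fi
}

# Usage against the agent from this thread:
#   OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data | check_metadata_xml
```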
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
On Mon, Jul 4, 2022 at 1:06 AM Reid Wahl wrote: > > On Sat, Jul 2, 2022 at 1:12 PM vitaly wrote: > > > > Sorry, I noticed that I am missing meta "notice=true" and after adding it > > to postgres-ms configuration "notice" events started to come through. > > Item 1 still needs explanation. As pacemaker-controld keeps complaining. > > What happens when you run `OCF_ROOT=/usr/lib/ocf > /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data`? This may also be relevant: https://lists.clusterlabs.org/pipermail/users/2022-June/030391.html > > > Thanks! > > _Vitaly > > > > > On 07/02/2022 2:04 PM vitaly wrote: > > > > > > > > > Hello Everybody. > > > I have a 2 node cluster with clone resource “postgres-ms”. We are running > > > following versions of pacemaker/corosync: > > > d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync" > > > pacemaker-cluster-libs-2.0.5-9.el8.x86_64 > > > pacemaker-libs-2.0.5-9.el8.x86_64 > > > pacemaker-cli-2.0.5-9.el8.x86_64 > > > corosynclib-3.1.0-5.el8.x86_64 > > > pacemaker-schemas-2.0.5-9.el8.noarch > > > corosync-3.1.0-5.el8.x86_64 > > > pacemaker-2.0.5-9.el8.x86_64 > > > > > > There are couple of issues that could be related. > > > 1. There are following messages in the logs coming from > > > pacemaker-controld: > > > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed > > > to receive meta-data for ocf:heartbeat:pgsql-rhino > > > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: warning: Failed > > > to get metadata for postgres (ocf:heartbeat:pgsql-rhino) > > > > > > 2. ocf:heartbeat:pgsql-rhino does not get any "notice" operations which > > > causes multiple issues with postgres synchronization during availability > > > events. > > > > > > 3. Item 2 raises another question. 
Who is setting these values: > > > ${OCF_RESKEY_CRM_meta_notify_type} > > > ${OCF_RESKEY_CRM_meta_notify_operation} > > > > > > Here is excerpt from cluster config: > > > > > > d19-25-left.lab.archivas.com ~ # pcs config > > > > > > Cluster Name: > > > Corosync Nodes: > > > d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com > > > Pacemaker Nodes: > > > d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com > > > > > > Resources: > > > Clone: postgres-ms > > > Meta Attrs: promotable=true target-role=started > > > Resource: postgres (class=ocf provider=heartbeat type=pgsql-rhino) > > >Attributes: master_ip=172.16.1.6 > > > node_list="d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com" > > > pgdata=/pg_data remote_wals_dir=/remote/walarchive rep_mode=sync > > > reppassword=XX repuser=XXX > > > restore_command="/opt/rhino/sil/bin/script_wrapper.sh wal_restore.py %f > > > %p" tmpdir=/pg_data/tmp wals_dir=/pg_data/pg_wal > > > xlogs_dir=/pg_data/pg_xlog > > >Meta Attrs: is-managed=true > > >Operations: demote interval=0 on-fail=restart timeout=120s > > > (postgres-demote-interval-0) > > >methods interval=0s timeout=5 > > > (postgres-methods-interval-0s) > > >monitor interval=10s on-fail=restart timeout=300s > > > (postgres-monitor-interval-10s) > > >monitor interval=5s on-fail=restart role=Master > > > timeout=300s (postgres-monitor-interval-5s) > > >notify interval=0 on-fail=restart timeout=90s > > > (postgres-notify-interval-0) > > >promote interval=0 on-fail=restart timeout=120s > > > (postgres-promote-interval-0) > > >start interval=0 on-fail=restart timeout=1800s > > > (postgres-start-interval-0) > > >stop interval=0 on-fail=fence timeout=120s > > > (postgres-stop-interval-0) > > > Thank you very much! 
> > > _Vitaly > > > ___ > > > Manage your subscription: > > > https://lists.clusterlabs.org/mailman/listinfo/users > > > > > > ClusterLabs home: https://www.clusterlabs.org/ > > ___ > > Manage your subscription: > > https://lists.clusterlabs.org/mailman/listinfo/users > > > > ClusterLabs home: https://www.clusterlabs.org/ > > > > -- > Regards, > > Reid Wahl (He/Him), RHCA > Senior Software Maintenance Engineer, Red Hat > CEE - Platform Support Delivery - ClusterHA -- Regards, Reid Wahl (He/Him), RHCA Senior Software Maintenance Engineer, Red Hat CEE - Platform Support Delivery - ClusterHA ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
On Sat, Jul 2, 2022 at 1:12 PM vitaly wrote: > > Sorry, I noticed that I am missing meta "notice=true" and after adding it to > postgres-ms configuration "notice" events started to come through. > Item 1 still needs explanation. As pacemaker-controld keeps complaining. What happens when you run `OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data`? > Thanks! > _Vitaly > > > On 07/02/2022 2:04 PM vitaly wrote: > > > > > > Hello Everybody. > > I have a 2 node cluster with clone resource “postgres-ms”. We are running > > following versions of pacemaker/corosync: > > d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync" > > pacemaker-cluster-libs-2.0.5-9.el8.x86_64 > > pacemaker-libs-2.0.5-9.el8.x86_64 > > pacemaker-cli-2.0.5-9.el8.x86_64 > > corosynclib-3.1.0-5.el8.x86_64 > > pacemaker-schemas-2.0.5-9.el8.noarch > > corosync-3.1.0-5.el8.x86_64 > > pacemaker-2.0.5-9.el8.x86_64 > > > > There are couple of issues that could be related. > > 1. There are following messages in the logs coming from pacemaker-controld: > > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed to > > receive meta-data for ocf:heartbeat:pgsql-rhino > > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: warning: Failed > > to get metadata for postgres (ocf:heartbeat:pgsql-rhino) > > > > 2. ocf:heartbeat:pgsql-rhino does not get any "notice" operations which > > causes multiple issues with postgres synchronization during availability > > events. > > > > 3. Item 2 raises another question. 
Who is setting these values: > > ${OCF_RESKEY_CRM_meta_notify_type} > > ${OCF_RESKEY_CRM_meta_notify_operation} > > > > Here is excerpt from cluster config: > > > > d19-25-left.lab.archivas.com ~ # pcs config > > > > Cluster Name: > > Corosync Nodes: > > d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com > > Pacemaker Nodes: > > d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com > > > > Resources: > > Clone: postgres-ms > > Meta Attrs: promotable=true target-role=started > > Resource: postgres (class=ocf provider=heartbeat type=pgsql-rhino) > >Attributes: master_ip=172.16.1.6 node_list="d19-25-left.lab.archivas.com > > d19-25-right.lab.archivas.com" pgdata=/pg_data > > remote_wals_dir=/remote/walarchive rep_mode=sync reppassword=XX > > repuser=XXX restore_command="/opt/rhino/sil/bin/script_wrapper.sh > > wal_restore.py %f %p" tmpdir=/pg_data/tmp wals_dir=/pg_data/pg_wal > > xlogs_dir=/pg_data/pg_xlog > >Meta Attrs: is-managed=true > >Operations: demote interval=0 on-fail=restart timeout=120s > > (postgres-demote-interval-0) > >methods interval=0s timeout=5 (postgres-methods-interval-0s) > >monitor interval=10s on-fail=restart timeout=300s > > (postgres-monitor-interval-10s) > >monitor interval=5s on-fail=restart role=Master timeout=300s > > (postgres-monitor-interval-5s) > >notify interval=0 on-fail=restart timeout=90s > > (postgres-notify-interval-0) > >promote interval=0 on-fail=restart timeout=120s > > (postgres-promote-interval-0) > >start interval=0 on-fail=restart timeout=1800s > > (postgres-start-interval-0) > >stop interval=0 on-fail=fence timeout=120s > > (postgres-stop-interval-0) > > Thank you very much! 
> > _Vitaly > > ___ > > Manage your subscription: > > https://lists.clusterlabs.org/mailman/listinfo/users > > > > ClusterLabs home: https://www.clusterlabs.org/ > ___ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ -- Regards, Reid Wahl (He/Him), RHCA Senior Software Maintenance Engineer, Red Hat CEE - Platform Support Delivery - ClusterHA ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Postgres clone resource does not get "notice" events
Sorry, I noticed that I am missing meta "notice=true" and after adding it to postgres-ms configuration "notice" events started to come through. Item 1 still needs explanation. As pacemaker-controld keeps complaining. Thanks! _Vitaly > On 07/02/2022 2:04 PM vitaly wrote: > > > Hello Everybody. > I have a 2 node cluster with clone resource “postgres-ms”. We are running > following versions of pacemaker/corosync: > d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync" > pacemaker-cluster-libs-2.0.5-9.el8.x86_64 > pacemaker-libs-2.0.5-9.el8.x86_64 > pacemaker-cli-2.0.5-9.el8.x86_64 > corosynclib-3.1.0-5.el8.x86_64 > pacemaker-schemas-2.0.5-9.el8.noarch > corosync-3.1.0-5.el8.x86_64 > pacemaker-2.0.5-9.el8.x86_64 > > There are couple of issues that could be related. > 1. There are following messages in the logs coming from pacemaker-controld: > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed to > receive meta-data for ocf:heartbeat:pgsql-rhino > Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: warning: Failed to > get metadata for postgres (ocf:heartbeat:pgsql-rhino) > > 2. ocf:heartbeat:pgsql-rhino does not get any "notice" operations which > causes multiple issues with postgres synchronization during availability > events. > > 3. Item 2 raises another question. 
Who is setting these values: > ${OCF_RESKEY_CRM_meta_notify_type} > ${OCF_RESKEY_CRM_meta_notify_operation} > > Here is excerpt from cluster config: > > d19-25-left.lab.archivas.com ~ # pcs config > > Cluster Name: > Corosync Nodes: > d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com > Pacemaker Nodes: > d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com > > Resources: > Clone: postgres-ms > Meta Attrs: promotable=true target-role=started > Resource: postgres (class=ocf provider=heartbeat type=pgsql-rhino) >Attributes: master_ip=172.16.1.6 node_list="d19-25-left.lab.archivas.com > d19-25-right.lab.archivas.com" pgdata=/pg_data > remote_wals_dir=/remote/walarchive rep_mode=sync reppassword=XX > repuser=XXX restore_command="/opt/rhino/sil/bin/script_wrapper.sh > wal_restore.py %f %p" tmpdir=/pg_data/tmp wals_dir=/pg_data/pg_wal > xlogs_dir=/pg_data/pg_xlog >Meta Attrs: is-managed=true >Operations: demote interval=0 on-fail=restart timeout=120s > (postgres-demote-interval-0) >methods interval=0s timeout=5 (postgres-methods-interval-0s) >monitor interval=10s on-fail=restart timeout=300s > (postgres-monitor-interval-10s) >monitor interval=5s on-fail=restart role=Master timeout=300s > (postgres-monitor-interval-5s) >notify interval=0 on-fail=restart timeout=90s > (postgres-notify-interval-0) >promote interval=0 on-fail=restart timeout=120s > (postgres-promote-interval-0) >start interval=0 on-fail=restart timeout=1800s > (postgres-start-interval-0) >stop interval=0 on-fail=fence timeout=120s > (postgres-stop-interval-0) > Thank you very much! > _Vitaly > ___ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
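For reference, clone notifications are driven by the clone's notify meta attribute (Pacemaker spells it "notify"). A sketch of setting it on the postgres-ms clone from this config, assuming a running cluster with pcs or crm_resource available on the node:

```shell
# Sketch: enable notify operations on the promotable clone from this thread.
# Without notify=true the cluster never schedules the agent's notify action,
# even when a notify operation is configured on the resource.
pcs resource meta postgres-ms notify=true

# Equivalent low-level form via crm_resource:
crm_resource --resource postgres-ms --meta \
    --set-parameter notify --parameter-value true
```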
[ClusterLabs] Postgres clone resource does not get "notice" events
Hello Everybody.
I have a 2-node cluster with clone resource “postgres-ms”. We are running the following versions of pacemaker/corosync:

d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync"
pacemaker-cluster-libs-2.0.5-9.el8.x86_64
pacemaker-libs-2.0.5-9.el8.x86_64
pacemaker-cli-2.0.5-9.el8.x86_64
corosynclib-3.1.0-5.el8.x86_64
pacemaker-schemas-2.0.5-9.el8.noarch
corosync-3.1.0-5.el8.x86_64
pacemaker-2.0.5-9.el8.x86_64

There are a couple of issues that could be related.

1. There are the following messages in the logs coming from pacemaker-controld:

Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
Jul 2 14:59:27 d19-25-right pacemaker-controld[1489734]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)

2. ocf:heartbeat:pgsql-rhino does not get any "notice" operations, which causes multiple issues with postgres synchronization during availability events.

3. Item 2 raises another question. 
Who is setting these values:

${OCF_RESKEY_CRM_meta_notify_type}
${OCF_RESKEY_CRM_meta_notify_operation}

Here is an excerpt from the cluster config:

d19-25-left.lab.archivas.com ~ # pcs config

Cluster Name:
Corosync Nodes:
 d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com
Pacemaker Nodes:
 d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com

Resources:
 Clone: postgres-ms
  Meta Attrs: promotable=true target-role=started
  Resource: postgres (class=ocf provider=heartbeat type=pgsql-rhino)
   Attributes: master_ip=172.16.1.6 node_list="d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com" pgdata=/pg_data remote_wals_dir=/remote/walarchive rep_mode=sync reppassword=XX repuser=XXX restore_command="/opt/rhino/sil/bin/script_wrapper.sh wal_restore.py %f %p" tmpdir=/pg_data/tmp wals_dir=/pg_data/pg_wal xlogs_dir=/pg_data/pg_xlog
   Meta Attrs: is-managed=true
   Operations: demote interval=0 on-fail=restart timeout=120s (postgres-demote-interval-0)
               methods interval=0s timeout=5 (postgres-methods-interval-0s)
               monitor interval=10s on-fail=restart timeout=300s (postgres-monitor-interval-10s)
               monitor interval=5s on-fail=restart role=Master timeout=300s (postgres-monitor-interval-5s)
               notify interval=0 on-fail=restart timeout=90s (postgres-notify-interval-0)
               promote interval=0 on-fail=restart timeout=120s (postgres-promote-interval-0)
               start interval=0 on-fail=restart timeout=1800s (postgres-start-interval-0)
               stop interval=0 on-fail=fence timeout=120s (postgres-stop-interval-0)

Thank you very much!
_Vitaly
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
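On item 3: those variables are set by Pacemaker itself. Once the clone has notify=true, the controller exports them into the environment of every notify call of the agent. A minimal sketch of how an agent's notify entry point typically dispatches on them; the echoed messages are hypothetical placeholders, not pgsql-rhino's real behavior:

```shell
# Sketch: reading the meta-variables Pacemaker exports for each notify call.
# notify_type is "pre" or "post"; notify_operation is "start", "stop",
# "promote", or "demote". Related variables such as
# OCF_RESKEY_CRM_meta_notify_promote_uname list the nodes affected.
pgsql_notify() {
    type="${OCF_RESKEY_CRM_meta_notify_type}"
    op="${OCF_RESKEY_CRM_meta_notify_operation}"

    case "${type}-${op}" in
        pre-promote)  echo "sync standby before promotion" ;;
        post-promote) echo "repoint replication at the new master" ;;
        post-start)   echo "a clone instance started" ;;
        *)            echo "no-op for ${type}-${op}" ;;
    esac
    return 0  # OCF_SUCCESS
}
```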