Your message dated Fri, 19 May 2017 19:42:39 +0200
with message-id <20170519174239.vje6uaei57hy3...@msg.df7cb.de>
and subject line Re: Bug#756606: [Pkg-postgresql-public] Bug#756606:
postgresql-9.1: Init-Script does not work together with heartbeat
has caused the Debian Bug report #756606,
regarding postgresql-9.1: Init-Script does not work together with heartbeat
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)
--
756606: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=756606
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: postgresql-9.1
Version: 9.1.13-0wheezy1
Severity: important
Dear Maintainer,
After drbd, heartbeat and postgresql-9.1 are installed and given a basic
configuration, running the postgresql init script from heartbeat's haresources
fails in multiple ways.
Unfortunately, I cannot tell why it doesn't work: the ha-debug log shows no
reason for the failure, even with debugging enabled.
The following lists what goes wrong:
1) A common setup for Postgres HA clusters is to have /var/lib/postgresql on a
DRBD-synced resource, which is mounted on only one node at a time. When you
have a resource group configured to start drbddisk, mount /var/lib/postgresql
and start postgresql (in that order; see the haresources file listed later in
this report) and start heartbeat on both nodes, these resources are started
only on the primary node for this resource group (the first field in the
haresources file). They are not acquired on the standby node.
Unfortunately, when stopping heartbeat on the standby node, heartbeat
nevertheless tries to give up the resources, even though it never acquired
them. Since /var/lib/postgresql was not mounted on that node, issuing
"/etc/init.d/postgresql stop" on the standby node fails, because it cannot
find the necessary files in /var/lib/postgresql .
Even without heartbeat STONITH configured, this somehow leads to a hard server
reset.
Solution: "/etc/init.d/postgresql stop" shouldn't return an error when the
datadir is empty, so that it can be used together with heartbeat.
2) When heartbeat starts, postgresql does not seem to be started at all. I do
not understand this, since all other init scripts I have tested (samba, cron)
work fine when used in place of postgresql in the haresources file quoted
below.
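To illustrate the behaviour requested in point 1, here is a minimal, untested
sketch of a wrapper that could sit between heartbeat and the init script. It
assumes heartbeat calls resource scripts with start/stop/status, as the logs in
this report suggest. The script name (postgresql-ha), the function
pg_ha_action and the variables PGDATA_DIR and PG_INIT are hypothetical and not
part of any package. It relies on the fact that an initialized PostgreSQL data
directory contains a PG_VERSION file.

```sh
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /etc/ha.d/resource.d/postgresql-ha
# (the name and location are assumptions, not something shipped by Debian).

# Assumed defaults; both can be overridden through the environment.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/9.1/main}"
PG_INIT="${PG_INIT:-/etc/init.d/postgresql}"

pg_ha_action() {
    action="$1"
    case "$action" in
        stop)
            # Without PG_VERSION there is no initialized data directory here
            # (for example because the DRBD volume is not mounted on this
            # node), so there is nothing to stop: report success.
            if [ ! -f "$PGDATA_DIR/PG_VERSION" ]; then
                echo "postgresql-ha: no cluster in $PGDATA_DIR, nothing to stop"
                return 0
            fi
            "$PG_INIT" stop
            ;;
        start|status)
            # Pass these through unchanged, including the exit code.
            "$PG_INIT" "$action"
            ;;
        *)
            echo "Usage: $0 {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# Act only when an argument is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    pg_ha_action "$1"
fi
```

With such a wrapper, the last field of the haresources line would be
postgresql-ha instead of postgresql. This only covers the stop failure from
point 1; it does not address point 2.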
I have tried this on multiple clean Debian wheezy installs, from bare-metal
servers to VirtualBox setups on a workstation. The result is always the same.
The logs and configuration files used follow:
/etc/ha.d/haresources :
prod-cl3 drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 IPaddr::192.168.20.18/24/eth0 postgresql
=======================
/etc/ha.d/ha.cf :
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 90
udpport 694
ucast eth1 10.250.250.16
auto_failback on
node prod-cl3
node prod-cl4
=======================
/etc/drbd.conf :
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
=======================
/etc/drbd.d/global_common.conf :
global {
    usage-count yes;
}
common {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
    }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 96256;
    }
}
=======================
/etc/drbd.d/prod-cl.res :
resource var_lib_postgres {
    protocol C;
    on prod-cl3 {
        device /dev/drbd0;
        disk /dev/prod-cl3_data/var_lib_postgres;
        address 10.250.250.16:7789;
        meta-disk internal;
    }
    on prod-cl4 {
        device /dev/drbd0;
        disk /dev/prod-cl4_data/var_lib_postgres;
        address 10.250.250.17:7789;
        meta-disk internal;
    }
}
=======================
ha-debug log, showing postgres isn't even started on primary node when
heartbeat starts:
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Core dumps could be lost if
multiple dumps occur.
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting non-default
value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Pacemaker support: false
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Logging daemon is disabled
--enabling logging daemon is recommended
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: **************************
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Configuration validated.
Starting heartbeat 3.0.5
Jul 30 13:51:11 prod-cl3 heartbeat: [20847]: info: heartbeat: version 3.0.5
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Heartbeat generation:
1406638883
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound send
socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound receive
socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: started on port
694 interface eth1 to 10.250.250.17
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Local status now set to: 'up'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: node prod-cl4: is dead
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Comm_now_up(): updating
status to active
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Local status now set to:
'active'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: No STONITH device configured.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: Shared disks are not
protected.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Resources being acquired
from prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20876]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[20876]: 2014/07/30_13:52:43 info: Running /etc/ha.d//rc.d/status status
Jul 30 13:52:43 prod-cl3 heartbeat: [20877]: info: Local Resource acquisition
completed.
mach_down[20910]: 2014/07/30_13:52:43 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq():
child count 2
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: mach_down takeover complete.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Initial resource acquisition
complete (mach_down)
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq():
child count 1
mach_down[20910]: 2014/07/30_13:52:43 info: mach_down takeover complete
for node prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20968]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[20968]: 2014/07/30_13:52:43 info: Running
/etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[20968]: 2014/07/30_13:52:43 received ip-request-resp
drbddisk::var_lib_postgres OK yes
ResourceManager[20989]: 2014/07/30_13:52:43 info: Acquiring resource group:
prod-cl3 drbddisk::var_lib_postgres
Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0
postgresql
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running
/etc/ha.d/resource.d/drbddisk var_lib_postgres start
Filesystem[21057]: 2014/07/30_13:52:43 INFO: Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/postgresql ext4 start
Filesystem[21131]: 2014/07/30_13:52:43 INFO: Running start for /dev/drbd0
on /var/lib/postgresql
FATAL: Module scsi_hostadapter not found.
Filesystem[21125]: 2014/07/30_13:52:43 INFO: Success
INFO: Success
IPaddr[21200]: 2014/07/30_13:52:43 INFO: Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.20.18/24/eth0 start
IPaddr[21282]: 2014/07/30_13:52:43 INFO: Using calculated netmask for
192.168.20.18: 255.255.255.0
IPaddr[21282]: 2014/07/30_13:52:43 INFO: eval ifconfig eth0:0 192.168.20.18
netmask 255.255.255.0 broadcast 192.168.20.255
IPaddr[21258]: 2014/07/30_13:52:43 INFO: Success
INFO: Success
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: Local Resource acquisition
completed. (none)
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: local resource transition
completed.
=======================
ha-debug log, showing the server crash when postgresql isn't properly stopped
(due to missing files in the datadir, as described):
Jul 30 13:57:49 prod-cl4 heartbeat: [3340]: info: Heartbeat shutdown in
progress. (3340)
Jul 30 13:57:49 prod-cl4 heartbeat: [3410]: info: Giving up all HA resources.
ResourceManager[3424]: 2014/07/30_13:57:49 info: Releasing resource group:
prod-cl3 drbddisk::var_lib_postgres
Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0
postgresql
ResourceManager[3424]: 2014/07/30_13:57:49 info: Running
/etc/init.d/postgresql stop
Stopping PostgreSQL 9.1 database server: mainError:
/var/lib/postgresql/9.1/main is not accessible or does not exist ... failed!
failed!
ResourceManager[3424]: 2014/07/30_13:57:50 ERROR: Return code 1 from
/etc/init.d/postgresql
ResourceManager[3424]: 2014/07/30_13:57:51 info: Retrying failed stop
operation [postgresql]
ResourceManager[3424]: 2014/07/30_13:5
-- System Information:
Debian Release: 7.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages postgresql-9.1 depends on:
ii libc6 2.13-38+deb7u3
ii libcomerr2 1.42.5-1.1
ii libgssapi-krb5-2 1.10.1+dfsg-5+deb7u1
ii libkrb5-3 1.10.1+dfsg-5+deb7u1
ii libldap-2.4-2 2.4.31-1+nmu2
ii libpam0g 1.1.3-7.1
ii libpq5 9.1.13-0wheezy1
ii libssl1.0.0 1.0.1e-2+deb7u11
ii libxml2 2.8.0+dfsg1-7+wheezy1
ii locales 2.13-38+deb7u3
ii postgresql-client-9.1 9.1.13-0wheezy1
ii postgresql-common 134wheezy4
ii ssl-cert 1.0.32
ii tzdata 2014e-0wheezy1
postgresql-9.1 recommends no packages.
Versions of packages postgresql-9.1 suggests:
pn locales-all <none>
pn oidentd | ident-server <none>
-- no debconf information
--- End Message ---
--- Begin Message ---
Re: To Marc Richter 2014-07-31 <20140731103408.ga13...@msg.df7cb.de>
> > Unfortunately, when stoping heartbeat on the standby node, heartbeat
> > nevertheless tries to give up resources, even it hasn't acquired them
> > before. Since /var/lib/postgresql wasn't mounted before on that node,
> > issuing "/etc/init.d/postgresql stop" on the standby node fails, since it
> > cannot find necessary files in /var/lib/postgresql .
>
> The init script was never designed to be a drop-in heartbeat HA agent.
> The exit codes are probably simply wrong in some cases for that.
>
> Any reason you aren't using the pgsql agent provided by pacemaker?
With the switch to systemd, and lots of PostgreSQL agents for HA
resource managers available, fixing the init script for heartbeat
isn't really going to happen anymore. Closing this bug now.
Thanks for the report,
Christoph
--- End Message ---