[Pkg-postgresql-public] Bug#756606: marked as done (postgresql-9.1: Init-Script does not work together with heartbeat)

Debian Bug Tracking System Fri, 19 May 2017 10:45:27 -0700

Your message dated Fri, 19 May 2017 19:42:39 +0200
with message-id <20170519174239.vje6uaei57hy3...@msg.df7cb.de>
and subject line Re: Bug#756606: [Pkg-postgresql-public] Bug#756606: 
postgresql-9.1: Init-Script does not work together with heartbeat
has caused the Debian Bug report #756606,
regarding postgresql-9.1: Init-Script does not work together with heartbeat
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
756606: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=756606
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--- Begin Message ---

Package: postgresql-9.1
Version: 9.1.13-0wheezy1
Severity: important

Dear Maintainer,

After drbd, heartbeat and postgresql-9.1 are installed and given a basic
configuration, running the postgresql init script from heartbeat's haresources
fails in multiple ways.
Unfortunately, I cannot tell why it doesn't work, since the ha-debug log shows
no reason for the failure even with debugging enabled.

The following lists what's going wrong:

1) A common setup for Postgres HA clusters is to keep /var/lib/postgresql on a
DRBD-synced resource that is mounted on only one node at a time. With a
resource group configured to start drbddisk, mount /var/lib/postgresql and
start postgresql (in that order; see the haresources file listed later in this
report), starting heartbeat on both nodes brings these resources up only on
the primary node for the resource group (the first field in the haresources
file). The standby node does not acquire them.
Unfortunately, when heartbeat is stopped on the standby node, it nevertheless
tries to give up the resources, even though it never acquired them.
Since /var/lib/postgresql was never mounted on that node,
"/etc/init.d/postgresql stop" fails there because it cannot find the
necessary files in /var/lib/postgresql.
Even without heartbeat STONITH configured, this somehow leads to a hard server
reset.
Solution: "/etc/init.d/postgresql stop" should not return an error when the
datadir is empty, so that it can be used together with heartbeat.

2) When heartbeat starts, postgresql does not seem to be started at all. I do
not understand this, since all other init scripts I have tested (samba, cron)
work fine when used in place of postgresql in the haresources file quoted
below.
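
The workaround proposed in point 1 could be sketched as a small wrapper that
heartbeat calls instead of the init script. This is only an illustration, not
something the postgresql-common package ships: the function name pg_resource
and the PGDATA_DIR and INIT_SCRIPT variables are hypothetical, and the default
paths are taken from the logs in this report. It follows the LSB convention
that stopping a service that is not running counts as success.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; names and defaults are assumptions.
# PGDATA_DIR: cluster data directory as it appears in the report's log.
# INIT_SCRIPT: the stock init script that does the real work.
PGDATA_DIR=${PGDATA_DIR:-/var/lib/postgresql/9.1/main}
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/postgresql}

pg_resource() {
    action=$1
    if [ "$action" = stop ] && [ ! -d "$PGDATA_DIR" ]; then
        # The data directory is not mounted on this node, so assume the
        # cluster is not running from it and report success to heartbeat.
        echo "data directory $PGDATA_DIR absent, treating stop as success"
        return 0
    fi
    # Every other case is passed through unchanged, exit code included.
    "$INIT_SCRIPT" "$action"
}
```

A real wrapper would end with a call such as pg_resource "$1" and would be
named in haresources in place of "postgresql". Whether this also avoids the
hard reset described above is untested.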

I have tried this on multiple clean Debian wheezy installs, ranging from
bare-metal servers to VirtualBox VMs on a workstation. The result is always
the same.

The logs and configurations used follow:

/etc/ha.d/haresources :

prod-cl3  drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 IPaddr::192.168.20.18/24/eth0 postgresql
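
To make the resource group above easier to read, here is a rough sketch that
expands one haresources line into the per-resource "start" calls, in the order
they appear. The function name expand_haresources is made up for this
illustration. It assumes whitespace-separated "script::arg::arg" tokens after
the node name and treats a bare IP address as shorthand for IPaddr, which
matches how the address shows up in the ha-debug log below. It does not
resolve script paths or run anything.

```shell
#!/bin/sh
# Sketch only: prints the calls implied by a haresources line.
# Assumes the line contains no shell glob characters.
expand_haresources() {
    # Split the line into fields and drop the first one (preferred node).
    set -- $1
    shift
    for res in "$@"; do
        case $res in
            # A bare address is treated as IPaddr::<address>.
            [0-9]*.[0-9]*.[0-9]*.[0-9]*) res="IPaddr::$res" ;;
        esac
        # "script::a::b" becomes "script a b", followed by the action.
        printf '%s start\n' "$(printf '%s' "$res" | sed 's/::/ /g')"
    done
}
```

Called with the line above, the last call printed is "postgresql start", which
is the step that never appears in the log of the primary node.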

=======================

/etc/ha.d/ha.cf :

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 90
udpport 694
ucast eth1 10.250.250.16
auto_failback on
node   prod-cl3
node   prod-cl4

=======================

/etc/drbd.conf :

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

=======================

/etc/drbd.d/global_common.conf :

global {
        usage-count yes;
}

common {
        protocol C;

        startup {
                wfc-timeout  15;
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error     detach;
        }

        net {
                after-sb-0pri disconnect;
                after-sb-1pri disconnect;
                after-sb-2pri disconnect;
                rr-conflict disconnect;
        }

        syncer {
                rate 96256;
        }
}

=======================

/etc/drbd.d/prod-cl.res :

resource var_lib_postgres {
        protocol C;
        on prod-cl3 {
                device  /dev/drbd0;
                disk    /dev/prod-cl3_data/var_lib_postgres;
                address 10.250.250.16:7789;
                meta-disk       internal;
        }
        on prod-cl4 {
                device  /dev/drbd0;
                disk    /dev/prod-cl4_data/var_lib_postgres;
                address 10.250.250.17:7789;
                meta-disk       internal;
        }
}

=======================

ha-debug log, showing that postgres isn't even started on the primary node
when heartbeat starts:

Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Core dumps could be lost if 
multiple dumps occur.
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting non-default 
value in /proc/sys/kernel/core_pattern (or equivalent) for maximum 
supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting 
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Pacemaker support: false
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Logging daemon is disabled 
--enabling logging daemon is recommended
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: **************************
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Configuration validated. 
Starting heartbeat 3.0.5
Jul 30 13:51:11 prod-cl3 heartbeat: [20847]: info: heartbeat: version 3.0.5
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Heartbeat generation: 
1406638883
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: write socket 
priority set to IPTOS_LOWDELAY on eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound send 
socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound receive 
socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: started on port 
694 interface eth1 to 10.250.250.17
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Local status now set to: 'up'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: node prod-cl4: is dead
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Comm_now_up(): updating 
status to active
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Local status now set to: 
'active'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: No STONITH device configured.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: Shared disks are not 
protected.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Resources being acquired 
from prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20876]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
harc[20876]:    2014/07/30_13:52:43 info: Running /etc/ha.d//rc.d/status status
Jul 30 13:52:43 prod-cl3 heartbeat: [20877]: info: Local Resource acquisition 
completed.
mach_down[20910]:       2014/07/30_13:52:43 info: 
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq(): 
child count 2
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: mach_down takeover complete.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Initial resource acquisition 
complete (mach_down)
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq(): 
child count 1
mach_down[20910]:       2014/07/30_13:52:43 info: mach_down takeover complete 
for node prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20968]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
harc[20968]:    2014/07/30_13:52:43 info: Running 
/etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[20968]: 2014/07/30_13:52:43 received ip-request-resp 
drbddisk::var_lib_postgres OK yes
ResourceManager[20989]: 2014/07/30_13:52:43 info: Acquiring resource group: 
prod-cl3 drbddisk::var_lib_postgres 
Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0 
postgresql
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running 
/etc/ha.d/resource.d/drbddisk var_lib_postgres start
Filesystem[21057]:      2014/07/30_13:52:43 INFO:  Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/postgresql ext4 start
Filesystem[21131]:      2014/07/30_13:52:43 INFO: Running start for /dev/drbd0 
on /var/lib/postgresql
FATAL: Module scsi_hostadapter not found.
Filesystem[21125]:      2014/07/30_13:52:43 INFO:  Success
INFO:  Success
IPaddr[21200]:  2014/07/30_13:52:43 INFO:  Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running 
/etc/ha.d/resource.d/IPaddr 192.168.20.18/24/eth0 start
IPaddr[21282]:  2014/07/30_13:52:43 INFO: Using calculated netmask for 
192.168.20.18: 255.255.255.0
IPaddr[21282]:  2014/07/30_13:52:43 INFO: eval ifconfig eth0:0 192.168.20.18 
netmask 255.255.255.0 broadcast 192.168.20.255
IPaddr[21258]:  2014/07/30_13:52:43 INFO:  Success
INFO:  Success
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: Local Resource acquisition 
completed. (none)
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: local resource transition 
completed.

=======================

ha-debug log, showing the server crash when postgresql isn't properly stopped
(due to missing files in the datadir, as described):

Jul 30 13:57:49 prod-cl4 heartbeat: [3340]: info: Heartbeat shutdown in 
progress. (3340)
Jul 30 13:57:49 prod-cl4 heartbeat: [3410]: info: Giving up all HA resources.
ResourceManager[3424]:  2014/07/30_13:57:49 info: Releasing resource group: 
prod-cl3 drbddisk::var_lib_postgres 
Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0 
postgresql
ResourceManager[3424]:  2014/07/30_13:57:49 info: Running 
/etc/init.d/postgresql  stop
Stopping PostgreSQL 9.1 database server: mainError: 
/var/lib/postgresql/9.1/main is not accessible or does not exist ... failed!
 failed!
ResourceManager[3424]:  2014/07/30_13:57:50 ERROR: Return code 1 from 
/etc/init.d/postgresql
ResourceManager[3424]:  2014/07/30_13:57:51 info: Retrying failed stop 
operation [postgresql]
ResourceManager[3424]:  2014/07/30_13:5


-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages postgresql-9.1 depends on:
ii  libc6                  2.13-38+deb7u3
ii  libcomerr2             1.42.5-1.1
ii  libgssapi-krb5-2       1.10.1+dfsg-5+deb7u1
ii  libkrb5-3              1.10.1+dfsg-5+deb7u1
ii  libldap-2.4-2          2.4.31-1+nmu2
ii  libpam0g               1.1.3-7.1
ii  libpq5                 9.1.13-0wheezy1
ii  libssl1.0.0            1.0.1e-2+deb7u11
ii  libxml2                2.8.0+dfsg1-7+wheezy1
ii  locales                2.13-38+deb7u3
ii  postgresql-client-9.1  9.1.13-0wheezy1
ii  postgresql-common      134wheezy4
ii  ssl-cert               1.0.32
ii  tzdata                 2014e-0wheezy1

postgresql-9.1 recommends no packages.

Versions of packages postgresql-9.1 suggests:
pn  locales-all             <none>
pn  oidentd | ident-server  <none>

-- no debconf information

--- End Message ---

--- Begin Message ---

Re: To Marc Richter 2014-07-31 <20140731103408.ga13...@msg.df7cb.de>
> > Unfortunately, when stopping heartbeat on the standby node, heartbeat 
> > nevertheless tries to give up resources, even though it hasn't acquired
> > them before. Since /var/lib/postgresql wasn't mounted before on that node, 
> > issuing "/etc/init.d/postgresql stop" on the standby node fails, since it 
> > cannot find necessary files in /var/lib/postgresql .
> 
> The init script was never designed to be a drop-in heartbeat HA agent.
> The exit codes are probably simply wrong in some cases for that.
> 
> Any reason you aren't using the pgsql agent provided by pacemaker?

With the switch to systemd, and lots of PostgreSQL agents for HA
resource managers available, fixing the init script for heartbeat
isn't really going to happen anymore. Closing this bug now.

Thanks for the report,
Christoph

--- End Message ---

_______________________________________________
Pkg-postgresql-public mailing list
Pkg-postgresql-public@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-postgresql-public
