date:20160815

[GENERAL] 9.2 to 9.5 pg_upgrade losing data

2016-08-15 Thread Pete Fuller

Hello all,

We are attempting to upgrade a production 9.2 postgres cluster to 9.5. The
server is running Centos 7 with the Centos version - currently 9.2.15 We are
installing the postgresql provided rpm of postgresql9.5 from the postgresql
repo, currently 9.5.4. In testing and on our development cluster, the upgrade
worked without issue but something is happening on our production DB that we
can not figure out.

The 9.2 cluster is in streaming replication controlled by pacemaker. Steps we
have been using are the following.

• verify the replica server is in a good state with up to date data,
then put the machine in standby, stopping the 9.2 db
• Removing the recovery.conf file and starting the 9.2 server manually
to get some stats to verify later (counts of rows from several key tables)
• Stop the 9.2 server, (/bin/pg_ctl stop -D /var/lib/pgsql/data ) and
install the 9.5 binaries.
• init the 9.5 db
• run a pg_upgrade with the link option. The full command we are using
is
/usr/pgsql-9.5/bin/pg_upgrade -d /var/lib/pgsql/data -D /var/lib/pgsql/9.5/data
-b /bin -B /usr/pgsql-9.5/bin/ --link -r -v
• run the full vacuum analyze script that pg_upgrade generated

What we are finding in testing is we get thru the upgrade process, start the
9.5 server manually to compare the row counts we collected immediately before
the upgrade and we are seeing different data, sometimes drastically. This data
includes missing user accounts, etc, that is not perishable data.

For example - 9.2 standalone before running pg_upgrade gives this information
tracks=# select count(*) from users;
count
---
13945
(1 row)

tracks=# select count(*) from sessions;
count
---
559
(1 row)

PG 9.5 after the upgrade - same machine, upgraded database

tracks=# select count(*) from sessions;
---
155
(1 row)

tracks=# select count(*) from users;
count
---
13157
(1 row)

We have enabled verbose logging and have reviewed it but are seeing no errors
in the migration. The 9.5 server starts without complaint and if we did not do
these manual checks we would not see a problem. We have attempted this upgrade
on this machine repeatedly in prepping for our maintenance windows, and on
occasion our manual checks have shown good numbers. At this point, though, it
has failed silently enough times that I have little confidence in the data even
if our rudimentary checks do show identical counts after the upgrade.

Has anyone else seen this issue?

The only 'hack' during the upgrade that isn't straight from the
postgres_upgrade man page is to handle a renamed option in postgresql that
cents back ported - unix_socket_directories. Using the rename of pg_ctl that is
referenced in the top answer here to force a couple variables into the centos
provided pg_ctl binary -
http://dba.stackexchange.com/questions/50135/pg-upgrade-unrecognized-configuration-parameter-unix-socket-directory

Any information or suggestions would be helpful at this point.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

37 matches

Mail list logo