On Fri, Jun 2, 2017 at 4:20 PM, Dmitriy Sarafannikov <dsarafanni...@yandex.ru> wrote: > Thanks for all. > > We found the source of the problem. It was mistake in upgrade to 9.6. > > We upgrade replica with rsync as it is in the documentation: > rsync --verbose --relative --archive --hard-links --size-only old_pgdata > new_pgdata remote_dir > > We must provide 100% read-only availability of our shard (master + 2 > replicas). > So we can’t stop master and both replicas, upgrade them one by one and start > them. > We do it as follows: > Close master from load, stop master, upgrade it, stop 1st replica, upgrade > it, start 1st replica, > stop 2nd replica, upgrade it, start 2nd replica, start master, open master. > But upgraded replicas died under load without statistics and we decided to > perform > analyze on master before upgrading replicas. In this case statistics would be > copied to replicas by rsync. > The upgrade algorithm became as follows: > Close master, stop master, close master from replicas (iptables), upgrade > master, > start master, perform analyze, stop master, stop 1st replica, upgrade 1st > replica, > start 1st replica, stop 2nd replica, upgrade 2nd replica, start 2nd replica, > start master, open master. > > If autovacuum starts vacuuming relations while we are performing analyze, wal > records > generated by it will not be replayed on replicas, because next step is > stopping > master with checkpoint and new redo location LSN (newer that these wal > records) > will appear in pg_control file, which then will be copied by rsync to > replicas. > > If it was simple vacuum, we most likely will not see the consequences. > Because it marks > tuples as deleted, and some of the next new tuples will be placed here, and > due to FPW > replicas will receive correct page, identical to master. > But if it was vacuum to prevent wraparound, we will see situation like ours. > Tuples on > master will be frozen, but on replicas not. And it will not change if nobody > will not > update any tuple on this page. >
Why didn't rsync made the copies on master and replica same? > It’s dangerous, because, if we perform switchover to replica, «corrupted» page > will be delivered to all replicas after next update of any tuple from this > page. > > We reproduced this case in our test environment and this assumption was > confirmed. > > Starting and stopping master after running pg_upgrade but before rsync to > collect statistics > was a bad idea. > > We know how to find such «corrupted» tuples. And we want to fix this by > manually > freezing tuples via calling specially written C functions. Functions are > «copy-pasted» > and simplified code from vacuum functions with SQL interface (see attachment). > Can you look on them? Do you think it is safe to use them for fixing > corrupted pages > or is there a better way not to loose data? > I haven't looked in detail, but it sounds slightly risky proposition to manipulate the tuples by writing C functions of the form you have in your code. I would have preferred some way to avoid this problem by ensuring that replicas are properly synced (complete data of master via WAL) or by disabling autovacuum. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers