Hi all,

As of today, the replication protocol has a command called BASE_BACKUP that lets a client connected through the replication protocol retrieve a full backup from the server as a stream. Its current options are described here:
http://www.postgresql.org/docs/9.3/static/protocol-replication.html
This command puts the server into backup mode by calling do_pg_start_backup, sends the backup, and then finalizes it with do_pg_stop_backup. Thanks to that, it is also possible to take backups from standby nodes, as the stream contains the backup_label file necessary for recovery.

The full backup is sent in tar format for obvious performance reasons, to limit the amount of data sent through the stream, and the server contains the code needed to send the data in the correct format. This also forces the client to perform some decoding if the base backup needs to be analyzed on the fly as it is received, by doing something similar to what pg_basebackup does today when the backup format is plain.

I would like to propose the following extensions to BASE_BACKUP for retrieving a backup from a stream:

- Addition of an option FORMAT, to control the output format of the backup, with possible values 'plain' and 'tar'. The default is tar for backward compatibility. The purpose of this option is to make it easier for backup tools working with postgres to retrieve a backup and to filter and analyze the data while it is being received, without the tar decoding that is necessary today and that more or less amounts to copying portions of the pg_basebackup code.

- Addition of an option called INCREMENTAL to send an incremental backup to the client. This option takes an LSN as input and sends back to the client the relation pages (in the shape of reduced relation files) that are newer than the specified LSN, determined by looking at pd_lsn of PageHeaderData. In this case the LSN needs to be determined by the client based on the latest full backup taken. This option is particularly interesting for reducing the amount of data taken between two backups, even if it increases the restore time, as the client needs to reconstitute a base backup depending on the recovery target and the pages modified.
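
To make the INCREMENTAL idea a bit more concrete, here is a rough Python sketch of the page selection and of the reconstitution a client would have to do. It is not taken from any patch: the function names (page_lsn, parse_lsn, changed_blocks, apply_blocks) are made up for illustration, and it assumes the default 8kB block size and a little-endian machine. The only thing it relies on from the server side is that pd_lsn occupies the first 8 bytes of the page header, stored as two 32-bit halves.

```python
import struct

BLCKSZ = 8192  # assumption: default block size (it is a compile-time option)


def page_lsn(page: bytes) -> int:
    """Return pd_lsn, read from the first 8 bytes of a page, as a 64-bit int.

    pd_lsn is stored as two 32-bit halves (high part first) in host byte
    order; little-endian is assumed here.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def parse_lsn(text: str) -> int:
    """Convert the textual 'X/Y' LSN form into a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def changed_blocks(relation: bytes, start_lsn: int):
    """Yield (block_number, page) for pages whose pd_lsn is newer than start_lsn."""
    for offset in range(0, len(relation), BLCKSZ):
        page = relation[offset:offset + BLCKSZ]
        if page_lsn(page) > start_lsn:
            yield offset // BLCKSZ, page


def apply_blocks(base: bytes, blocks) -> bytes:
    """Overwrite (or append) the given blocks in a full-backup copy of a relation file."""
    rebuilt = bytearray(base)
    for blkno, page in blocks:
        end = (blkno + 1) * BLCKSZ
        if len(rebuilt) < end:
            # The relation grew since the full backup: pad with zeroed space.
            rebuilt.extend(bytes(end - len(rebuilt)))
        rebuilt[blkno * BLCKSZ:end] = page
    return bytes(rebuilt)
```

A client would call changed_blocks(data, parse_lsn("0/3000")) on each relation file (the LSN here is only an example value) and later feed the result to apply_blocks together with the same file from the full backup. The sketch ignores cases a real implementation would have to handle, such as truncated or dropped relations and pages that have never been initialized.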
The client would be in charge of rebuilding pages from the incremental backup, by scanning all the blocks that need to be updated on top of the full backup, since the LSN from which the incremental backup was taken is known. But this is not really something the server cares about... Such things are actually done by pg_rman as well.

As a next step, I would imagine that pg_basebackup could be extended to take incremental backups as well. Having another tool able to rebuild backups from a full backup plus incremental information would be nice too.

This is of course not material for 9.4; for now I would just like to take the temperature on such things and gather opinions...

Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers