On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini <marco.nenciar...@2ndquadrant.it> wrote:
> 0. Introduction:
> =================================
> This is a proposal for adding incremental backup support to streaming
> protocol and hence to pg_basebackup command.

I am not sure that "incremental" is the right word, as the existing backup methods using WAL archives are already like that. I recall others calling this "differential backup" in some previous threads. Would that sound better?

> 1. Proposal
> =================================
> Our proposal is to introduce the concept of a backup profile.

Sounds good. Thanks for looking at that.

> The backup
> profile consists of a file with one line per file detailing tablespace,
> path, modification time, size and checksum.
> Using that file the BASE_BACKUP command can decide which file needs to
> be sent again and which is not changed. The algorithm should be very
> similar to rsync, but since our files are never bigger than 1 GB per
> file that is probably granular enough not to worry about copying parts
> of files, just whole files.

There are actually two levels of differential backups: file-level, which is the approach you are taking, and block-level. A block-level backup requires scanning all the blocks of all the relations and taking only the data from the blocks newer than the LSN given to the BASE_BACKUP command. With the file-level approach, you could back up a relation file as soon as you find at least one modified block in it. Btw, the size of relation files depends on the size defined by --with-segsize when running configure. 1GB is the default though, and the value usually used. Differential backups can reduce the size of overall backups depending on the application, at the cost of some CPU to analyze the relation blocks that need to be included in the backup.

> It could also be used in 'refresh' mode, by allowing the pg_basebackup
> command to 'refresh' an old backup directory with a new backup.

I am not sure this is really helpful...

> The final piece of this architecture is a new program called
> pg_restorebackup which is able to operate on a "chain of incremental
> backups", allowing the user to build an usable PGDATA from them or
> executing maintenance operations like verify the checksums or estimate
> the final size of recovered PGDATA.

Yes, right.
Taking a differential backup is not difficult; the tricky part is rebuilding a consistent base backup from a full base backup and a set of differential ones, since you need to be sure that all the pieces of the puzzle are there.

> We created a wiki page with all implementation details at
> https://wiki.postgresql.org/wiki/Incremental_backup

I had a look at that, and I think that you are missing the point in the way differential backups should be taken. What would be necessary is to pass a WAL position (or LSN, log sequence number, like 0/2000060) with a new clause called DIFFERENTIAL (INCREMENTAL in your first proposal) in the BASE_BACKUP command, and then have the server report back to the client all the files that contain blocks newer than the given LSN for a file-level backup, or the blocks newer than the given LSN for a block-level differential backup.

Note that we would need a way to identify the type of the backup taken in backup_label, along with the LSN position sent with the DIFFERENTIAL clause of BASE_BACKUP, by adding a new field to it. When taking a differential backup, the LSN position needed would simply be the value of START WAL LOCATION of the last differential or full backup taken. This also results in a new option for pg_basebackup of the type --differential='0/2000060' to take a differential backup directly.

Then, for the utility pg_restorebackup, what you would need to do is simply pass a list of backups to it, have it validate that they can build a consistent backup, and then build it. Btw, the file-based method would be simpler to implement, especially for rebuilding the backups.

Regards,
--
Michael
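
P.S. To make the file-level versus block-level distinction above more concrete, here is a minimal Python sketch of the LSN comparison. It is only an illustration, not server code and not part of the proposal. The function names (parse_lsn, page_lsn, changed_blocks, file_changed) are hypothetical. It assumes the default 8192-byte block size, that each page stores its LSN in the first 8 bytes of the page header as two 32-bit integers (high half first), and that the file is read on a machine with the same byte order as the server.

```python
import struct

BLCKSZ = 8192  # assumption: default block size, configurable at build time


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (two hex numbers) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(data, offset):
    """Read the LSN from the page header that starts at `offset`.

    Assumption: the first 8 bytes hold the high and low 32-bit halves
    of the LSN, in the native byte order of the machine.
    """
    xlogid, xrecoff = struct.unpack_from("=II", data, offset)
    return (xlogid << 32) | xrecoff


def changed_blocks(data, since_lsn, blcksz=BLCKSZ):
    """Block-level: scan every page, return the numbers of the newer blocks."""
    return [
        blkno
        for blkno in range(len(data) // blcksz)
        if page_lsn(data, blkno * blcksz) > since_lsn
    ]


def file_changed(data, since_lsn, blcksz=BLCKSZ):
    """File-level: stop at the first block newer than since_lsn."""
    return any(
        page_lsn(data, blkno * blcksz) > since_lsn
        for blkno in range(len(data) // blcksz)
    )
```

With something like this, a file-level backup would send the whole relation file whenever file_changed() is true, while a block-level backup would send only the blocks listed by changed_blocks(). Files that do not use the standard page layout would still have to be handled separately, for example by always copying them whole.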