On 4 August 2014 19:30, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
> <gabriele.bartol...@2ndquadrant.it> wrote:
>> I really like the proposal of working on a block level incremental
>> backup feature and the idea of considering LSN. However, I'd suggest
>> to see block level as a second step and a goal to keep in mind while
>> working on the first step. I believe that file-level incremental
>> backup will bring a lot of benefits to our community and users anyway.
>
> Thing is, I don't see how the LSN method is that much harder than an
> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>
> Keeping a last-updated-LSN for each segment (or group of blocks) is
> just as easy as keeping a bitmap, and far more flexible and robust.
>
> The complexity and cost of safely keeping the map up-to-date is what's
> in question here, but as was pointed before, there's no really safe
> alternative. Nor modification times nor checksums (nor in-memory
> bitmaps IMV) are really safe enough for backups, so you really want to
> use something like the LSN. It's extra work, but opens up a world of
> possibilities.

OK, some comments on all of this.

* Wikipedia thinks the style of backup envisaged should be called "Incremental"
  https://en.wikipedia.org/wiki/Differential_backup

* Base backups are worthless without WAL right up to the *last* LSN
  seen during the backup, which is why pg_stop_backup() returns an LSN.
  This is the LSN that is the effective viewpoint of the whole base
  backup. So if we wish to get all changes since the last backup, we
  must re-quote this LSN. (Or put another way - file level LSNs don't
  make sense - we just need one LSN for the whole backup).

* When we take an incremental backup we need the WAL from the backup
  start LSN through to the backup stop LSN. We do not need the WAL
  between the last backup stop LSN and the new incremental start LSN.
  That is a huge amount of WAL in many cases and we'd like to avoid
  that, I would imagine. (So the space savings aren't just the delta
  from the main data files, we should also look at WAL savings).

* For me, file based incremental is a useful and robust feature.
  Block-level incremental is possible, but requires either significant
  persistent metadata (1 MB per GB file) or access to the original
  backup. One important objective here is to make sure we do NOT have
  to re-read the last backup when taking the next backup; this helps us
  to optimize the storage costs for backups. Plus, block-level recovery
  requires us to have a program that correctly re-writes data into the
  correct locations in a file, which seems likely to be a slow and bug
  ridden process to me. Nice, safe, solid file-level incremental backup
  first please. Fancy, bug prone, block-level stuff much later.

* One purpose of this could be to verify the backup. rsync provides a
  checksum, pg_basebackup does not. However, checksums are frequently
  prohibitively expensive, so perhaps asking for that is impractical
  and maybe only a secondary objective.

* If we don't want/have file checksums, then we don't need a profile
  file and using just the LSN seems fine.
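
One way to arrive at the "1 MB per GB file" figure above is to assume one 8-byte LSN stored per block. The sketch below works through that arithmetic; the constants are PostgreSQL's usual defaults (8 kB blocks, 1 GB segment files, 64-bit LSNs) and are assumptions here, since the thread does not spell them out. A one-bit-per-block bitmap is shown only for comparison.

```python
# Assumed constants: common PostgreSQL defaults, not stated in the thread.
BLOCK_SIZE = 8192            # bytes per page (default BLCKSZ)
SEGMENT_SIZE = 1024 ** 3     # bytes per relation segment file (1 GB)
LSN_SIZE = 8                 # an LSN is a 64-bit WAL position


def per_block_lsn_map_bytes(segment_size=SEGMENT_SIZE,
                            block_size=BLOCK_SIZE,
                            lsn_size=LSN_SIZE):
    """Metadata size if one LSN is kept for every block of a segment."""
    return (segment_size // block_size) * lsn_size


def per_block_bitmap_bytes(segment_size=SEGMENT_SIZE, block_size=BLOCK_SIZE):
    """Metadata size if one 'changed' bit is kept for every block."""
    blocks = segment_size // block_size
    return (blocks + 7) // 8   # round up to whole bytes


print(SEGMENT_SIZE // BLOCK_SIZE)   # blocks per segment: 131072
print(per_block_lsn_map_bytes())    # 1048576 bytes, i.e. 1 MB per 1 GB file
print(per_block_bitmap_bytes())     # 16384 bytes
```

A single LSN for the whole backup, as argued above, needs none of this per-file metadata.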

I don't think we should specify that manually - the correct LSN is
written to the backup_label file in a base backup and we should read
it back from there. We should also write a backup_label file to
incremental base backups, then we can have additional lines saying
what the source backups were. So full base backup backup_labels remain
as they are now, but we add one additional section per increment, so we
have the full set of increments, much like a history file.

Normal backup_label files look like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo

so we would have a file that looks like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo
INCREMENTAL 1
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo incremental 1
INCREMENTAL 2
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo incremental 2
... etc ...

which we interpret as showing the original base backup, then the first
increment, then the second increment etc., which allows us to recover
the backups in the correct sequence.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
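
As an illustration of the multi-section backup_label layout proposed above, here is a minimal sketch of how a tool might read it back and recover the sequence of increments. This is not existing PostgreSQL code: the function names are hypothetical, the "INCREMENTAL n" section marker is taken from the proposal, and the LSNs and timestamps in the sample are invented values.

```python
import re

# Section marker as written in the proposal above.
SECTION_RE = re.compile(r"^INCREMENTAL (\d+)$")


def parse_lsn(text):
    """Convert an LSN in %X/%X form to a 64-bit integer for comparison."""
    hi, lo = text.split()[0].split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_backup_label(text):
    """Return a list of dicts: index 0 is the full base backup,
    index n is increment n (hypothetical helper)."""
    sections = [{}]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        marker = SECTION_RE.match(line)
        if marker:
            # Increments must be numbered 1, 2, 3, ... in file order.
            if int(marker.group(1)) != len(sections):
                raise ValueError(f"increment out of sequence: {line}")
            sections.append({})
            continue
        key, _, value = line.partition(": ")
        sections[-1][key] = value
    return sections


# Invented sample values, for illustration only.
sample = """\
START WAL LOCATION: 0/2000028
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 2014-08-04 19:30:00 BST
LABEL: foo
INCREMENTAL 1
START WAL LOCATION: 0/5000028
CHECKPOINT LOCATION: 0/5000060
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 2014-08-05 19:30:00 BST
LABEL: foo incremental 1
"""

sections = parse_backup_label(sample)
start_lsns = [parse_lsn(s["START WAL LOCATION"]) for s in sections]
# Each increment should start at or after the backup it builds on.
assert start_lsns == sorted(start_lsns)
for number, section in enumerate(sections):
    print(number, section["LABEL"], section["START WAL LOCATION"])
```

Reading the start LSN of the most recent section this way would avoid having to pass it in by hand when requesting the next increment.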