Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Sep 17, 2019 at 12:09 PM Stephen Frost wrote: > > We need to be sure that we can detect if the WAL level has ever been set > > to minimal between a full and an incremental and, if so, either refuse > > to run the incremental, or promote it to a full, or make it a > > checksum-based incremental instead of trusting the WAL stream. > > Sure. What about checksum collisions? Certainly possible, of course, but a sha256 of each file is at least somewhat better than, say, our page-level checksums. I do agree that having the option to just say "promote it to a full", or "do a byte-by-byte comparison against the prior backed up file" would be useful for those who are concerned about sha256 collision probabilities. Having a cross-check of "does this X% of files that we decided not to back up due to whatever really still match what we think is in the backup?" is definitely a valuable feature and one which I'd hope we get to at some point. > > Ok.. I can understand that but I don't get how these changes to > > pg_basebackup will help facilitate that. If they don't and what you're > > talking about here is independent, then great, that clarifies things, > > but if you're saying that these changes to pg_basebackup are to help > > with backing up directly into those Enterprise systems then I'm just > > asking for some help in understanding how- what's the use-case here that > > we're adding to pg_basebackup that makes it work with these Enterprise > > systems? > > > > I'm not trying to be difficult here, I'm just trying to understand. > > Man, I feel like we're totally drifting off into the weeds here. I'm > not arguing that these changes to pg_basebackup will help enterprise > users except insofar as those users want incremental backup. 
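The per-file sha256 cross-check described above ("does this X% of files that we decided not to back up really still match what we think is in the backup?") could be sketched roughly like this; this is an illustrative Python sketch only, and the manifest layout and function names are hypothetical, not part of any patch:

```python
import hashlib

def file_sha256(path, bufsize=1 << 20):
    """Stream a file through SHA-256 in chunks, so 1GB relation
    segments never have to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def cross_check(manifest, paths):
    """Re-verify a sample of files that were skipped during an
    incremental: their current checksum must still match the one
    recorded in the (hypothetical) manifest dict."""
    return [p for p in paths if file_sha256(p) != manifest[p]]
```

Streaming in chunks matters because, as noted later in the thread, on larger systems most relation files are full 1GB segments.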
All of > this discussion started with this comment from you: > > "Having a system of keeping track of which backups are full and which > are differential in an overall system also gives you the ability to do > things like expiration in a sensible way, including handling WAL > expiration." > > All I was doing was saying that for an enterprise user, the overall > system might be something entirely outside of our control, like > NetBackup or Tivoli. Therefore, whatever functionality we provide to > do that kind of thing should be able to be used in such contexts. That > hardly seems like a controversial proposition.

And all I was trying to understand was how what pg_basebackup does in this context is really different from what can be done with pgbackrest, since you brought it up: "True, but I'm not sure that functionality belongs in core. It certainly needs to be possible for out-of-core code to do this part of the work if desired, because people want to integrate with enterprise backup systems, and we can't come in and say, well, you back up everything else using Netbackup or Tivoli, but for PostgreSQL you have to use pg_backrest. I mean, maybe you can win that argument, but I know I can't."

What it sounds like you're arguing here is that what pg_basebackup "has" going for it is that it specifically doesn't include any kind of expiration management, and that's somehow helpful to people who want to use Enterprise backup solutions. Maybe that's what you were getting at, in which case, I'm sorry for misunderstanding and dragging it out, and thanks for helping me understand.

> > How would that tool work, if it's to be able to work regardless of where > > the WAL is actually stored..? Today, pg_archivecleanup just works > > against a POSIX filesystem- are you thinking that the tool would have a > > pluggable storage system, so that it could work with, say, a POSIX > > filesystem, or a CIFS mount, or a s3-like system?
> > Again, I was making a general statement about design goals -- "we > should try to work nicely with enterprise backup products" -- not > proposing a specific design for a specific thing. I don't think the > idea of some pluggability in that area is a bad one, but it's not even > slightly what this thread is about. Well, I agree with you, as I said up-thread, that this seemed to be going in a different and perhaps not entirely relevant direction. > > Provided the WAL level is at the level that you need it to be that will > > be true for things which are actually supported with PITR, replication > > to standby servers, et al. I can see how it might come across as an > > overreaction but this strikes me as a pretty glaring issue and I worry > > that if it was overlooked until now that there'll be other more subtle > > issues, and backups are just plain complicated to get right, just to > > begin with already, something that I don't think people appreciate until > > they've been dealing with them for quite a while. > > Permit me to be unpersuaded. If it was such a glaring issue, and if > experience is the key to spotting such issues, then why didn't YOU > spot it? I'm not designing the f
Re: block-level incremental backup
On Tue, Sep 17, 2019 at 12:09 PM Stephen Frost wrote: > We need to be sure that we can detect if the WAL level has ever been set > to minimal between a full and an incremental and, if so, either refuse > to run the incremental, or promote it to a full, or make it a > checksum-based incremental instead of trusting the WAL stream. Sure. What about checksum collisions? > Just to be clear, I see your points and I like the general idea of > finding solutions, but it seems like the issues are likely to be pretty > complex and I'm not sure that's being appreciated very well. Definitely possible, but it's more helpful if you can point out the actual issues. > Ok.. I can understand that but I don't get how these changes to > pg_basebackup will help facilitate that. If they don't and what you're > talking about here is independent, then great, that clarifies things, > but if you're saying that these changes to pg_basebackup are to help > with backing up directly into those Enterprise systems then I'm just > asking for some help in understanding how- what's the use-case here that > we're adding to pg_basebackup that makes it work with these Enterprise > systems? > > I'm not trying to be difficult here, I'm just trying to understand. Man, I feel like we're totally drifting off into the weeds here. I'm not arguing that these changes to pg_basebackup will help enterprise users except insofar as those users want incremental backup. All of this discussion started with this comment from you: "Having a system of keeping track of which backups are full and which are differential in an overall system also gives you the ability to do things like expiration in a sensible way, including handling WAL expiration." All I was doing was saying that for an enterprise user, the overall system might be something entirely outside of our control, like NetBackup or Tivoli. Therefore, whatever functionality we provide to do that kind of thing should be able to be used in such contexts. 
That hardly seems like a controversial proposition. > How would that tool work, if it's to be able to work regardless of where > the WAL is actually stored..? Today, pg_archivecleanup just works > against a POSIX filesystem- are you thinking that the tool would have a > pluggable storage system, so that it could work with, say, a POSIX > filesystem, or a CIFS mount, or a s3-like system? Again, I was making a general statement about design goals -- "we should try to work nicely with enterprise backup products" -- not proposing a specific design for a specific thing. I don't think the idea of some pluggability in that area is a bad one, but it's not even slightly what this thread is about. > Provided the WAL level is at the level that you need it to be that will > be true for things which are actually supported with PITR, replication > to standby servers, et al. I can see how it might come across as an > overreaction but this strikes me as a pretty glaring issue and I worry > that if it was overlooked until now that there'll be other more subtle > issues, and backups are just plain complicated to get right, just to > begin with already, something that I don't think people appreciate until > they've been dealing with them for quite a while. Permit me to be unpersuaded. If it was such a glaring issue, and if experience is the key to spotting such issues, then why didn't YOU spot it? I'm not arguing that this stuff isn't hard. It is. Nor am I arguing that I didn't screw up. I did. But designs need to be accepted or rejected based on facts, not FUD. You've raised some good technical points and if you've got more concerns, I'd like to hear them, but I don't think arguing vaguely that a certain approach will probably run into trouble gets us anywhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
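The decision quoted at the top of this exchange (refuse the incremental, promote it to a full, or fall back to a checksum-based incremental whenever wal_level has ever been minimal since the prior full) can be condensed into a small policy check. This is a hypothetical Python illustration, not the server-side C code, and it assumes we can observe the wal_level recorded at each checkpoint since the prior full backup (as noted in the thread, wal_level can only change at a checkpoint):

```python
from enum import Enum

class WalLevel(Enum):
    MINIMAL = 0
    REPLICA = 1
    LOGICAL = 2

def plan_incremental(levels_since_full, fallback="promote-to-full"):
    """Decide how to take a backup given the wal_level observed at each
    checkpoint since the prior full backup.  If it was ever 'minimal',
    the WAL stream cannot be trusted, so apply the caller's fallback
    policy: 'refuse', 'promote-to-full', or 'checksum-incremental'."""
    if any(lvl is WalLevel.MINIMAL for lvl in levels_since_full):
        if fallback == "refuse":
            raise RuntimeError("wal_level was 'minimal' since the last "
                               "full backup; incremental is unsafe")
        return fallback
    return "wal-incremental"
```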
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost wrote: > > As discussed nearby, not everything that needs to be included in the > > backup is actually going to be in the WAL though, right? How would that > > ever be able to handle the case where someone starts the server under > > wal_level = logical, takes a full backup, then restarts with wal_level = > > minimal, writes out a bunch of new data, and then restarts back to > > wal_level = logical and takes an incremental? > > Fair point. I think the WAL-scanning approach can only work if > wal_level > minimal. But, I also think that few people run with > wal_level = minimal in this era where the default has been changed to > replica; and I think we can detect the WAL level in use while scanning > WAL. It can only change at a checkpoint. We need to be sure that we can detect if the WAL level has ever been set to minimal between a full and an incremental and, if so, either refuse to run the incremental, or promote it to a full, or make it a checksum-based incremental instead of trusting the WAL stream. I'm also glad that we ended up changing the default though and I do hope that there's relatively few people running with minimal and that there's even fewer who play around with flipping it back and forth. > > On larger systems, so many of the files are 1GB in size that checking > > the file size is quite close to meaningless. Yes, having to checksum > > all of the files definitely adds to the cost of taking the backup, but > > to avoid it we need strong assurances that a given file hasn't been > > changed since our last full backup. WAL, today at least, isn't quite > > that, and timestamps can possibly be fooled with, so if you'd like to be > > particularly careful, there doesn't seem to be a lot of alternatives. > > I see your points, but it feels like you're trying to talk down the > WAL-based approach over what seem to me to be fairly manageable corner > cases. 
Just to be clear, I see your points and I like the general idea of finding solutions, but it seems like the issues are likely to be pretty complex and I'm not sure that's being appreciated very well. > > I'm not asking you to be an expert on those systems, just to help me > > understand the statements you're making. How is backing up to a > > pgbackrest repo different than running a pg_basebackup in the context of > > using some other Enterprise backup system? In both cases, you'll have a > > full copy of the backup (presumably compressed) somewhere out on a disk > > or filesystem which is then backed up by the Enterprise tool. > > Well, I think that what people really want is to be able to backup > straight into the enterprise tool, without an intermediate step. Ok.. I can understand that but I don't get how these changes to pg_basebackup will help facilitate that. If they don't and what you're talking about here is independent, then great, that clarifies things, but if you're saying that these changes to pg_basebackup are to help with backing up directly into those Enterprise systems then I'm just asking for some help in understanding how- what's the use-case here that we're adding to pg_basebackup that makes it work with these Enterprise systems? I'm not trying to be difficult here, I'm just trying to understand. > My basic point here is: As with practically all PostgreSQL > development, I think we should try to expose capabilities and avoid > making policy on behalf of users. > > I'm not objecting to the idea of having tools that can help users > figure out how much WAL they need to retain -- but insofar as we can > do it, such tools should work regardless of where that WAL is actually > stored. How would that tool work, if it's to be able to work regardless of where the WAL is actually stored..? 
Today, pg_archivecleanup just works against a POSIX filesystem- are you thinking that the tool would have a pluggable storage system, so that it could work with, say, a POSIX filesystem, or a CIFS mount, or a s3-like system? > I dislike the idea that PostgreSQL would provide something > akin to a "pgbackrest repository" in core, or at least I think it > would be important that we're careful about how much functionality > gets tied to the presence and use of such a thing, because, at least > based on my experience working at EnterpriseDB, larger customers often > don't want to do it that way.

This seems largely independent of the above discussion, but since we're discussing it, I've certainly had various experiences in this area too- some larger customers would like to use an s3-like store (which pgbackrest already supports and will be supporting others going forward as it has a pluggable storage mechanism for the repo...), and then there's customers who would like to point their Enterprise backup solution at a directory on disk to back it up (which pgbackrest also supports, as mentioned previously), and lastly there's cus
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost wrote: > As discussed nearby, not everything that needs to be included in the > backup is actually going to be in the WAL though, right? How would that > ever be able to handle the case where someone starts the server under > wal_level = logical, takes a full backup, then restarts with wal_level = > minimal, writes out a bunch of new data, and then restarts back to > wal_level = logical and takes an incremental? Fair point. I think the WAL-scanning approach can only work if wal_level > minimal. But, I also think that few people run with wal_level = minimal in this era where the default has been changed to replica; and I think we can detect the WAL level in use while scanning WAL. It can only change at a checkpoint. > On larger systems, so many of the files are 1GB in size that checking > the file size is quite close to meaningless. Yes, having to checksum > all of the files definitely adds to the cost of taking the backup, but > to avoid it we need strong assurances that a given file hasn't been > changed since our last full backup. WAL, today at least, isn't quite > that, and timestamps can possibly be fooled with, so if you'd like to be > particularly careful, there doesn't seem to be a lot of alternatives. I see your points, but it feels like you're trying to talk down the WAL-based approach over what seem to me to be fairly manageable corner cases. > I'm not asking you to be an expert on those systems, just to help me > understand the statements you're making. How is backing up to a > pgbackrest repo different than running a pg_basebackup in the context of > using some other Enterprise backup system? In both cases, you'll have a > full copy of the backup (presumably compressed) somewhere out on a disk > or filesystem which is then backed up by the Enterprise tool. Well, I think that what people really want is to be able to backup straight into the enterprise tool, without an intermediate step. 
My basic point here is: As with practically all PostgreSQL development, I think we should try to expose capabilities and avoid making policy on behalf of users. I'm not objecting to the idea of having tools that can help users figure out how much WAL they need to retain -- but insofar as we can do it, such tools should work regardless of where that WAL is actually stored. I dislike the idea that PostgreSQL would provide something akin to a "pgbackrest repository" in core, or at least I think it would be important that we're careful about how much functionality gets tied to the presence and use of such a thing, because, at least based on my experience working at EnterpriseDB, larger customers often don't want to do it that way.

> That's not great, of course, which is why there are trade-offs to be > made, one of which typically involves using timestamps, but doing so > quite carefully, to perform the file exclusion. Other ideas are great > but it seems like WAL isn't really a great idea unless we make some > changes there and we, as in PG, haven't got a robust "we know this file > changed as of this point" to work from. I worry that we're putting too > much faith into a system to do something independent of what it was > actually built and designed to do, and thinking that because we could > trust it for X, we can trust it for Y.

That seems like a considerable overreaction to me based on the problems reported thus far. The fact is, WAL was originally intended for crash recovery and has subsequently been generalized to be usable for point-in-time recovery, standby servers, and logical decoding. It's clearly established at this point as the canonical way that you know what in the database has changed, which is the same need that we have for incremental backup. At any rate, the same criticism can be leveled - IMHO with a lot more validity - at timestamps.
Last-modification timestamps are completely outside of our control; they are owned by the OS and various operating systems can and do have varying behavior. They can go backwards when things have changed; they can go forwards when things have not changed. They were clearly not intended to meet this kind of requirement. Indeed, they were intended for that purpose far less than WAL, which was actually designed for a requirement in this general ballpark, if not this thing precisely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
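The failure mode described above is easy to demonstrate: the OS will happily let a file's mtime be rewound after its contents change, so a purely timestamp-based incremental would silently skip a modified file. A small self-contained sketch (the helper names are hypothetical and unrelated to any patch):

```python
import os
import tempfile

def mtime_says_unchanged(path, recorded_mtime_ns):
    """What a naive timestamp-based incremental would check."""
    return os.stat(path).st_mtime_ns == recorded_mtime_ns

def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"original")
    st = os.stat(path)
    recorded = st.st_mtime_ns             # what the prior backup saw
    with open(path, "wb") as f:
        f.write(b"modified")              # the file really changes...
    os.utime(path, ns=(st.st_atime_ns, recorded))  # ...but mtime is rewound
    skipped = mtime_says_unchanged(path, recorded)
    with open(path, "rb") as f:
        really_changed = f.read() != b"original"
    os.unlink(path)
    return skipped, really_changed
```

Both return values come back True: the file changed, yet the timestamp check would exclude it from the backup.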
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 7:00 PM Robert Haas wrote: > > On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila wrote: > > This seems to be a blocking problem for the LSN based design. > > Well, only the simplest version of it, I think. > > > Can we think of using creation time for the file? Basically, if the file > > creation time is later than the backup label's "START TIME:", then include > > that file entirely. I think one big point against this is clock skew, > > like what if somebody tinkers with the clock. And also, this can > > cover cases like > > what Jeevan has pointed out but might not cover other cases which we found > > problematic. > > Well, that would mean, for example, that if you copied the data > directory from one machine to another, the next "incremental" backup > would turn into a full backup. That sucks. And in other situations, > like resetting the clock, it could mean that you end up with a corrupt > backup without any real ability for PostgreSQL to detect it. I'm not > saying that it is impossible to create a practically useful system > based on file time stamps, but I really don't like it.

> > I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will > > have similar problems. > > I'm not sure quite what you mean by that. Can you elaborate? It > appears to me that the XLR_SPECIAL_REL_UPDATE operations are all > things that create files, remove files, or truncate files, and the > sketch in my previous email would handle the first two of those cases > correctly. See below for the third.

> > One related point is how do incremental backups handle the case where > > vacuum truncates the relation partially? Basically, with the current > > patch/design, it doesn't appear that such information can be passed > > via incremental backup. I am not sure if this is a problem, but it > > would be good if we can somehow handle this. > > As to this, if you're taking a full backup of a particular file, > there's no problem.
If you're taking a partial backup of a particular > file, you need to include the current length of the file and the > identity and contents of each modified block. Then you're fine. > Right, this should address that point. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
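The partial-file entry described above (the file's current length plus the identity and contents of each modified block) might look like the following sketch. The layout is purely illustrative, not the patch's actual wire or on-disk format; recording the current length is what lets restore handle a vacuum truncation:

```python
BLOCK = 8192  # PostgreSQL's default block size

def incremental_entry(current_length, changed):
    """Build a partial-file backup entry: the file's current length
    plus (block_number, data) pairs for each modified block."""
    return {
        "truncate_to": current_length,
        "blocks": {blkno: data for blkno, data in changed},
    }

def apply_entry(old, entry):
    """Rebuild a file from the prior full copy plus the entry:
    truncate (or zero-extend) to the recorded length, then overlay
    each modified block at its offset."""
    length = entry["truncate_to"]
    new = bytearray(old[:length].ljust(length, b"\0"))
    for blkno, data in entry["blocks"].items():
        new[blkno * BLOCK : blkno * BLOCK + len(data)] = data
    return bytes(new)
```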
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 11:09 PM Stephen Frost wrote: > > Greetings, > > * Robert Haas (robertmh...@gmail.com) wrote: > > On Mon, Sep 16, 2019 at 9:30 AM Robert Haas wrote: > > > > Aren't there some operations where at the end we directly call heap_sync > > > > without writing WAL that will have a similar problem as well? > > > > > > Maybe. Can you give an example? > > > > Looking through the code, I found two cases where we do this. One is > > a bulk insert operation with wal_level = minimal, and the other is > > CLUSTER or VACUUM FULL with wal_level = minimal. In both of these > > cases we are generating new blocks whose LSNs will be 0/0. So, I think > > we need a rule that if the server is asked to back up all blocks in a > > file with LSNs > some threshold LSN, it must also include any blocks > > whose LSN is 0/0. Those blocks are either uninitialized or are > > populated without WAL logging, so they always need to be copied. > > I'm not sure I see a way around it but this seems pretty unfortunate- > every single incremental backup will have all of those included even > though the full backup likely also does >

Yeah, this is quite unfortunate. One more thing to note is that the same is true for other operations like 'create index' (ex. nbtree bypasses the buffer manager while creating the index, doesn't write WAL for wal_level=minimal and then syncs at the end). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 1:10 PM Stephen Frost wrote: > > I disagree with this on a couple of levels. The first is pretty simple- > > we don't have all of the information. The user may have some reason to > > believe that timestamp-based is a bad idea, for example, and therefore > > having an option to perform a checksum-based backup makes sense. rsync > > is a pretty good tool in my view and it has a very similar option- > > because there are trade-offs to be made. LSN is great, if you don't > > mind reading every file of your database start-to-finish every time, but > > in a running system which hasn't suffered from clock skew or other odd > > issues (some of which we can also detect), it's pretty painful to scan > > absolutely everything like that for an incremental. > > There's a separate thread on using WAL-scanning to avoid having to > scan all the data every time. I pointed it out to you early in this > thread, too. As discussed nearby, not everything that needs to be included in the backup is actually going to be in the WAL though, right? How would that ever be able to handle the case where someone starts the server under wal_level = logical, takes a full backup, then restarts with wal_level = minimal, writes out a bunch of new data, and then restarts back to wal_level = logical and takes an incremental? How would we even detect that such a thing happened? > > If you track the checksum of the file in the manifest then it's a pretty > > strong validation that the backup repo hasn't been corrupted between the > > backup and the restore. Of course, the database could have been > > corrupted at the source, and perhaps that's what you were getting at > > with your 'limited extent' but that isn't what I was referring to. > > Yeah, that all seems fair. 
Without the checksum, you can only validate > that you have the right files and that they are the right sizes, which > is not bad, but the checksums certainly make it stronger. But, > wouldn't having to checksum all of the files add significantly to the > cost of taking the backup? If so, I can imagine that some people might > want to pay that cost but others might not. If it's basically free to > checksum the data while we have it in memory anyway, then I guess > there's little to be lost. On larger systems, so many of the files are 1GB in size that checking the file size is quite close to meaningless. Yes, having to checksum all of the files definitely adds to the cost of taking the backup, but to avoid it we need strong assurances that a given file hasn't been changed since our last full backup. WAL, today at least, isn't quite that, and timestamps can possibly be fooled with, so if you'd like to be particularly careful, there doesn't seem to be a lot of alternatives. > > I'm pretty baffled by this argument, particularly in this context. We > > already have tooling around trying to manage WAL archives in core- see > > pg_archivecleanup. Further, we're talking about pg_basebackup here, not > > about Netbackup or Tivoli, and the results of a pg_basebackup (that is, > > a set of tar files, or a data directory) could happily be backed up > > using whatever Enterprise tool folks want to use- in much the same way > > that a pgbackrest repo is also able to be backed up using whatever > > Enterprise tools someone wishes to use. We designed it quite carefully > > to work with exactly that use-case, so the distinction here is quite > > lost on me. Perhaps you could clarify what use-case these changes to > > pg_basebackup solve, when working with a Netbackup or Tivoli system, > > that pgbackrest doesn't, since you bring it up here? 
> > I'm not an expert on any of those systems, but I doubt that > everybody's OK with backing everything up to a pgbackrest repository > and then separately backing up that repository to some other system. > That sounds like a pretty large storage cost. I'm not asking you to be an expert on those systems, just to help me understand the statements you're making. How is backing up to a pgbackrest repo different than running a pg_basebackup in the context of using some other Enterprise backup system? In both cases, you'll have a full copy of the backup (presumably compressed) somewhere out on a disk or filesystem which is then backed up by the Enterprise tool. > > As for if we should be sending more to the server, or asking the server > > to send more to us, I don't really have a good feel for what's "best". > > At least one implementation I'm familiar with builds a manifest on the > > PG server side and then compares the results of that to the manifest > > stored with the backup (where that comparison is actually done is on > > whatever system the "backup" was started from, typically a backup > > server). Perhaps there's an argument for sending the manifest from the > > backup repository to PostgreSQL for it to then compare against the data > > directory bu
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 1:10 PM Stephen Frost wrote: > I disagree with this on a couple of levels. The first is pretty simple- > we don't have all of the information. The user may have some reason to > believe that timestamp-based is a bad idea, for example, and therefore > having an option to perform a checksum-based backup makes sense. rsync > is a pretty good tool in my view and it has a very similar option- > because there are trade-offs to be made. LSN is great, if you don't > mind reading every file of your database start-to-finish every time, but > in a running system which hasn't suffered from clock skew or other odd > issues (some of which we can also detect), it's pretty painful to scan > absolutely everything like that for an incremental. There's a separate thread on using WAL-scanning to avoid having to scan all the data every time. I pointed it out to you early in this thread, too. > If you track the checksum of the file in the manifest then it's a pretty > strong validation that the backup repo hasn't been corrupted between the > backup and the restore. Of course, the database could have been > corrupted at the source, and perhaps that's what you were getting at > with your 'limited extent' but that isn't what I was referring to. Yeah, that all seems fair. Without the checksum, you can only validate that you have the right files and that they are the right sizes, which is not bad, but the checksums certainly make it stronger. But, wouldn't having to checksum all of the files add significantly to the cost of taking the backup? If so, I can imagine that some people might want to pay that cost but others might not. If it's basically free to checksum the data while we have it in memory anyway, then I guess there's little to be lost. > I'm pretty baffled by this argument, particularly in this context. We > already have tooling around trying to manage WAL archives in core- see > pg_archivecleanup. 
Further, we're talking about pg_basebackup here, not > about Netbackup or Tivoli, and the results of a pg_basebackup (that is, > a set of tar files, or a data directory) could happily be backed up > using whatever Enterprise tool folks want to use- in much the same way > that a pgbackrest repo is also able to be backed up using whatever > Enterprise tools someone wishes to use. We designed it quite carefully > to work with exactly that use-case, so the distinction here is quite > lost on me. Perhaps you could clarify what use-case these changes to > pg_basebackup solve, when working with a Netbackup or Tivoli system, > that pgbackrest doesn't, since you bring it up here? I'm not an expert on any of those systems, but I doubt that everybody's OK with backing everything up to a pgbackrest repository and then separately backing up that repository to some other system. That sounds like a pretty large storage cost. > As for if we should be sending more to the server, or asking the server > to send more to us, I don't really have a good feel for what's "best". > At least one implementation I'm familiar with builds a manifest on the > PG server side and then compares the results of that to the manifest > stored with the backup (where that comparison is actually done is on > whatever system the "backup" was started from, typically a backup > server). Perhaps there's an argument for sending the manifest from the > backup repository to PostgreSQL for it to then compare against the data > directory but I'm not really sure how it could possibly do that more > efficiently and that's moving work to the PG server that it doesn't > really need to do. I agree with all that, but... if the server builds a manifest on the PG server that is to be compared with the backup's manifest, the one the PG server builds can't really include checksums, I think. To get the checksums, it would have to read the entire cluster while building the manifest, which sounds insane. 
Presumably it would have to build a checksum-free version of the manifest, and then the client could checksum the files as they're streamed down and write out a revised manifest that adds the checksums. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
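The flow sketched above - server sends the file along with a checksum-free manifest, and the client checksums each file as it streams down, writing out a revised manifest that adds the checksums - avoids a second read pass over the cluster. A minimal client-side sketch with hypothetical names (the real transport would be the replication protocol, not a Python iterable):

```python
import hashlib

def stream_to_disk(chunks, out_path):
    """Write a streamed file to disk while folding each chunk into a
    running SHA-256, so the checksum comes 'for free' with the copy.
    Returns a revised-manifest entry (layout is hypothetical)."""
    h = hashlib.sha256()
    size = 0
    with open(out_path, "wb") as out:
        for chunk in chunks:
            h.update(chunk)   # checksum the data we already hold in memory
            out.write(chunk)
            size += len(chunk)
    return {"path": out_path, "size": size, "sha256": h.hexdigest()}
```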
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 9:30 AM Robert Haas wrote: > > > Aren't there some operations where at the end we directly call heap_sync > > > without writing WAL that will have a similar problem as well? > > > > Maybe. Can you give an example? > > Looking through the code, I found two cases where we do this. One is > a bulk insert operation with wal_level = minimal, and the other is > CLUSTER or VACUUM FULL with wal_level = minimal. In both of these > cases we are generating new blocks whose LSNs will be 0/0. So, I think > we need a rule that if the server is asked to back up all blocks in a > file with LSNs > some threshold LSN, it must also include any blocks > whose LSN is 0/0. Those blocks are either uninitialized or are > populated without WAL logging, so they always need to be copied.

I'm not sure I see a way around it but this seems pretty unfortunate- every single incremental backup will have all of those included even though the full backup likely also does (I say likely since someone could do a full backup, set the WAL to minimal, load a bunch of data, and then restart back to a WAL level where we can do a new backup, and then do an incremental, so we don't *know* that the full includes those blocks unless we also track a block-level checksum or similar). Then again, doing these kinds of server bounces to change the WAL level around is, hopefully, relatively rare..

> Outside of unlogged and temporary tables, I don't know of any case > where we make a critical modification to an already-existing block > without bumping the LSN. I hope there is no such case.

I believe we all do. :) Thanks, Stephen
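The rule discussed above reduces to a simple per-block predicate: copy a block if its LSN is past the threshold, or if its LSN is 0/0. A sketch, treating LSNs as plain integers with 0 standing in for 0/0 (in reality they are 64-bit WAL positions, and the actual decision lives in the server's C code):

```python
def block_needs_copy(page_lsn, threshold_lsn):
    """Copy the block if it changed after the threshold LSN, or if its
    LSN is 0/0 -- such pages are uninitialized or were populated
    without WAL logging (e.g. bulk inserts or CLUSTER/VACUUM FULL at
    wal_level = minimal) and must always be included."""
    return page_lsn == 0 or page_lsn > threshold_lsn
```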
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 10:38 AM Stephen Frost wrote: > > In a number of cases, trying to make sure that on a failover or copy of > > the backup the next 'incremental' is really an 'incremental' is > > dangerous. A better strategy to address this, and the other issues > > realized on this thread recently, is to: > > > > - Have a manifest of every file in each backup > > - Always back up new files that weren't in the prior backup > > - Keep a checksum of each file > > - Track the timestamp of each file as of when it was backed up > > - Track the file size of each file > > - Track the starting timestamp of each backup > > - Always include files with a modification time after the starting > > timestamp of the prior backup, or if the file size has changed > > - In the event of any anomalies (which includes things like a timeline > > switch), use checksum matching (aka 'delta checksum backup') to > > perform the backup instead of using timestamps (or just always do that > > if you want to be particularly careful- having an option for it is > > great) > > - Probably other things I'm not thinking of off-hand, but this is at > > least a good start. Make sure to checksum this information too. > > I agree with some of these ideas but not all of them. I think having > a backup manifest is a good idea; that would allow taking a new > incremental backup to work from the manifest rather than the data > directory, which could be extremely useful, because it might be a lot > faster and the manifest could also be copied to a machine other than > the one where the entire backup is stored. If the backup itself has > been pushed off to S3 or whatever, you can't access it quickly, but > you could keep the manifest around. Yes, those are also good reasons for having a manifest. > I also agree that backing up all files that weren't in the previous > backup is a good strategy. 
I proposed that fairly explicitly a few > emails back; but also, the contrary is obviously nonsense. And I also > agree with, and proposed, that we record the size along with the file. Sure, I didn't mean to imply that there was something wrong with that. Including the checksum and other metadata is also valuable, both for helping to identify corruption in the backup archive and for forensics, if not for other reasons. > I don't really agree with your comments about checksums and > timestamps. I think that, if possible, there should be ONE method of > determining whether a block has changed in some important way, and I > think if we can make LSN work, that would be for the best. If you use > multiple methods of detecting changes without any clearly-defined > reason for so doing, maybe what you're saying is that you don't really > believe that any of the methods are reliable but if we throw the > kitchen sink at the problem it should come out OK. Any bugs in one > mechanism are likely to be masked by one of the others, but that's not > as good as one method that is known to be altogether reliable. I disagree with this on a couple of levels. The first is pretty simple- we don't have all of the information. The user may have some reason to believe that timestamp-based is a bad idea, for example, and therefore having an option to perform a checksum-based backup makes sense. rsync is a pretty good tool in my view and it has a very similar option- because there are trade-offs to be made. LSN is great, if you don't mind reading every file of your database start-to-finish every time, but in a running system which hasn't suffered from clock skew or other odd issues (some of which we can also detect), it's pretty painful to scan absolutely everything like that for an incremental. Perhaps the discussion has already moved on to having some way of our own to track if a given file has changed without having to scan all of it- if so, that's a discussion I'd be interested in. 
I'm not against other approaches here besides timestamps if there's a solid reason why they're better and they're also able to avoid scanning the entire database. > > By having a manifest for each backed up file for each backup, you also > > gain the ability to validate that a backup in the repository hasn't been > > corrupted post-backup, a feature that at least some other database > > backup and restore systems have (referring specifically to the big O in > > this particular case, but I bet others do too). > > Agreed. The manifest only lets you validate to a limited extent, but > that's still useful. If you track the checksum of the file in the manifest then it's a pretty strong validation that the backup repo hasn't been corrupted between the backup and the restore. Of course, the database could have been corrupted at the source, and perhaps that's what you were getting at with your 'limited extent' but that isn't what I was referring to. Claiming that the backup has been 'validated' by only looking at
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 10:38 AM Stephen Frost wrote: > In a number of cases, trying to make sure that on a failover or copy of > the backup the next 'incremental' is really an 'incremental' is > dangerous. A better strategy to address this, and the other issues > realized on this thread recently, is to: > > - Have a manifest of every file in each backup > - Always back up new files that weren't in the prior backup > - Keep a checksum of each file > - Track the timestamp of each file as of when it was backed up > - Track the file size of each file > - Track the starting timestamp of each backup > - Always include files with a modification time after the starting > timestamp of the prior backup, or if the file size has changed > - In the event of any anomalies (which includes things like a timeline > switch), use checksum matching (aka 'delta checksum backup') to > perform the backup instead of using timestamps (or just always do that > if you want to be particularly careful- having an option for it is > great) > - Probably other things I'm not thinking of off-hand, but this is at > least a good start. Make sure to checksum this information too. I agree with some of these ideas but not all of them. I think having a backup manifest is a good idea; that would allow taking a new incremental backup to work from the manifest rather than the data directory, which could be extremely useful, because it might be a lot faster and the manifest could also be copied to a machine other than the one where the entire backup is stored. If the backup itself has been pushed off to S3 or whatever, you can't access it quickly, but you could keep the manifest around. I also agree that backing up all files that weren't in the previous backup is a good strategy. I proposed that fairly explicitly a few emails back; but also, the contrary is obviously nonsense. And I also agree with, and proposed, that we record the size along with the file. 
I don't really agree with your comments about checksums and timestamps. I think that, if possible, there should be ONE method of determining whether a block has changed in some important way, and I think if we can make LSN work, that would be for the best. If you use multiple methods of detecting changes without any clearly-defined reason for so doing, maybe what you're saying is that you don't really believe that any of the methods are reliable but if we throw the kitchen sink at the problem it should come out OK. Any bugs in one mechanism are likely to be masked by one of the others, but that's not as good as one method that is known to be altogether reliable. > By having a manifest for each backed up file for each backup, you also > gain the ability to validate that a backup in the repository hasn't been > corrupted post-backup, a feature that at least some other database > backup and restore systems have (referring specifically to the big O in > this particular case, but I bet others do too). Agreed. The manifest only lets you validate to a limited extent, but that's still useful. > Having a system of keeping track of which backups are full and which are > differential in an overall system also gives you the ability to do > things like expiration in a sensible way, including handling WAL > expiration. True, but I'm not sure that functionality belongs in core. It certainly needs to be possible for out-of-core code to do this part of the work if desired, because people want to integrate with enterprise backup systems, and we can't come in and say, well, you back up everything else using Netbackup or Tivoli, but for PostgreSQL you have to use pg_backrest. I mean, maybe you can win that argument, but I know I can't. 
> I'd like to clarify that while I would like to have an easier way to > parallelize backups, that's a relatively minor complaint- the much > bigger issue that I have with this feature is that trying to address > everything correctly while having only the amount of information that > could be passed on the command-line about the prior full/incremental is > going to be extremely difficult, complicated, and likely to lead to > subtle bugs in the actual code, and probably less than subtle bugs in > how users end up using it, since they'll have to implement the > expiration and tracking of information between backups themselves > (unless something's changed in that part during this discussion- I admit > that I've not read every email in this thread). Well, the evidence seems to show that you are right, at least to some extent. I consider it a positive good if the client needs to give the server only a limited amount of information. After all, you could always take an incremental backup by shipping every byte of the previous backup to the server, having it compare everything to the current contents, and having it then send you back the stuff that is new or different. But that would be dumb, because most of the point of an incremental backup is to save on send
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 9:30 AM Robert Haas wrote: > > Won't some operations where, at the end, we directly call heap_sync > > without writing WAL have a similar problem as well? > > Maybe. Can you give an example? Looking through the code, I found two cases where we do this. One is a bulk insert operation with wal_level = minimal, and the other is CLUSTER or VACUUM FULL with wal_level = minimal. In both of these cases we are generating new blocks whose LSNs will be 0/0. So, I think we need a rule that if the server is asked to back up all blocks in a file with LSNs > some threshold LSN, it must also include any blocks whose LSN is 0/0. Those blocks are either uninitialized or are populated without WAL logging, so they always need to be copied. Outside of unlogged and temporary tables, I don't know of any case where we make a critical modification to an already-existing block without bumping the LSN. I hope there is no such case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
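[The block-selection rule described above can be sketched as follows. This is a minimal illustration only; the type and function names are hypothetical stand-ins, not PostgreSQL's actual backup code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr; 0 represents LSN 0/0. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * The rule sketched in the email above: when asked to back up all blocks
 * with LSNs greater than a threshold, also include any block whose LSN is
 * 0/0, since such blocks are either uninitialized or were populated
 * without WAL logging (bulk inserts, CLUSTER/VACUUM FULL at
 * wal_level = minimal) and so can never be skipped safely.
 */
static bool
block_needs_backup(XLogRecPtr page_lsn, XLogRecPtr threshold_lsn)
{
    if (page_lsn == InvalidXLogRecPtr)
        return true;            /* no trustworthy LSN: always copy */
    return page_lsn > threshold_lsn;
}
```

Note that a block whose LSN exactly equals the threshold is not copied; only strictly newer LSNs, plus the 0/0 special case, qualify.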
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila wrote: > > Can we think of using the creation time of the file? Basically, if the file > > creation time is later than the backup label's "START TIME:", then include > > that file entirely. I think one big point against this is clock skew, > > like what if somebody tinkers with the clock. And also, this can > > cover cases like > > what Jeevan has pointed out but might not cover other cases which we found > > problematic. > > Well that would mean, for example, that if you copied the data > directory from one machine to another, the next "incremental" backup > would turn into a full backup. That sucks. And in other situations, > like resetting the clock, it could mean that you end up with a corrupt > backup without any real ability for PostgreSQL to detect it. I'm not > saying that it is impossible to create a practically useful system > based on file time stamps, but I really don't like it. Trying to make sure that the next 'incremental' is really an 'incremental' after a failover or a copy of the backup is dangerous in a number of cases. 
A better strategy to address this, and the other issues realized on this thread recently, is to: - Have a manifest of every file in each backup - Always back up new files that weren't in the prior backup - Keep a checksum of each file - Track the timestamp of each file as of when it was backed up - Track the file size of each file - Track the starting timestamp of each backup - Always include files with a modification time after the starting timestamp of the prior backup, or if the file size has changed - In the event of any anomalies (which includes things like a timeline switch), use checksum matching (aka 'delta checksum backup') to perform the backup instead of using timestamps (or just always do that if you want to be particularly careful- having an option for it is great) - Probably other things I'm not thinking of off-hand, but this is at least a good start. Make sure to checksum this information too. I agree entirely that it is dangerous to simply rely on creation time as compared to some other time, or to rely on modification time of a given file across multiple backups (which has been shown to reliably cause corruption, at least with rsync and its 1-second granularity on modification time). By having a manifest for each backed up file for each backup, you also gain the ability to validate that a backup in the repository hasn't been corrupted post-backup, a feature that at least some other database backup and restore systems have (referring specifically to the big O in this particular case, but I bet others do too). Having a system of keeping track of which backups are full and which are differential in an overall system also gives you the ability to do things like expiration in a sensible way, including handling WAL expiration. As also mentioned up-thread, this likely also allows you to have a simpler approach to parallelizing the overall backup. 
I'd like to clarify that while I would like to have an easier way to parallelize backups, that's a relatively minor complaint- the much bigger issue that I have with this feature is that trying to address everything correctly while having only the amount of information that could be passed on the command-line about the prior full/incremental is going to be extremely difficult, complicated, and likely to lead to subtle bugs in the actual code, and probably less than subtle bugs in how users end up using it, since they'll have to implement the expiration and tracking of information between backups themselves (unless something's changed in that part during this discussion- I admit that I've not read every email in this thread). > > One related point is how do incremental backups handle the case where > > vacuum truncates the relation partially? Basically, with current > > patch/design, it doesn't appear that such information can be passed > > via incremental backup. I am not sure if this is a problem, but it > > would be good if we can somehow handle this. > > As to this, if you're taking a full backup of a particular file, > there's no problem. If you're taking a partial backup of a particular > file, you need to include the current length of the file and the > identity and contents of each modified block. Then you're fine. I would also expect this to be fine but if there's an example of where this is an issue, please share. The only issue that I can think of off-hand is orphaned-file risk, whereby you have something like CREATE DATABASE or perhaps ALTER TABLE .. SET TABLESPACE or such, take a backup while that's happening, but that doesn't complete during the backup (or recovery, or perhaps even in some other scenarios, it's unfortunately quite complicated). This orphaned file risk isn't newly discovered but fixing it is pretty complicated- would love to discuss ideas around how to handle it. > > Isn't so
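[The manifest fields and the timestamp/size rule proposed earlier in this message could be modeled roughly as below. Struct and function names here are hypothetical, for illustration only; no real manifest format is being described.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-file manifest entry holding the proposed metadata. */
typedef struct ManifestEntry
{
    const char *path;           /* file name relative to the data directory */
    long        size;           /* file size as of the backup */
    time_t      mtime;          /* file modification time as of the backup */
    char        sha256[65];     /* hex checksum of the file contents */
} ManifestEntry;

/*
 * The timestamp/size rule from the list: a file is included if it is new,
 * if it was modified at or after the prior backup's start time, or if its
 * size changed. A 'delta checksum backup' would compare checksums
 * instead, at the cost of reading every file.
 */
static bool
file_needs_backup(const ManifestEntry *prior, time_t prior_backup_start,
                  long cur_size, time_t cur_mtime)
{
    if (prior == NULL)
        return true;            /* not in the prior backup: always copy */
    if (cur_mtime >= prior_backup_start)
        return true;            /* modified since the prior backup began */
    return cur_size != prior->size;
}
```

The comparison against the prior backup's *start* time (not its end) is deliberate: a file modified while the prior backup was running may or may not have been captured, so it must be re-copied.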
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila wrote: > This seems to be a blocking problem for the LSN based design. Well, only the simplest version of it, I think. > Can we think of using the creation time of the file? Basically, if the file > creation time is later than the backup label's "START TIME:", then include > that file entirely. I think one big point against this is clock skew, > like what if somebody tinkers with the clock. And also, this can > cover cases like > what Jeevan has pointed out but might not cover other cases which we found > problematic. Well that would mean, for example, that if you copied the data directory from one machine to another, the next "incremental" backup would turn into a full backup. That sucks. And in other situations, like resetting the clock, it could mean that you end up with a corrupt backup without any real ability for PostgreSQL to detect it. I'm not saying that it is impossible to create a practically useful system based on file time stamps, but I really don't like it. > I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will > have similar problems. I'm not sure quite what you mean by that. Can you elaborate? It appears to me that the XLR_SPECIAL_REL_UPDATE operations are all things that create files, remove files, or truncate files, and the sketch in my previous email would handle the first two of those cases correctly. See below for the third. > One related point is how do incremental backups handle the case where > vacuum truncates the relation partially? Basically, with current > patch/design, it doesn't appear that such information can be passed > via incremental backup. I am not sure if this is a problem, but it > would be good if we can somehow handle this. As to this, if you're taking a full backup of a particular file, there's no problem. If you're taking a partial backup of a particular file, you need to include the current length of the file and the identity and contents of each modified block. 
Then you're fine. > Won't some operations where, at the end, we directly call heap_sync > without writing WAL have a similar problem as well? Maybe. Can you give an example? > Similarly, > it is not very clear if unlogged relations are handled in some way; if > not, the same could be documented. I think that we don't need to back up the contents of unlogged relations at all, right? Restoration from an online backup always involves running recovery, and so unlogged relations will get zapped anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
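[The "current length plus modified blocks" answer above, which handles a concurrent vacuum truncation, could look something like this. The names are hypothetical and only sketch the idea; this is not the format of any actual patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * A sketch of a partial-file entry: it records the file's current length
 * along with the modified blocks that follow, so a truncation done by
 * vacuum between backups is reproduced at restore time by honoring the
 * recorded length rather than the full backup's old length.
 */
typedef struct PartialFileHeader
{
    int64_t current_length;     /* file length when the backup was taken */
    int     nblocks;            /* number of modified blocks that follow */
} PartialFileHeader;

/*
 * A recorded block is only applied if it lies entirely within the
 * recorded length; anything beyond it was truncated away.
 */
static bool
block_within_file(const PartialFileHeader *hdr, int blkno)
{
    return (int64_t) blkno * BLCKSZ < hdr->current_length;
}
```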
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 7:22 AM Robert Haas wrote: > > On Thu, Sep 12, 2019 at 9:13 AM Jeevan Chalke > wrote: > > I had a look over this issue and observed that when the new database is > > created, the catalog files are copied as-is into the new directory > > corresponding to a newly created database. And as they are just copied, > > the LSNs on those pages are not changed. Due to this, incremental backup > > thinks that it's an existing file and thus does not copy the blocks from > > these new files, leading to the failure. > > *facepalm* > > Well, this shoots a pretty big hole in my design for this feature. I > don't know why I didn't think of this when I wrote out that design > originally. Ugh. > > Unless we change the way that CREATE DATABASE and any similar > operations work so that they always stamp pages with new LSNs, I think > we have to give up on the idea of being able to take an incremental > backup by just specifying an LSN. > This seems to be a blocking problem for the LSN based design. Can we think of using the creation time of the file? Basically, if the file creation time is later than the backup label's "START TIME:", then include that file entirely. I think one big point against this is clock skew, like what if somebody tinkers with the clock. And also, this can cover cases like what Jeevan has pointed out but might not cover other cases which we found problematic. > We'll instead need to get a list of > files from the server first, and then request the entirety of any that > we don't have, plus the changed blocks from the ones that we do have. > I guess that will make Stephen happy, since it's more like the design > he wanted originally (and should generalize more simply to parallel > backup). > > One question I have is: is there any scenario in which an existing > page gets modified after the full backup and before the incremental > backup but does not end up with an LSN that follows the full backup's > start LSN? 
I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will have similar problems. One related point is how do incremental backups handle the case where vacuum truncates the relation partially? Basically, with the current patch/design, it doesn't appear that such information can be passed via incremental backup. I am not sure if this is a problem, but it would be good if we can somehow handle this. Won't some operations where, at the end, we directly call heap_sync without writing WAL have a similar problem as well? Similarly, it is not very clear if unlogged relations are handled in some way; if not, the same could be documented. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
On Thu, Sep 12, 2019 at 9:13 AM Jeevan Chalke wrote: > I had a look over this issue and observed that when the new database is > created, the catalog files are copied as-is into the new directory > corresponding to a newly created database. And as they are just copied, > the LSNs on those pages are not changed. Due to this, incremental backup > thinks that it's an existing file and thus does not copy the blocks from > these new files, leading to the failure. *facepalm* Well, this shoots a pretty big hole in my design for this feature. I don't know why I didn't think of this when I wrote out that design originally. Ugh. Unless we change the way that CREATE DATABASE and any similar operations work so that they always stamp pages with new LSNs, I think we have to give up on the idea of being able to take an incremental backup by just specifying an LSN. We'll instead need to get a list of files from the server first, and then request the entirety of any that we don't have, plus the changed blocks from the ones that we do have. I guess that will make Stephen happy, since it's more like the design he wanted originally (and should generalize more simply to parallel backup). One question I have is: is there any scenario in which an existing page gets modified after the full backup and before the incremental backup but does not end up with an LSN that follows the full backup's start LSN? If there is, then the whole concept of using LSNs to tell which blocks have been modified doesn't really work. I can't think of a way that can happen off-hand, but then, I thought my last design was good, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
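[The revised design above — fetch the server's file list, then request missing files in full and only the changed blocks of known files — reduces to a membership test against the prior backup's file list. A minimal sketch, with hypothetical names and the manifest modeled as a plain array of paths (a real implementation would use a sorted list or hash table):]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Decide how to fetch one file from the server's list: a file absent
 * from the prior backup's manifest must be requested in its entirety;
 * a file present in the manifest only needs its changed blocks.
 */
static bool
fetch_whole_file(const char *path, const char *const *manifest, int nmanifest)
{
    for (int i = 0; i < nmanifest; i++)
        if (strcmp(manifest[i], path) == 0)
            return false;       /* already backed up: changed blocks only */
    return true;                /* new file: request it in full */
}
```

This sidesteps the CREATE DATABASE problem: a freshly copied catalog file is new to the manifest, so it is fetched whole regardless of its (stale) page LSNs.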
Re: block-level incremental backup
On Fri, Sep 13, 2019 at 1:08 PM vignesh C wrote: > Instead of reading the whole file here, we can read the file page by page. > There is a possibility of data inconsistency if data is not read page by > page, data will be consistent if read page by page as full page write will be > enabled at this time. I think you are confused about what "full page writes" means. It has to do with what gets written to the write-ahead log, not the way that the pages themselves are written. There is no portable way to ensure that an 8kB read or write is atomic, and generally it isn't. It shouldn't matter whether the file is read all at once, page by page, or byte by byte, except for performance. Recovery is going to run when that backup is restored, and any inconsistencies should get fixed up at that time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Mon, Sep 9, 2019 at 4:51 PM Jeevan Chalke wrote: > > > > On Tue, Aug 27, 2019 at 4:46 PM vignesh C wrote: >> >> Few comments: >> Comment: >> + buf = (char *) malloc(statbuf->st_size); >> + if (buf == NULL) >> + ereport(ERROR, >> + (errcode(ERRCODE_OUT_OF_MEMORY), >> + errmsg("out of memory"))); >> + >> + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) >> + { >> + Bitmapset *mod_blocks = NULL; >> + int nmodblocks = 0; >> + >> + if (cnt % BLCKSZ != 0) >> + { >> >> We can use the same size as the full page size. >> After pg_start_backup, full page writes will be enabled. >> We can use the same file size to maintain data consistency. > > > Can you please explain which size? > The aim here is to read the entire file in memory and thus statbuf->st_size is used. > Instead of reading the whole file here, we can read the file page by page. There is a possibility of data inconsistency if data is not read page by page; data will be consistent if read page by page, as full page writes will be enabled at this time. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Hi, One of my colleagues at EDB, Rajkumar Raghuwanshi, reported an issue while testing this feature. He reported that if a full base backup is taken, then a database is created, and then an incremental backup is taken, combining the full backup with the incremental backup fails. I had a look at this issue and observed that when the new database is created, the catalog files are copied as-is into the new directory corresponding to the newly created database. And as they are just copied, the LSNs on those pages are not changed. Due to this, incremental backup thinks that it's an existing file and thus does not copy the blocks from these new files, leading to the failure. I was surprised to learn that even though we are creating new files from old files, we keep the LSN unmodified. I didn't see any other parameter in basebackup which indicates that this is a new file since the last LSN or some such. I tried looking for any other DDL doing similar stuff, i.e. creating a new page with an existing LSN, but I could not find any commands other than CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE. Suggestions/thoughts? -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 30, 2019 at 6:52 PM Jeevan Ladhe wrote: > Here are some comments: > Or maybe we can just say: > "cannot verify checksum in file \"%s\"" if checksum requested, disable the > checksum and leave it to the following message: > > + ereport(WARNING, > + (errmsg("file size (%d) not in multiple of page size > (%d), sending whole file", > + (int) cnt, BLCKSZ))); > > Opted for the above suggestion. > > I think we should give the user a hint from where he should be reading the > input > lsn for incremental backup in the --help option as well as documentation? > Something like - "To take an incremental backup, please provide the value of > "--lsn" > as the "START WAL LOCATION" of the previously taken full backup or incremental > backup from the backup_label file. > Added this in the documentation. In --help, it would be too crowded. > pg_combinebackup: > > +static bool made_new_outputdata = false; > +static bool found_existing_outputdata = false; > > Both of these are global, I understand that we need them global so that > they are > accessible in cleanup_directories_atexit(). But they are passed to > verify_dir_is_empty_or_create() as parameters, which I think is not needed. > Instead verify_dir_is_empty_or_create() can directly change the globals. > After adding support for a tablespace, these two parameters take different values depending upon the context. > The current logic assumes the incremental backup directories are to be provided > as input in the serial order the backups were taken. This is a bit confusing > unless clarified in pg_combinebackup help menu or documentation. I think we > should clarify it at both places. > Added in doc. > > I think scan_directory() should be rather renamed as do_combinebackup(). > I am not sure about this renaming. scan_directory() is called recursively to scan each subdirectory too. If we rename it, then it would imply that combining a backup is itself recursive, whereas combining a backup is a single whole process. 
-- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 12:11 PM Dilip Kumar wrote: > On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke > wrote: > > 0003: +/* > + * When to send the whole file, % blocks modified (90%) > + */ > +#define WHOLE_FILE_THRESHOLD 0.9 > > How is this threshold selected? Is it by some test? > Currently, it is set arbitrarily. If required, we will make it a GUC. > > - magic number, currently 0 (4 bytes) > I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC > 0x494E4352) as a magic number, not 0 > Yes. Robert reported this too. Updated the commit message. > > Can we break this function down into 2-3 functions? At least creating a > file map can directly go to a separate function. > Separated out the filemap changes into a separate function. The rest is kept as-is to allow an easy follow-up. > > I have read the 0003 and 0004 patches and there are a few cosmetic comments. > Can you please post those too? Other comments are fixed. > > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
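[The WHOLE_FILE_THRESHOLD constant being discussed above amounts to a simple ratio check. A minimal sketch, with a hypothetical helper name; the 90% figure is, as noted, currently arbitrary:]

```c
#include <assert.h>
#include <stdbool.h>

/* Send the whole file once 90% or more of its blocks were modified. */
#define WHOLE_FILE_THRESHOLD 0.9

static bool
send_whole_file(int nblocks_total, int nblocks_modified)
{
    if (nblocks_total == 0)
        return true;            /* empty file: nothing saved by a delta */
    return (double) nblocks_modified / nblocks_total >= WHOLE_FILE_THRESHOLD;
}
```

Sending the whole file past this point avoids the per-block header overhead of an incremental payload that would barely be smaller than the file itself.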
Re: block-level incremental backup
On Wed, Sep 4, 2019 at 5:21 PM Dilip Kumar wrote: > > I have not yet completed the review for 0004, but I have few more > comments. Tomorrow I will try to complete the review and some testing > as well. > > 1. It seems that the output full backup generated with > pg_combinebackup also contains the "INCREMENTAL BACKUP REFERENCE WAL > LOCATION". It seems confusing > because now this is a full backup, not the incremental backup. > Yes, that was remaining and was in my TODO. Done in the new patchset. Also, taking --label as an input like pg_basebackup. > > 2. > + memset(outblocks, 0, sizeof(FileOffset) * RELSEG_SIZE); > > I don't think you need to memset this explicitly as you can initialize > the array itself no? > FileOffset outblocks[RELSEG_SIZE] = {{0}} > I didn't see any issue with memset either but changed this per your suggestion. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Aug 27, 2019 at 11:59 PM Robert Haas wrote: > On Fri, Aug 16, 2019 at 6:23 AM Jeevan Chalke > wrote: > > [ patches ] > > Reviewing 0002 and 0003: > > - Commit message for 0003 claims magic number and checksum are 0, but > that (fortunately) doesn't seem to be the case. > Oops, updated the commit message. > > - looks_like_rel_name actually checks whether it looks like a > *non-temporary* relation name; suggest adjusting the function name. > > - The names do_full_backup and do_incremental_backup are quite > confusing because you're really talking about what to do with one > file. I suggest sendCompleteFile() and sendPartialFile(). > Changed function names. > > - Is there any good reason to have 'refptr' as a global variable, or > could we just pass the LSN around via function arguments? I know it's > just mimicking startptr, but storing startptr in a global variable > doesn't seem like a great idea either, so if it's not too annoying, > let's pass it down via function arguments instead. Also, refptr is a > crappy name (even worse than startptr); whether we end up with a > global variable or a bunch of local variables, let's make the name(s) > clear and unambiguous, like incremental_reference_lsn. Yeah, I know > that's long, but I still think it's better than being unclear. > Renamed the variable. However, I have kept it as a global, as otherwise many functions would need their signatures changed, e.g. sendFile(), sendDir(), sendTablespace(), etc. > - do_incremental_backup looks like it can never report an error from > fread(), which is bad. But I see that this is just copied from the > existing code which has the same problem, so I started a separate > thread about that. > > - I think that passing cnt and blkindex to verify_page_checksum() > doesn't look very good from an abstraction point of view. Granted, > the existing code isn't great either, but I think this makes the > problem worse. 
I suggest passing "int backup_distance" to this > function, computed as cnt - BLCKSZ * blkindex. Then, you can > fseek(-backup_distance), fread(BLCKSZ), and then fseek(backup_distance > - BLCKSZ). > Yep. Done these changes in the refactoring patch. > > - While I generally support the use of while and for loops rather than > goto for flow control, a while (1) loop that ends with a break is > functionally a goto anyway. I think there are several ways this could > be revised. The most obvious one is probably to use goto, but I vote > for inverting the sense of the test: if (PageIsNew(page) || > PageGetLSN(page) >= startptr) break; This approach also saves a level > of indentation for more than half of the function. > I have used this new inverted condition, but we still need a while(1) loop. > - I am not sure that it's a good idea for sendwholefile = true to > result in dumping the entire file onto the wire in a single CopyData > message. I don't know of a concrete problem in typical > configurations, but someone who increases RELSEG_SIZE might be able to > overflow CopyData's length word. At 2GB the length word would be > negative, which might break, and at 4GB it would wrap around, which > would certainly break. See CopyData in > https://www.postgresql.org/docs/12/protocol-message-formats.html To > avoid this issue, and maybe some others, I suggest defining a > reasonably large chunk size, say 1MB as a constant in this file > someplace, and sending the data as a series of chunks of that size. > OK. Done as per the suggestions. > > - I don't think that the way concurrent truncation is handled is > correct for partial files. Right now it just falls through to code > which appends blocks of zeroes in either the complete-file or > partial-file case. I think that logic should be moved into the > function that handles the complete-file case. In the partial-file > case, the blocks that we actually send need to match the list of block > numbers we promised to send. 
We can't just send the promised blocks > and then tack a bunch of zero-filled blocks onto the end that the file > header doesn't know about. > Well, in partial file case we won't end up inside that block. So we are never sending zeroes at the end in case of partial file. > - For reviewer convenience, please use the -v option to git > format-patch when posting and reposting a patch series. Using -v2, > -v3, etc. on successive versions really helps. > Sure. Thanks for letting me know about this option. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
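The chunked sending that Robert suggests above (to keep any single CopyData message well under the 2GB limit of its signed 32-bit length word) reduces to a small read loop. The sketch below is illustrative only, not the patch's actual code: SEND_CHUNK_SIZE is a made-up name for the 1MB constant he proposes, and writing to an output stream stands in for emitting one CopyData message per iteration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name for the 1MB chunk size suggested in the review. */
#define SEND_CHUNK_SIZE (1024 * 1024)

/*
 * Read 'in' from the current position to EOF and forward it in chunks of
 * at most SEND_CHUNK_SIZE bytes.  In the server, each iteration would be
 * one CopyData message, so no single message length can approach 2GB even
 * if RELSEG_SIZE is raised.  Returns the number of chunks sent, or -1 on
 * a read or allocation error.
 */
static long
send_file_in_chunks(FILE *in, FILE *out)
{
	char	   *buf = malloc(SEND_CHUNK_SIZE);
	long		nchunks = 0;
	size_t		cnt;

	if (buf == NULL)
		return -1;

	while ((cnt = fread(buf, 1, SEND_CHUNK_SIZE, in)) > 0)
	{
		/* Stand-in for "emit one CopyData message of length cnt". */
		fwrite(buf, 1, cnt, out);
		nchunks++;
	}

	free(buf);
	return ferror(in) ? -1 : nchunks;
}
```

A 3MB-plus-5-byte file, for example, would go out as four messages: three full chunks and one 5-byte tail.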
Re: block-level incremental backup
On Tue, Aug 27, 2019 at 4:46 PM vignesh C wrote: > Few comments: > Comment: > + buf = (char *) malloc(statbuf->st_size); > + if (buf == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of memory"))); > + > + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) > + { > + Bitmapset *mod_blocks = NULL; > + int nmodblocks = 0; > + > + if (cnt % BLCKSZ != 0) > + { > > We can use same size as full page size. > After pg start backup full page write will be enabled. > We can use the same file size to maintain data consistency. > Can you please explain which size? The aim here is to read the entire file in memory, and thus statbuf->st_size is used. Comment: > Should we check if it is same timeline as the system's timeline. > At the time of taking the incremental backup, we can't check that. However, while combining, I made sure that the timeline is the same for all backups. > > Comment: > > Should we support compression formats supported by pg_basebackup. > This can be an enhancement after the functionality is completed. > For the incremental backup, it just works out of the box. For combining backups, as discussed up-thread, the user has to uncompress first, combine them, and then recompress if required. > Comment: > We should provide some mechanism to validate the backup. To identify > if some backup is corrupt or some file is missing(deleted) in a > backup. > Maybe, but not for the first version. > Comment: > +/* > + * When to send the whole file, % blocks modified (90%) > + */ > +#define WHOLE_FILE_THRESHOLD 0.9 > + > This can be user configured value. > This can be an enhancement after the functionality is completed. > Yes. > Comment: > We can add a readme file with all the details regarding incremental > backup and combine backup. > Will have a look. > > Regards, > Vignesh > EnterpriseDB: http://www.enterprisedb.com > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
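For reference, the 90% rule quoted above (send the whole file once most of its blocks changed, since the per-block headers of a partial file stop paying off) reduces to a few lines. This is a hedged sketch: send_whole_file and its signature are invented for illustration, not the patch's actual code.

```c
#include <stdbool.h>

/* Mirror of the patch's constant: send the whole file at >= 90% modified. */
#define WHOLE_FILE_THRESHOLD 0.9

/*
 * Decide between a complete-file and a partial-file send.  'nblocks' is
 * the number of blocks in the file, 'nmodblocks' how many of them carry
 * a page LSN at or past the reference LSN.
 */
static bool
send_whole_file(int nblocks, int nmodblocks)
{
	if (nblocks == 0)
		return true;			/* nothing to save on an empty file */
	return (double) nmodblocks / nblocks >= WHOLE_FILE_THRESHOLD;
}
```

Making the 0.9 a user-configurable value, as Vignesh suggests, would only change where the constant comes from, not this comparison.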
Re: block-level incremental backup
Hi,

Attached is a new set of patches adding support for tablespace handling. This patchset also fixes the issues reported by Vignesh, Robert, Jeevan Ladhe, and Dilip Kumar. Please have a look and let me know if I have missed accounting for any comments.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

From 7cc2fb70188676406ca883055467aff602438fcd Mon Sep 17 00:00:00 2001
From: Jeevan Chalke
Date: Mon, 9 Sep 2019 10:38:01 +0530
Subject: [PATCH v2 1/4] Add support for command line option to pass LSN.

This adds [ LSN 'lsn' ] to BASE_BACKUP command and --lsn=LSN to the
pg_basebackup binary.

Also, add small tests.
---
 src/backend/replication/basebackup.c         | 20
 src/backend/replication/repl_gram.y          |  6 ++
 src/backend/replication/repl_scanner.l       |  1 +
 src/bin/pg_basebackup/pg_basebackup.c        | 15 +--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 12 +++-
 5 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6aab8d7..e72bf8e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -38,6 +38,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -52,6 +53,7 @@ typedef struct
 	bool		includewal;
 	uint32		maxrate;
 	bool		sendtblspcmapfile;
+	XLogRecPtr	lsn;
 } basebackup_options;

@@ -652,6 +654,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
 	bool		o_noverify_checksums = false;
+	bool		o_lsn = false;

 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -740,6 +743,23 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "lsn") == 0)
+		{
+			bool		have_error = false;
+
+			if (o_lsn)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			o_lsn = true;
+
+			/* Validate given LSN and convert it into XLogRecPtr. */
+			opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+			if (XLogRecPtrIsInvalid(opt->lsn))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid value for LSN")));
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc..c24d319 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
+%token K_LSN

 %type <node> command
 %type <node> base_backup start_replication start_logical_replication
@@ -214,6 +215,11 @@ base_backup_opt:
 					$$ = makeDefElem("noverify_checksums",
 									 (Node *)makeInteger(true), -1);
 				}
+			| K_LSN SCONST
+				{
+					$$ = makeDefElem("lsn",
+									 (Node *)makeString($2), -1);
+				}
 			;

 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb..77b5af4 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,7 @@
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
 WAIT				{ return K_WAIT; }
+LSN					{ return K_LSN; }

 ","					{ return ','; }
 ";"					{ return ';'; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7986872..1791853 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -105,6 +105,7 @@ static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
 static bool verify_checksums = true;
+static char *lsn = NULL;

 static bool success = false;
 static bool made_new_pgdata = false;
@@ -341,6 +342,7 @@ usage(void)
 			 "                         include required WAL files with specified method\n"));
 	printf(_("  -z, --gzip             compress tar output\n"));
 	printf(_("  -Z, --compress=0-9     compress tar output with given compression level\n"));
+	printf(_("      --lsn=LSN          incremental backup, using LSN as threshold\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 			 "                         set fast or spread checkpointing\n"));
@@ -1805,6 +1807,7 @@ BaseBackup(void)
 				maxServerMajor;
 	int			serverVersion,
 				serverMajor;
+	char	   *lsn_clause = NULL;

 	Assert(conn != NULL);

@@ -1871,8 +1874,11 @@
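The patch above validates the option with pg_lsn_in_internal(). As a standalone illustration of the textual LSN form being accepted, here is a sketch; parse_lsn is a made-up name, and the real function (in src/backend/utils/adt/pg_lsn.c) differs in detail.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Parse the textual "XXXXXXXX/XXXXXXXX" LSN form (high and low 32-bit
 * halves in hex, as seen in backup_label's START WAL LOCATION) into a
 * 64-bit position.  Anything else sets *have_error.
 */
static XLogRecPtr
parse_lsn(const char *str, bool *have_error)
{
	unsigned int hi,
				lo;
	int			nchars = 0;

	*have_error = false;
	if (sscanf(str, "%X/%X%n", &hi, &lo, &nchars) != 2 ||
		str[nchars] != '\0')
	{
		*have_error = true;
		return InvalidXLogRecPtr;
	}
	return ((XLogRecPtr) hi << 32) | lo;
}
```

So "0/16B3740" parses to the 64-bit value 0x16B3740, while trailing garbage or a missing slash is rejected.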
Re: block-level incremental backup
On Wed, Sep 4, 2019 at 10:08 PM Michael Paquier wrote: > > For generating a > > file, you can always emit the newest and "best" tar format, but for > > reading a file, you probably want to be prepared for older or cruftier > > variants. Maybe not -- I'm not super-familiar with the tar on-disk > > format. But I think there must be a reason why tar libraries exist, > > and I don't want to write a new one. > > We need to be sure as well that the library chosen does not block > access to a feature in all the various platforms we have. Well, again, my preference is to just not make this particular feature work natively with tar files. Then I don't need to choose a library, so the question is moot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Sep 03, 2019 at 08:59:53AM -0400, Robert Haas wrote: > I think pg_basebackup is using homebrew code to generate tar files, > but I'm reluctant to do that for reading tar files. Yes. This code has not actually changed since its introduction. Please note that we also have code which directly reads data from a tarball in pg_basebackup.c, when appending the recovery parameters to postgresql.auto.conf for -R. There could be some consolidation here with what you are doing. > For generating a > file, you can always emit the newest and "best" tar format, but for > reading a file, you probably want to be prepared for older or cruftier > variants. Maybe not -- I'm not super-familiar with the tar on-disk > format. But I think there must be a reason why tar libraries exist, > and I don't want to write a new one. We need to be sure as well that the library chosen does not block access to a feature in all the various platforms we have. -- Michael signature.asc Description: PGP signature
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 12:46 PM Ibrar Ahmed wrote: > I did that and have experience working on the TAR format. I was curious > about what > "best/newest" you are talking. Well, why not go look it up? On my MacBook, tar is documented to understand three different tar formats: gnutar, ustar, and v7, and two sets of extensions to the tar format: numeric extensions required by POSIX, and Solaris extensions. It also understands the pax and restricted-pax formats which are derived from the ustar format. I don't know what your system supports, but it's probably not hugely different; the fact that there are multiple tar formats has been documented in the tar man page on every machine I've checked for the past 20 years. Here, 'man tar' refers the reader to 'man libarchive-formats', which contains the details mentioned above. A quick Google search for 'multiple tar formats' also finds https://en.wikipedia.org/wiki/Tar_(computing)#File_format and https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html each of which explains a good deal of the complexity in this area. I don't really understand why I have to explain to you what I mean when I say there are multiple tar formats when you can look it up on Google and find that there are multiple tar formats. Again, the point is that the current code only generates tar archives and therefore only needs to generate one format, but if we add code that reads a tar archive, it probably needs to read several formats, because there are several formats that are popular enough to be widely-supported. It's possible that somebody else here knows more about this topic and could make better judgements than I can, but my view at present is that if we want to read tar archives, we probably would want to do it by depending on libarchive. And I don't think we should do that for this project because I don't think it would provide much value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
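The header variants Robert lists can in fact be told apart mechanically: each 512-byte tar header carries a magic field at offset 257, which POSIX ustar fills with "ustar\0" plus version "00", old GNU tar fills with "ustar", two spaces, and a NUL, and the ancient v7 format leaves zeroed. A hedged sketch of the branch a multi-format reader would need (names invented here, not any library's API):

```c
#include <string.h>

typedef enum
{
	TAR_FORMAT_V7,				/* no magic: pre-POSIX v7 layout */
	TAR_FORMAT_USTAR,			/* POSIX.1-1988 ustar (also the pax base) */
	TAR_FORMAT_GNU,				/* old GNU/"oldgnu" variant */
	TAR_FORMAT_UNKNOWN
} tar_format;

/*
 * Classify a 512-byte tar header by its magic+version field at offset
 * 257.  A real reader would branch here, since field layouts past the
 * v7 prefix differ between the formats.
 */
static tar_format
detect_tar_format(const unsigned char header[512])
{
	const char *magic = (const char *) header + 257;

	if (memcmp(magic, "ustar\0" "00", 8) == 0)
		return TAR_FORMAT_USTAR;
	if (memcmp(magic, "ustar  \0", 8) == 0)
		return TAR_FORMAT_GNU;
	if (magic[0] == '\0')
		return TAR_FORMAT_V7;
	return TAR_FORMAT_UNKNOWN;
}
```

This only scratches the surface (pax and GNU long-name extensions add special entry types on top), which is Robert's point about preferring libarchive over a homegrown reader.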
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 12:11 PM Dilip Kumar wrote: > > On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke > wrote: > > > 0003: > +/* > + * When to send the whole file, % blocks modified (90%) > + */ > +#define WHOLE_FILE_THRESHOLD 0.9 > > How this threshold is selected. Is it by some test? > > > - magic number, currently 0 (4 bytes) > I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC > 0x494E4352) as a magic number, not 0 > > > + Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ)); > + > + buf = (char *) malloc(statbuf->st_size); > + if (buf == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of memory"))); > + > + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) > + { > + Bitmapset *mod_blocks = NULL; > + int nmodblocks = 0; > + > + if (cnt % BLCKSZ != 0) > + { > > It will be good to add some comments for the if block and also for the > assert. Actully, it's not very clear from the code. > > 0004: > +#include > +#include > +#include > Header file include order (sys/state.h should be before time.h) > > > > + printf(_("%s combines full backup with incremental backup.\n\n"), progname); > /backup/backups > > > + * scan_file > + * > + * Checks whether given file is partial file or not. If partial, then > combines > + * it into a full backup file, else copies as is to the output directory. > + */ > > /If partial, then combines/ If partial, then combine > > > > +static void > +combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir, > + const char *subdirpath, const char *outfn) > + /* > + * Open all files from all incremental backup directories and create a file > + * map. > + */ > + basefilefound = false; > + for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++) > + { > + fm = &filemaps[fmindex]; > + > . > + } > + > + > + /* Process all opened files. 
*/ > + lastblkno = 0; > + modifiedblockfound = false; > + for (i = 0; i < fmindex; i++) > + { > + char*buf; > + int hsize; > + int k; > + int blkstartoffset; > .. > + } > + > + for (i = 0; i <= lastblkno; i++) > + { > + char blkdata[BLCKSZ]; > + FILE*infp; > + int offset; > ... > + } > } > > Can we breakdown this function in 2-3 functions. At least creating a > file map can directly go to a separate function. > > I have read 0003 and 0004 patch and there are few cosmetic comments. > I have not yet completed the review for 0004, but I have few more comments. Tomorrow I will try to complete the review and some testing as well. 1. It seems that the output full backup generated with pg_combinebackup also contains the "INCREMENTAL BACKUP REFERENCE WAL LOCATION". It seems confusing because now this is a full backup, not the incremental backup. 2. + FILE*outfp; + FileOffset outblocks[RELSEG_SIZE]; + int i; + FileMap*filemaps; + int fmindex; + bool basefilefound; + bool modifiedblockfound; + uint32 lastblkno; + FileMap*fm; + struct stat statbuf; + uint32 nblocks; + + memset(outblocks, 0, sizeof(FileOffset) * RELSEG_SIZE); I don't think you need to memset this explicitly as you can initialize the array itself no? FileOffset outblocks[RELSEG_SIZE] = {{0}} -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
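Dilip's point 2 relies on a C guarantee worth spelling out: when an aggregate initializer such as "= {{0}}" explicitly initializes only the first element, every remaining element is zero-initialized as well, so the explicit memset is redundant. A tiny self-contained demonstration; the FileOffset fields and the array size here are stand-ins, not the patch's actual definitions.

```c
#include <stdint.h>

/* Stand-in for the patch's FileOffset; the field names are made up. */
typedef struct
{
	uint32_t	blkno;
	int64_t		offset;
} FileOffset;

#define NSLOTS 1024				/* small stand-in for RELSEG_SIZE */

/*
 * Returns 1 iff every element of a "{{0}}"-initialized array is fully
 * zeroed, which C's initialization rules guarantee: elements without an
 * explicit initializer are implicitly zero-initialized.
 */
static int
all_slots_zero(void)
{
	FileOffset	outblocks[NSLOTS] = {{0}};

	for (int i = 0; i < NSLOTS; i++)
		if (outblocks[i].blkno != 0 || outblocks[i].offset != 0)
			return 0;
	return 1;
}
```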
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 7:39 PM Robert Haas wrote: > On Tue, Sep 3, 2019 at 10:05 AM Ibrar Ahmed wrote: > > +1 using the library to tar. But I think reason not using tar library is > TAR is > > one of the most simple file format. What is the best/newest format of > TAR? > > So, I don't really want to go down this path at all, as I already > said. You can certainly do your own research on this topic if you > wish. > > I did that, and I have experience working on the TAR format. I was curious which "best/newest" format you were talking about. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Ibrar Ahmed
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 8:00 PM Tom Lane wrote: > Ibrar Ahmed writes: > > +1 using the library to tar. > > Uh, *what* library? > I was just replying to Robert, who said "But I think there must be a reason why tar libraries exist, and I don't want to write a new one." I said I am OK with using a library (whatever he is proposing/thinking), but explained that TAR is a very simple format, which is why PG has its own code for it. > pg_dump's pg_backup_tar.c is about 1300 lines, a very large fraction > of which is boilerplate for interfacing to pg_backup_archiver's APIs. > The stuff that actually knows specifically about tar looks to be maybe > a couple hundred lines, plus there's another couple hundred lines of > (rather duplicative?) code in src/port/tar.c. None of it is rocket > science. > > I can't believe that it'd be a good tradeoff to create a new external > dependency to replace that amount of code. In case you haven't noticed, > our luck with depending on external libraries has been abysmal. > > Possibly there's an argument for refactoring things so that there's > more stuff in tar.c and less elsewhere, but let's not go looking > for external code to depend on. > > regards, tom lane > -- Ibrar Ahmed
Re: block-level incremental backup
Ibrar Ahmed writes: > +1 using the library to tar. Uh, *what* library? pg_dump's pg_backup_tar.c is about 1300 lines, a very large fraction of which is boilerplate for interfacing to pg_backup_archiver's APIs. The stuff that actually knows specifically about tar looks to be maybe a couple hundred lines, plus there's another couple hundred lines of (rather duplicative?) code in src/port/tar.c. None of it is rocket science. I can't believe that it'd be a good tradeoff to create a new external dependency to replace that amount of code. In case you haven't noticed, our luck with depending on external libraries has been abysmal. Possibly there's an argument for refactoring things so that there's more stuff in tar.c and less elsewhere, but let's not go looking for external code to depend on. regards, tom lane
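For a sense of how small the tar-specific code Tom mentions really is: beyond filling octal text fields, the trickiest piece a writer like src/port/tar.c needs is the header checksum, defined as the unsigned sum of all 512 header bytes computed as if the 8-byte checksum field (at offset 148) held ASCII spaces. A standalone sketch of that rule (not the actual src/port/tar.c code):

```c
/*
 * Compute the tar header checksum: sum every byte of the 512-byte
 * header, with the 8-byte checksum field itself (offsets 148-155)
 * counted as blanks, per the ustar specification.
 */
static unsigned int
tar_header_checksum(const unsigned char header[512])
{
	unsigned int sum = 0;

	for (int i = 0; i < 512; i++)
	{
		if (i >= 148 && i < 156)
			sum += ' ';			/* checksum field counts as spaces */
		else
			sum += header[i];
	}
	return sum;
}
```

An all-zero header therefore checksums to 8 * 32 = 256, and the result is then rendered into the field as octal text.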
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 10:05 AM Ibrar Ahmed wrote: > +1 using the library to tar. But I think reason not using tar library is TAR > is > one of the most simple file format. What is the best/newest format of TAR? So, I don't really want to go down this path at all, as I already said. You can certainly do your own research on this topic if you wish. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 6:00 PM Robert Haas wrote: > On Sat, Aug 31, 2019 at 3:41 PM Ibrar Ahmed wrote: > > Are we using any tar library in pg_basebackup.c? We already have the > capability > > in pg_basebackup to do that. > > I think pg_basebackup is using homebrew code to generate tar files, > but I'm reluctant to do that for reading tar files. For generating a > file, you can always emit the newest and "best" tar format, but for > reading a file, you probably want to be prepared for older or cruftier > variants. Maybe not -- I'm not super-familiar with the tar on-disk > format. But I think there must be a reason why tar libraries exist, > and I don't want to write a new one. > +1 for using a library for tar. But I think the reason for not using a tar library is that TAR is one of the simplest file formats. What is the best/newest format of TAR? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Ibrar Ahmed
Re: block-level incremental backup
On Sat, Aug 31, 2019 at 3:41 PM Ibrar Ahmed wrote: > Are we using any tar library in pg_basebackup.c? We already have the > capability > in pg_basebackup to do that. I think pg_basebackup is using homebrew code to generate tar files, but I'm reluctant to do that for reading tar files. For generating a file, you can always emit the newest and "best" tar format, but for reading a file, you probably want to be prepared for older or cruftier variants. Maybe not -- I'm not super-familiar with the tar on-disk format. But I think there must be a reason why tar libraries exist, and I don't want to write a new one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke wrote: > 0003: +/* + * When to send the whole file, % blocks modified (90%) + */ +#define WHOLE_FILE_THRESHOLD 0.9 How is this threshold selected? Is it based on some test? - magic number, currently 0 (4 bytes) I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC 0x494E4352) as a magic number, not 0 + Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ)); + + buf = (char *) malloc(statbuf->st_size); + if (buf == NULL) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"))); + + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) + { + Bitmapset *mod_blocks = NULL; + int nmodblocks = 0; + + if (cnt % BLCKSZ != 0) + { It will be good to add some comments for the if block and also for the assert. Actually, it's not very clear from the code. 0004: +#include +#include +#include Header file include order (sys/stat.h should be before time.h) + printf(_("%s combines full backup with incremental backup.\n\n"), progname); /backup/backups + * scan_file + * + * Checks whether given file is partial file or not. If partial, then combines + * it into a full backup file, else copies as is to the output directory. + */ /If partial, then combines/ If partial, then combine +static void +combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir, + const char *subdirpath, const char *outfn) + /* + * Open all files from all incremental backup directories and create a file + * map. + */ + basefilefound = false; + for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++) + { + fm = &filemaps[fmindex]; + . + } + + + /* Process all opened files. */ + lastblkno = 0; + modifiedblockfound = false; + for (i = 0; i < fmindex; i++) + { + char*buf; + int hsize; + int k; + int blkstartoffset; .. + } + + for (i = 0; i <= lastblkno; i++) + { + char blkdata[BLCKSZ]; + FILE*infp; + int offset; ... + } } Can we break down this function into 2-3 functions? At least creating the file map can go directly into a separate function. I have read the 0003 and 0004 patches and there are a few cosmetic comments. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Hi Robert, On Sat, Aug 31, 2019 at 8:29 AM Robert Haas wrote: > On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe > wrote: > > Due to the inherent nature of pg_basebackup, the incremental backup also > > allows taking backup in tar and compressed format. But, pg_combinebackup > > does not understand how to restore this. I think we should either make > > pg_combinebackup support restoration of tar incremental backup or > restrict > > taking the incremental backup in tar format until pg_combinebackup > > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > > > It is arguable that one can take the incremental backup in tar format, > extract > > that manually and then give the resultant directory as input to the > > pg_combinebackup, but I think that kills the purpose of having > > pg_combinebackup utility. > > I don't agree. You're right that you would have to untar (and > uncompress) the backup to run pg_combinebackup, but you would also > have to do that to restore a non-incremental backup, so it doesn't > seem much different. > Thanks. Yes I agree about the similarity between restoring non-incremental and incremental backup in this case. > I don't think it's worth doing that at this point; I definitely don't > think it > needs to be part of the first patch. > Makes sense. Regards, Jeevan Ladhe
Re: block-level incremental backup
On Sat, Aug 31, 2019 at 7:59 AM Robert Haas wrote: > On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe > wrote: > > Due to the inherent nature of pg_basebackup, the incremental backup also > > allows taking backup in tar and compressed format. But, pg_combinebackup > > does not understand how to restore this. I think we should either make > > pg_combinebackup support restoration of tar incremental backup or > restrict > > taking the incremental backup in tar format until pg_combinebackup > > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > > > It is arguable that one can take the incremental backup in tar format, > extract > > that manually and then give the resultant directory as input to the > > pg_combinebackup, but I think that kills the purpose of having > > pg_combinebackup utility. > > I don't agree. You're right that you would have to untar (and > uncompress) the backup to run pg_combinebackup, but you would also > have to do that to restore a non-incremental backup, so it doesn't > seem much different. It's true that for an incremental backup you > might need to untar and uncompress multiple prior backups rather than > just one, but that's just the nature of an incremental backup. And, > on a practical level, if you want compression, which is pretty likely > if you're thinking about incremental backups, the way to get that is > to use tar format with -z or -Z. > > It might be interesting to teach pg_combinebackup to be able to read > tar-format backups, but I think that there are several variants of the > tar format, and I suspect it would need to read them all. If someone > un-tars and re-tars a backup with a different tar tool, we don't want > it to become unreadable. So we'd either have to write our own > de-tarring library or add an external dependency on one. Are we using any tar library in pg_basebackup.c? We already have the capability in pg_basebackup to do that. 
> I don't > think it's worth doing that at this point; I definitely don't think it > needs to be part of the first patch. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- Ibrar Ahmed
Re: block-level incremental backup
On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this. I think we should either make > pg_combinebackup support restoration of tar incremental backup or restrict > taking the incremental backup in tar format until pg_combinebackup > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > It is arguable that one can take the incremental backup in tar format, extract > that manually and then give the resultant directory as input to the > pg_combinebackup, but I think that kills the purpose of having > pg_combinebackup utility. I don't agree. You're right that you would have to untar (and uncompress) the backup to run pg_combinebackup, but you would also have to do that to restore a non-incremental backup, so it doesn't seem much different. It's true that for an incremental backup you might need to untar and uncompress multiple prior backups rather than just one, but that's just the nature of an incremental backup. And, on a practical level, if you want compression, which is pretty likely if you're thinking about incremental backups, the way to get that is to use tar format with -z or -Z. It might be interesting to teach pg_combinebackup to be able to read tar-format backups, but I think that there are several variants of the tar format, and I suspect it would need to read them all. If someone un-tars and re-tars a backup with a different tar tool, we don't want it to become unreadable. So we'd either have to write our own de-tarring library or add an external dependency on one. I don't think it's worth doing that at this point; I definitely don't think it needs to be part of the first patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
Here are some comments: +/* The reference XLOG position for the incremental backup. */ +static XLogRecPtr refptr; As Robert already pointed out, we may want to pass this around as a parameter instead of using a global variable. Also, it can be renamed to something like incr_backup_refptr. I see that in your earlier version of the patch this was named startincrptr, which I think was more meaningful. - /* * If incremental backup, see whether the filename is a relation filename * or not. */ Can be reworded something like: "If incremental backup, check if it is relation file and can be sent partially." - + if (verify_checksum) + { + ereport(WARNING, + (errmsg("cannot verify checksum in file \"%s\", block " + "%d: read buffer size %d and page size %d " + "differ", + readfilename, blkno, (int) cnt, BLCKSZ))); + verify_checksum = false; + } For do_incremental_backup() it does not make sense to show the block number in the warning, as it is always going to be 0 when we throw this warning. Further, I think this can be rephrased as: "cannot verify checksum in file \"%s\", read file size %d is not multiple of page size %d". Or maybe we can just say: "cannot verify checksum in file \"%s\"" if checksum was requested, disable the checksum, and leave it to the following message: + ereport(WARNING, + (errmsg("file size (%d) not in multiple of page size (%d), sending whole file", + (int) cnt, BLCKSZ))); - If you agree with the above comment for blkno, then we can shift the declaration of blkno inside the condition " if (!sendwholefile)" in do_incremental_backup(), or avoid it altogether, and just pass "i" as blkindex, as well as blkno, to verify_page_checksum(). Maybe add a comment on why they are the same in the case of incremental backup. - I think we should give the user a hint, in the --help option as well as the documentation, about where to read the input lsn for an incremental backup from.
Something like - "To take an incremental backup, please provide the value of "--lsn" as the "START WAL LOCATION" of the previously taken full or incremental backup, from its backup_label file." - pg_combinebackup: +static bool made_new_outputdata = false; +static bool found_existing_outputdata = false; Both of these are global; I understand that we need them global so that they are accessible in cleanup_directories_atexit(). But they are passed to verify_dir_is_empty_or_create() as parameters, which I think is not needed. Instead, verify_dir_is_empty_or_create() can directly change the globals. - I see that checksum_failure is never set and always remains false. Maybe it is something that you wanted to set in combine_partial_files() when a corrupted partial file is detected? - I think the logic for verifying the backup chain should be moved out from the main() function to a separate function. - + /* + * Verify the backup chain. INCREMENTAL BACKUP REFERENCE WAL LOCATION of + * the incremental backup must match with the START WAL LOCATION of the + * previous backup, until we reach a full backup in which there is no + * INCREMENTAL BACKUP REFERENCE WAL LOCATION. + */ The current logic assumes the incremental backup directories are to be provided as input in the serial order the backups were taken. This is a bit confusing unless clarified in the pg_combinebackup help menu or documentation. I think we should clarify it in both places. - I think scan_directory() should rather be renamed to do_combinebackup(). Regards, Jeevan Ladhe On Thu, Aug 29, 2019 at 8:11 PM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this.
> I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract that manually and then give the resultant directory as input to
> the pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>
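For what it's worth, the chain rule discussed in the review above (each incremental backup's INCREMENTAL BACKUP REFERENCE WAL LOCATION must equal the previous backup's START WAL LOCATION, back to a full backup that has no reference location) factors out into a small function. This is only a sketch with invented names (BackupInfo, verify_backup_chain) and LSNs reduced to plain integers, not the patch's actual code:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-backup metadata, as parsed from each backup_label. */
typedef struct BackupInfo
{
	uint64_t	start_lsn;	/* "START WAL LOCATION" */
	uint64_t	ref_lsn;	/* "INCREMENTAL BACKUP REFERENCE WAL LOCATION";
							 * 0 (invalid) for a full backup */
} BackupInfo;

/*
 * Verify that backups[0..nbackups-1], given oldest first, form a valid
 * chain: the first must be a full backup, and each incremental's
 * reference LSN must equal the previous backup's start LSN.
 */
static bool
verify_backup_chain(const BackupInfo *backups, size_t nbackups)
{
	size_t		i;

	if (nbackups == 0 || backups[0].ref_lsn != 0)
		return false;			/* chain must begin with a full backup */

	for (i = 1; i < nbackups; i++)
	{
		if (backups[i].ref_lsn == 0)
			return false;		/* a second full backup breaks the chain */
		if (backups[i].ref_lsn != backups[i - 1].start_lsn)
			return false;		/* gap, or directories given out of order */
	}
	return true;
}
```

Pulling this out of main(), as suggested, would also make the serial-ordering assumption easy to document in one place.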
Re: block-level incremental backup
Hi,

I am doing some testing on the pg_basebackup and pg_combinebackup patches. I have also tried to create a tap test for pg_combinebackup by taking reference from the pg_basebackup tap cases. Attaching the first draft of the test patch.

I have done some testing with compression options; both -z and -Z level are working with incremental backup.

A minor comment: it is mentioned in the pg_combinebackup help that a maximum of 10 incremental backups can be given with the -i option, but I found that a maximum of 9 incremental backup directories can be given at a time.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Thu, Aug 29, 2019 at 10:06 PM Jeevan Ladhe wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract that manually and then give the resultant directory as input to
> the pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>

diff --git a/src/bin/pg_combinebackup/t/pg_combinebackup.pl b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
new file mode 100644
index 000..e0f834a
--- /dev/null
+++ b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
@@ -0,0 +1,79 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 23;
+
+program_help_ok('pg_combinebackup');
+program_version_ok('pg_combinebackup');
+program_options_handling_ok('pg_combinebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Initialize node
+$node->init();
+my $pgdata = $node->data_dir;
+
+# Change wal related setting for pg_basebackup to run
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->start;
+
+$node->command_fails(['pg_combinebackup'],
+	'pg_combinebackup needs full and incremental directory specified');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+	q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Run full base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup"],
+	'pg_basebackup runs for full backup');
+ok(-f "$tempdir/backup/PG_VERSION", 'full backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Get LSN of last backup to use for incremental backup
+my @extract_lsn = split (" ", scalar TestLib::slurp_file("$tempdir/backup/backup_label"));
+my $LSN = $extract_lsn[3];
+
+# Run incr base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup1", '--lsn', "$LSN"],
+	'pg_basebackup runs for incremental backup');
+ok(-f "$tempdir/backup1/PG_VERSION", 'incremental backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup1/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup1/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Run pg_combinebackup.
+$node->command_ok([ 'pg_combinebackup', '-f', "$tempdir/backup", '-i', "$tempdir/backup1", '-o', "$tempdir/backup2"],
+	'pg_combinebackup runs');
+ok(-f "$tempdir/backup2/PG_VERSION", 'combined backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup2/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup2/$baseUnloggedPath",
+	'unlogged main fork not in backup');
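One fragile spot in the test above is extracting the LSN as whitespace field 3 of backup_label; a keyed lookup would survive format changes better. A hypothetical C sketch of the same idea (parse_start_lsn is an invented name; the TAP test would do the equivalent with a regex on the "START WAL LOCATION:" line):

```c
#include <string.h>
#include <stdbool.h>

/*
 * Copy the LSN that follows "START WAL LOCATION: " in a backup_label
 * buffer into "out".  Returns false if the key line is absent.
 */
static bool
parse_start_lsn(const char *label, char *out, size_t outlen)
{
	const char *key = "START WAL LOCATION: ";
	const char *p = strstr(label, key);
	size_t		i = 0;

	if (p == NULL || outlen == 0)
		return false;
	p += strlen(key);
	/* the LSN token ends at the first space or newline */
	while (*p != '\0' && *p != ' ' && *p != '\n' && i + 1 < outlen)
		out[i++] = *p++;
	out[i] = '\0';
	return i > 0;
}
```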
Re: block-level incremental backup
Due to the inherent nature of pg_basebackup, the incremental backup also allows taking backup in tar and compressed format. But, pg_combinebackup does not understand how to restore this. I think we should either make pg_combinebackup support restoration of tar incremental backup, or restrict taking the incremental backup in tar format until pg_combinebackup supports the restoration, by making the options '--lsn' and '-Ft' mutually exclusive.

It is arguable that one can take the incremental backup in tar format, extract that manually and then give the resultant directory as input to pg_combinebackup, but I think that kills the purpose of having a pg_combinebackup utility.

Thoughts?

Regards,
Jeevan Ladhe
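The interim restriction proposed above could be a one-line check in pg_basebackup's option validation. A simplified, hypothetical sketch (check_incremental_format is an invented helper; in the real patch this would sit next to the existing format/option checks and use pg_log_error rather than fprintf):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Reject --lsn together with -Ft until pg_combinebackup can read
 * tar-format incremental backups.  "format" follows pg_basebackup's
 * convention: 'p' for plain, 't' for tar.
 */
static bool
check_incremental_format(bool lsn_given, char format)
{
	if (lsn_given && format == 't')
	{
		fprintf(stderr,
				"pg_basebackup: incremental backup (--lsn) cannot be used "
				"with tar format output (-Ft)\n");
		return false;
	}
	return true;
}
```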
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 6:23 AM Jeevan Chalke wrote:
> [ patches ]

Reviewing 0002 and 0003:

- Commit message for 0003 claims magic number and checksum are 0, but that (fortunately) doesn't seem to be the case.

- looks_like_rel_name actually checks whether it looks like a *non-temporary* relation name; suggest adjusting the function name.

- The names do_full_backup and do_incremental_backup are quite confusing because you're really talking about what to do with one file. I suggest sendCompleteFile() and sendPartialFile().

- Is there any good reason to have 'refptr' as a global variable, or could we just pass the LSN around via function arguments? I know it's just mimicking startptr, but storing startptr in a global variable doesn't seem like a great idea either, so if it's not too annoying, let's pass it down via function arguments instead. Also, refptr is a crappy name (even worse than startptr); whether we end up with a global variable or a bunch of local variables, let's make the name(s) clear and unambiguous, like incremental_reference_lsn. Yeah, I know that's long, but I still think it's better than being unclear.

- do_incremental_backup looks like it can never report an error from fread(), which is bad. But I see that this is just copied from the existing code which has the same problem, so I started a separate thread about that.

- I think that passing cnt and blkindex to verify_page_checksum() doesn't look very good from an abstraction point of view. Granted, the existing code isn't great either, but I think this makes the problem worse. I suggest passing "int backup_distance" to this function, computed as cnt - BLCKSZ * blkindex. Then, you can fseek(-backup_distance), fread(BLCKSZ), and then fseek(backup_distance - BLCKSZ).

- While I generally support the use of while and for loops rather than goto for flow control, a while (1) loop that ends with a break is functionally a goto anyway. I think there are several ways this could be revised. The most obvious one is probably to use goto, but I vote for inverting the sense of the test:

    if (PageIsNew(page) || PageGetLSN(page) >= startptr)
        break;

This approach also saves a level of indentation for more than half of the function.

- I am not sure that it's a good idea for sendwholefile = true to result in dumping the entire file onto the wire in a single CopyData message. I don't know of a concrete problem in typical configurations, but someone who increases RELSEG_SIZE might be able to overflow CopyData's length word. At 2GB the length word would be negative, which might break, and at 4GB it would wrap around, which would certainly break. See CopyData in https://www.postgresql.org/docs/12/protocol-message-formats.html

To avoid this issue, and maybe some others, I suggest defining a reasonably large chunk size, say 1MB, as a constant in this file someplace, and sending the data as a series of chunks of that size.

- I don't think that the way concurrent truncation is handled is correct for partial files. Right now it just falls through to code which appends blocks of zeroes in either the complete-file or partial-file case. I think that logic should be moved into the function that handles the complete-file case. In the partial-file case, the blocks that we actually send need to match the list of block numbers we promised to send. We can't just send the promised blocks and then tack a bunch of zero-filled blocks onto the end that the file header doesn't know about.

- For reviewer convenience, please use the -v option to git format-patch when posting and reposting a patch series. Using -v2, -v3, etc. on successive versions really helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
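The 1MB-chunking suggestion is easy to reason about with a couple of pure helpers. A minimal sketch, assuming a payload already in memory (chunks_needed and chunk_size are invented names; in basebackup.c each chunk would go out as its own CopyData message):

```c
#include <stddef.h>

/* Illustrative cap per CopyData message, per the suggestion above. */
#define SEND_CHUNK_SIZE ((size_t) (1024 * 1024))

/* Number of CopyData messages needed for a payload of "len" bytes. */
static size_t
chunks_needed(size_t len)
{
	return (len + SEND_CHUNK_SIZE - 1) / SEND_CHUNK_SIZE;
}

/* Size of the i'th (0-based) chunk for a payload of "len" bytes. */
static size_t
chunk_size(size_t len, size_t i)
{
	size_t		off = i * SEND_CHUNK_SIZE;

	if (off >= len)
		return 0;
	return (len - off) < SEND_CHUNK_SIZE ? (len - off) : SEND_CHUNK_SIZE;
}

/*
 * The send loop would then look roughly like:
 *
 *     for (size_t i = 0; i < chunks_needed(len); i++)
 *         pq_putmessage('d', buf + i * SEND_CHUNK_SIZE, chunk_size(len, i));
 *
 * so no single message can approach the 2GB sign boundary of CopyData's
 * length word, whatever RELSEG_SIZE is.
 */
```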
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 8:07 PM Ibrar Ahmed wrote:
>
> What do you mean by bigger file, a file greater than 1GB? In which case
> you get file > 1GB?
>

Few comments:

Comment:
+ buf = (char *) malloc(statbuf->st_size);
+ if (buf == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+
+ if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0)
+ {
+ Bitmapset *mod_blocks = NULL;
+ int nmodblocks = 0;
+
+ if (cnt % BLCKSZ != 0)
+ {
We can use the same size as the full page size. After pg_start_backup, full page writes will be enabled. We can use the same file size to maintain data consistency.

Comment:
/* Validate given LSN and convert it into XLogRecPtr. */
+ opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+ if (XLogRecPtrIsInvalid(opt->lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid value for LSN")));
Validate that the input LSN is less than the current system LSN.

Comment:
/* Validate given LSN and convert it into XLogRecPtr. */
+ opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+ if (XLogRecPtrIsInvalid(opt->lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid value for LSN")));
Should we check whether it is on the same timeline as the system's timeline?

Comment:
+ if (fread(blkdata, 1, BLCKSZ, infp) != BLCKSZ)
+ {
+ pg_log_error("could not read from file \"%s\": %m", outfn);
+ cleanup_filemaps(filemaps, fmindex + 1);
+ exit(1);
+ }
+
+ /* Finally write one block to the output file */
+ if (fwrite(blkdata, 1, BLCKSZ, outfp) != BLCKSZ)
+ {
+ pg_log_error("could not write to file \"%s\": %m", outfn);
+ cleanup_filemaps(filemaps, fmindex + 1);
+ exit(1);
+ }
Should we support the compression formats supported by pg_basebackup? This can be an enhancement after the functionality is completed.

Comment:
We should provide some mechanism to validate the backup, to identify whether a backup is corrupt or some file is missing (deleted) in a backup.

Comment:
+ ofp = fopen(tofn, "wb");
+ if (ofp == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", tofn);
+ exit(1);
+ }
ifp should be closed in the error flow.

Comment:
+ fp = fopen(filename, "r");
+ if (fp == NULL)
+ {
+ pg_log_error("could not read file \"%s\": %m", filename);
+ exit(1);
+ }
+
+ labelfile = pg_malloc(statbuf.st_size + 1);
+ if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size)
+ {
+ pg_log_error("corrupted file \"%s\": %m", filename);
+ pg_free(labelfile);
+ exit(1);
+ }
fclose can be moved above.

Comment:
+ if (!modifiedblockfound)
+ {
+ copy_whole_file(fm->filename, outfn);
+ cleanup_filemaps(filemaps, fmindex + 1);
+ return;
+ }
+
+ /* Write all blocks to the output file */
+
+ if (fstat(fileno(fm->fp), &statbuf) != 0)
+ {
+ pg_log_error("could not stat file \"%s\": %m", fm->filename);
+ pg_free(filemaps);
+ exit(1);
+ }
In some error flows, cleanup_filemaps needs to be called to close the file descriptors that were opened.

Comment:
+/*
+ * When to send the whole file, % blocks modified (90%)
+ */
+#define WHOLE_FILE_THRESHOLD 0.9
+
This can be a user-configurable value. This can be an enhancement after the functionality is completed.

Comment:
We can add a README file with all the details regarding incremental backup and combine backup.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
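The "validate the input LSN against the current system LSN" comment above amounts to one extra comparison in parse_basebackup_options. A minimal standalone sketch, with XLogRecPtr simplified to an integer, current_lsn standing in for whatever the server reports (e.g. its flush position), and incremental_lsn_is_valid an invented name:

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-ins for the backend's types. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Besides syntactic validity, the reference LSN for an incremental
 * backup must lie strictly behind the server's current WAL position;
 * an LSN "from the future" cannot correspond to any prior backup.
 */
static bool
incremental_lsn_is_valid(XLogRecPtr opt_lsn, XLogRecPtr current_lsn)
{
	if (opt_lsn == InvalidXLogRecPtr)
		return false;			/* not a parseable LSN */
	if (opt_lsn >= current_lsn)
		return false;			/* reference LSN ahead of current WAL */
	return true;
}
```

Checking the timeline, as also suggested, would need the history read from the control file and is left out of this sketch.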
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 4:12 PM Ibrar Ahmed wrote: > > > > > On Fri, Aug 16, 2019 at 3:24 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> >> >> On Fri, Aug 2, 2019 at 6:43 PM vignesh C wrote: >> >>> Some comments: >>> 1) There will be some link files created for tablespace, we might >>> require some special handling for it >>> >> >> Yep. I have that in my ToDo. >> Will start working on that soon. >> >> >>> 2) >>> Retry functionality is hanlded only for copying of full files, should >>> we handle retry for copying of partial files >>> 3) >>> we can have some range for maxretries similar to sleeptime >>> >> >> I took help from pg_standby code related to maxentries and sleeptime. >> >> However, as we don't want to use system() call now, I have >> removed all this kludge and just used fread/fwrite as discussed. >> >> >>> 4) >>> Should we check for malloc failure >>> >> >> Used pg_malloc() instead. Same is also suggested by Ibrar. >> >> >>> >>> 5) Should we add display of progress as backup may take some time, >>> this can be added as enhancement. We can get other's opinion on this. >>> >> >> Can be done afterward once we have the functionality in place. >> >> >>> >>> 6) >>> If the backup count increases providing the input may be difficult, >>> Shall user provide all the incremental backups from a parent folder >>> and can we handle the ordering of incremental backup internally >>> >> >> I am not sure of this yet. We need to provide the tablespace mapping too. >> But thanks for putting a point here. Will keep that in mind when I >> revisit this. >> >> >>> >>> 7) >>> Add verbose for copying whole file >>> >> Done >> >> >>> >>> 8) We can also check if approximate space is available in disk before >>> starting combine backup, this can be added as enhancement. We can get >>> other's opinion on this. >>> >> >> Hmm... will leave it for now. User will get an error anyway. 
>> >> >>> >>> 9) >>> Combine backup into directory can be combine backup directory >>> >> Done >> >> >>> >>> 10) >>> MAX_INCR_BK_COUNT can be increased little, some applications use 1 >>> full backup at the beginning of the month and use 30 incremental >>> backups rest of the days in the month >>> >> >> Yeah, agree. But using any number here is debatable. >> Let's see others opinion too. >> > Why not use a list? > > >> >> >>> Regards, >>> Vignesh >>> EnterpriseDB: http://www.enterprisedb.com >>> >> >> >> Attached new sets of patches with refactoring done separately. >> Incremental backup patch became small now and hopefully more >> readable than the first version. >> >> -- >> Jeevan Chalke >> Technical Architect, Product Development >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> > > + buf = (char *) malloc(statbuf->st_size); > > + if (buf == NULL) > > + ereport(ERROR, > > + (errcode(ERRCODE_OUT_OF_MEMORY), > > +errmsg("out of memory"))); > > Why are you using malloc, you can use palloc here. > > > > Hi, I gave another look at the patch and have some quick comments. - > char *extptr = strstr(fn, ".partial"); I think there should be a better and strict way to check the file extension. - > + extptr = strstr(outfn, ".partial"); > + Assert (extptr != NULL); Why are you checking that again, you just appended that in the above statement? - > + if (verbose && statbuf.st_size > (RELSEG_SIZE * BLCKSZ)) > + pg_log_info("found big file \"%s\" (size: %.2lfGB): %m", fromfn, > + (double) statbuf.st_size / (RELSEG_SIZE * BLCKSZ)); This is not just a log, you find a file which is bigger which surely has some problem. - > +* We do read entire 1GB file in memory while taking incremental backup; so > +* I don't see any reason why can't we do that here. Also, copying data in > +* chunks is expensive. However, for bigger files, we still slice at 1GB > +* border. What do you mean by bigger file, a file greater than 1GB? In which case you get file > 1GB? 
-- Ibrar Ahmed
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 3:24 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Fri, Aug 2, 2019 at 6:43 PM vignesh C wrote: > >> Some comments: >> 1) There will be some link files created for tablespace, we might >> require some special handling for it >> > > Yep. I have that in my ToDo. > Will start working on that soon. > > >> 2) >> Retry functionality is hanlded only for copying of full files, should >> we handle retry for copying of partial files >> 3) >> we can have some range for maxretries similar to sleeptime >> > > I took help from pg_standby code related to maxentries and sleeptime. > > However, as we don't want to use system() call now, I have > removed all this kludge and just used fread/fwrite as discussed. > > >> 4) >> Should we check for malloc failure >> > > Used pg_malloc() instead. Same is also suggested by Ibrar. > > >> >> 5) Should we add display of progress as backup may take some time, >> this can be added as enhancement. We can get other's opinion on this. >> > > Can be done afterward once we have the functionality in place. > > >> >> 6) >> If the backup count increases providing the input may be difficult, >> Shall user provide all the incremental backups from a parent folder >> and can we handle the ordering of incremental backup internally >> > > I am not sure of this yet. We need to provide the tablespace mapping too. > But thanks for putting a point here. Will keep that in mind when I revisit > this. > > >> >> 7) >> Add verbose for copying whole file >> > Done > > >> >> 8) We can also check if approximate space is available in disk before >> starting combine backup, this can be added as enhancement. We can get >> other's opinion on this. >> > > Hmm... will leave it for now. User will get an error anyway. 
> > >> >> 9) >> Combine backup into directory can be combine backup directory >> > Done > > >> >> 10) >> MAX_INCR_BK_COUNT can be increased little, some applications use 1 >> full backup at the beginning of the month and use 30 incremental >> backups rest of the days in the month >> > > Yeah, agree. But using any number here is debatable. > Let's see others opinion too. > Why not use a list? > > >> Regards, >> Vignesh >> EnterpriseDB: http://www.enterprisedb.com >> > > > Attached new sets of patches with refactoring done separately. > Incremental backup patch became small now and hopefully more > readable than the first version. > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > + buf = (char *) malloc(statbuf->st_size); + if (buf == NULL) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), +errmsg("out of memory"))); Why are you using malloc, you can use palloc here. -- Ibrar Ahmed
Re: block-level incremental backup
On Fri, Aug 9, 2019 at 6:07 AM Jeevan Ladhe wrote: > Hi Jeevan, > > I have reviewed the backup part at code level and still looking into the > restore(combine) and functional part of it. But, here are my comments so > far: > Thank you Jeevan Ladhe for reviewing the changes. > > The patches need rebase. > Done. > May be we should rename to something like: > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP > START LOCATION" > to make it more intuitive? > As discussed, used "INCREMENTAL BACKUP REFERENCE WAL LOCATION". File header structure is defined in both the files basebackup.c and > pg_combinebackup.c. I think it is better to move this to > replication/basebackup.h. > Yep. Was that in my cleanup list. Done now. > I think we can avoid having flag isrelfile in sendFile(). > Something like this: > Also, having isrelfile as part of following condition: > is confusing, because even the relation files in full backup are going to > be > backed up by this loop only, but still, the condition reads '(!isrelfile > &&...)'. > In the refactored patch I have moved full backup code in a separate function. And now all incremental backup code is also done in its own function. Hopefully, the code is now more readable. > > IMHO, while labels are not advisable in general, it may be better to use a > label > here rather than a while(1) loop, so that we can move to the label in case > we > want to retry once. I think here it opens doors for future bugs if someone > happens to add code here, ending up adding some condition and then the > break becomes conditional. That will leave us in an infinite loop. > I kept it as is as I don't see any correctness issue here. Similar to structure partial_file_header, I think above macro can also be > moved > to basebackup.h instead of defining it twice. > Yes. Done. > I think this is a huge memory request (1GB) and may fail on busy/loaded > server at > times. 
We should check for failures of malloc, maybe throw some error on > getting ENOMEM as errno. > Agree. Done. > Here, should not we expect statbuf->st_size < (RELSEG_SIZE * BLCKSZ), and > it > should be safe to read just statbuf_st_size always I guess? But, I am ok > with > having this extra guard here. > Yes, we can do this way. Added an Assert() before that and used just statbuf->st_size. In sendFile(), I am sorry if I am missing something, but I am not able to > understand why 'cnt' and 'i' should have different values when they are > being > passed to verify_page_checksum(). I think passing only one of them should > be > sufficient. > As discussed offline, you meant to say i and blkno. These two are different. i represent the current block offset from the read buffer whereas blkno is the offset from the start of the page. For incremental backup, they are same as we read the whole file but they are different in case of regular full backup where we read 4 blocks at a time. i value there will be between 0 and 3. > Maybe we should just have a variable no_of_blocks to store a number of > blocks, > rather than calculating this say RELSEG_SIZE(i.e. 131072) times in the > worst > case. > OK. Done. > Sorry if I am missing something, but, should not it be just: > > len = cnt; > Yeah. Done. > As I said earlier in my previous email, we now do not need > +decode_lsn_internal() > as it is already taken care by the introduction of function > pg_lsn_in_internal(). > Yes. Done that and rebased on latest HEAD. > > Regards, > Jeevan Ladhe > Patches attached in the previous reply. -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 2, 2019 at 6:43 PM vignesh C wrote:

> Some comments:
> 1) There will be some link files created for tablespace, we might
> require some special handling for it
>

Yep. I have that in my ToDo.
Will start working on that soon.

> 2)
> Retry functionality is handled only for copying of full files, should
> we handle retry for copying of partial files
> 3)
> we can have some range for maxretries similar to sleeptime
>

I took help from the pg_standby code related to maxretries and sleeptime.

However, as we don't want to use a system() call now, I have removed all this kludge and just used fread/fwrite as discussed.

> 4)
> Should we check for malloc failure
>

Used pg_malloc() instead. The same was also suggested by Ibrar.

> 5) Should we add display of progress as backup may take some time,
> this can be added as enhancement. We can get other's opinion on this.
>

Can be done afterward once we have the functionality in place.

> 6)
> If the backup count increases providing the input may be difficult,
> Shall user provide all the incremental backups from a parent folder
> and can we handle the ordering of incremental backup internally
>

I am not sure of this yet. We need to provide the tablespace mapping too. But thanks for putting a point here. Will keep that in mind when I revisit this.

> 7)
> Add verbose for copying whole file
>

Done

> 8) We can also check if approximate space is available in disk before
> starting combine backup, this can be added as enhancement. We can get
> other's opinion on this.
>

Hmm... will leave it for now. User will get an error anyway.

> 9)
> Combine backup into directory can be combine backup directory
>

Done

> 10)
> MAX_INCR_BK_COUNT can be increased little, some applications use 1
> full backup at the beginning of the month and use 30 incremental
> backups rest of the days in the month
>

Yeah, agree. But using any number here is debatable.
Let's see others' opinion too.
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>

Attached new sets of patches with refactoring done separately. Incremental backup patch became small now and hopefully more readable than the first version.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

From 93041c12a7d07bf17073c9cf4571bd3b5a8acc81 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke
Date: Fri, 16 Aug 2019 14:08:34 +0530
Subject: [PATCH 1/4] Add support for command line option to pass LSN.

This adds [ LSN 'lsn' ] to BASE_BACKUP command and --lsn=LSN to the
pg_basebackup binary.

Also, add small tests.
---
 src/backend/replication/basebackup.c         | 20
 src/backend/replication/repl_gram.y          |  6 ++
 src/backend/replication/repl_scanner.l       |  1 +
 src/bin/pg_basebackup/pg_basebackup.c        | 15 +--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 12 +++-
 5 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66d..74c954b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -38,6 +38,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -52,6 +53,7 @@ typedef struct
 	bool includewal;
 	uint32 maxrate;
 	bool sendtblspcmapfile;
+	XLogRecPtr lsn;
 } basebackup_options;
@@ -638,6 +640,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool o_maxrate = false;
 	bool o_tablespace_map = false;
 	bool o_noverify_checksums = false;
+	bool o_lsn = false;

 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -726,6 +729,23 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "lsn") == 0)
+		{
+			bool have_error = false;
+
+			if (o_lsn)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			o_lsn = true;
+
+			/* Validate given LSN and convert it into XLogRecPtr. */
+			opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+			if (XLogRecPtrIsInvalid(opt->lsn))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid value for LSN")));
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized", defel->defname);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc..c24d319 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
+%token K_LSN

 %type command
 %type base_backup start_replication start_
Re: block-level incremental backup
On Mon, Aug 12, 2019 at 4:57 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Fri, Aug 9, 2019 at 6:36 PM Robert Haas wrote: > >> On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke >> wrote: >> > So, do you mean we should just do fread() and fwrite() for the whole >> file? >> > >> > I thought it is better if it was done by the OS itself instead of >> reading 1GB >> > into the memory and writing the same to the file. >> >> Well, 'cp' is just a C program. If they can write code to copy a >> file, so can we, and then we're not dependent on 'cp' being installed, >> working properly, being in the user's path or at the hard-coded >> pathname we expect, etc. There's an existing copy_file() function in >> src/backed/storage/file/copydir.c which I'd probably look into >> adapting for frontend use. I'm not sure whether it would be important >> to adapt the data-flushing code that's present in that routine or >> whether we could get by with just the loop to read() and write() data. >> > > Agree that we can certainly use open(), read(), write(), and close() here, > but > given that pg_basebackup.c and basbackup.c are using file operations, I > think > using fopen(), fread(), fwrite(), and fclose() will be better here, > at-least > for consistetncy. > +1 for using fopen(), fread(), fwrite(), and fclose() > Let me know if we still want to go with native OS calls. > > -1 for OS call > >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> > > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Ibrar Ahmed
Re: block-level incremental backup
On Mon, Aug 12, 2019 at 7:57 AM Jeevan Chalke wrote:

> Agree that we can certainly use open(), read(), write(), and close() here,
> but given that pg_basebackup.c and basebackup.c are using file operations,
> I think using fopen(), fread(), fwrite(), and fclose() will be better
> here, at least for consistency.

Oh, that's fine. Whatever's more consistent with the pre-existing code. Just, let's not use system().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Mon, Aug 12, 2019 at 5:29 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Fri, Aug 9, 2019 at 11:56 PM Jeevan Ladhe < > jeevan.la...@enterprisedb.com> wrote: > >> Hi Robert, >> >> On Fri, Aug 9, 2019 at 6:40 PM Robert Haas wrote: >> >>> On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe >>> wrote: >>> > + if (!XLogRecPtrIsInvalid(previous_lsn)) >>> > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: >>> %X/%X\n", >>> > +(uint32) (previous_lsn >> 32), (uint32) >>> previous_lsn); >>> > >>> > May be we should rename to something like: >>> > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP >>> START LOCATION" >>> > to make it more intuitive? >>> >>> So, I think that you are right that PREVIOUS WAL LOCATION might not be >>> entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL >>> LOCATION is definitely not clear. This backup is an incremental >>> backup, and it has a start WAL location, so you'd end up with START >>> WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound >>> like they ought to both be the same thing, but they're not. Perhaps >>> something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR >>> INCREMENTAL BACKUP would be clearer. >>> >> >> Agree, how about INCREMENTAL BACKUP REFERENCE WAL LOCATION ? >> > > +1 for INCREMENTAL BACKUP REFERENCE WA. > Sorry for the typo: +1 for the INCREMENTAL BACKUP REFERENCE WAL LOCATION. > >> > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 9, 2019 at 11:56 PM Jeevan Ladhe wrote: > Hi Robert, > > On Fri, Aug 9, 2019 at 6:40 PM Robert Haas wrote: > >> On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe >> wrote: >> > + if (!XLogRecPtrIsInvalid(previous_lsn)) >> > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: >> %X/%X\n", >> > +(uint32) (previous_lsn >> 32), (uint32) >> previous_lsn); >> > >> > May be we should rename to something like: >> > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP >> START LOCATION" >> > to make it more intuitive? >> >> So, I think that you are right that PREVIOUS WAL LOCATION might not be >> entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL >> LOCATION is definitely not clear. This backup is an incremental >> backup, and it has a start WAL location, so you'd end up with START >> WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound >> like they ought to both be the same thing, but they're not. Perhaps >> something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR >> INCREMENTAL BACKUP would be clearer. >> > > Agree, how about INCREMENTAL BACKUP REFERENCE WAL LOCATION ? > +1 for INCREMENTAL BACKUP REFERENCE WA. > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 9, 2019 at 6:36 PM Robert Haas wrote: > On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke > wrote: > > So, do you mean we should just do fread() and fwrite() for the whole > file? > > > > I thought it is better if it was done by the OS itself instead of > reading 1GB > > into the memory and writing the same to the file. > > Well, 'cp' is just a C program. If they can write code to copy a > file, so can we, and then we're not dependent on 'cp' being installed, > working properly, being in the user's path or at the hard-coded > pathname we expect, etc. There's an existing copy_file() function in > src/backend/storage/file/copydir.c which I'd probably look into > adapting for frontend use. I'm not sure whether it would be important > to adapt the data-flushing code that's present in that routine or > whether we could get by with just the loop to read() and write() data. > Agree that we can certainly use open(), read(), write(), and close() here, but given that pg_basebackup.c and basebackup.c are using file operations, I think using fopen(), fread(), fwrite(), and fclose() will be better here, at least for consistency. Let me know if we still want to go with native OS calls. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
Hi Robert, On Fri, Aug 9, 2019 at 6:40 PM Robert Haas wrote: > On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe > wrote: > > + if (!XLogRecPtrIsInvalid(previous_lsn)) > > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n", > > +(uint32) (previous_lsn >> 32), (uint32) > previous_lsn); > > > > May be we should rename to something like: > > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP > START LOCATION" > > to make it more intuitive? > > So, I think that you are right that PREVIOUS WAL LOCATION might not be > entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL > LOCATION is definitely not clear. This backup is an incremental > backup, and it has a start WAL location, so you'd end up with START > WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound > like they ought to both be the same thing, but they're not. Perhaps > something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR > INCREMENTAL BACKUP would be clearer. > Agree, how about INCREMENTAL BACKUP REFERENCE WAL LOCATION ? > > File header structure is defined in both the files basebackup.c and > > pg_combinebackup.c. I think it is better to move this to > replication/basebackup.h. > > Or some other header, but yeah, definitely don't duplicate the struct > definition (or any other kind of definition). > Thanks. > > IMHO, while labels are not advisable in general, it may be better to use > a label > > here rather than a while(1) loop, so that we can move to the label in > case we > > want to retry once. I think here it opens doors for future bugs if > someone > > happens to add code here, ending up adding some condition and then the > > break becomes conditional. That will leave us in an infinite loop. > > I'm not sure which style is better here, but I don't really buy this > argument. No issues. I am ok either way. Regards, Jeevan Ladhe
Re: block-level incremental backup
On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe wrote: > + if (!XLogRecPtrIsInvalid(previous_lsn)) > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n", > +(uint32) (previous_lsn >> 32), (uint32) > previous_lsn); > > May be we should rename to something like: > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP START > LOCATION" > to make it more intuitive? So, I think that you are right that PREVIOUS WAL LOCATION might not be entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL LOCATION is definitely not clear. This backup is an incremental backup, and it has a start WAL location, so you'd end up with START WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound like they ought to both be the same thing, but they're not. Perhaps something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR INCREMENTAL BACKUP would be clearer. > File header structure is defined in both the files basebackup.c and > pg_combinebackup.c. I think it is better to move this to > replication/basebackup.h. Or some other header, but yeah, definitely don't duplicate the struct definition (or any other kind of definition). > IMHO, while labels are not advisable in general, it may be better to use a > label > here rather than a while(1) loop, so that we can move to the label in case we > want to retry once. I think here it opens doors for future bugs if someone > happens to add code here, ending up adding some condition and then the > break becomes conditional. That will leave us in an infinite loop. I'm not sure which style is better here, but I don't really buy this argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke wrote: > So, do you mean we should just do fread() and fwrite() for the whole file? > > I thought it is better if it was done by the OS itself instead of reading 1GB > into the memory and writing the same to the file. Well, 'cp' is just a C program. If they can write code to copy a file, so can we, and then we're not dependent on 'cp' being installed, working properly, being in the user's path or at the hard-coded pathname we expect, etc. There's an existing copy_file() function in src/backend/storage/file/copydir.c which I'd probably look into adapting for frontend use. I'm not sure whether it would be important to adapt the data-flushing code that's present in that routine or whether we could get by with just the loop to read() and write() data. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
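[Editor's note: for readers following the fread()/fwrite() discussion, a minimal sketch of the kind of frontend chunked-copy loop being described might look like this. This is illustrative only — not the actual copy_file() adaptation; the 64 kB buffer size and error handling are assumptions.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define COPY_BUF_SIZE (64 * 1024)   /* illustrative chunk size, not 1GB */

/* Copy src to dst in fixed-size chunks; returns 0 on success, -1 on error.
 * Only one buffer's worth of data is ever held in memory. */
static int
copy_file_chunked(const char *src, const char *dst)
{
    FILE   *in;
    FILE   *out;
    char    buf[COPY_BUF_SIZE];
    size_t  nread;
    int     rc = 0;

    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(dst, "wb")) == NULL)
    {
        fclose(in);
        return -1;
    }

    /* Read and write one chunk at a time, never the whole file. */
    while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, nread, out) != nread)
        {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;

    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

Unlike shelling out to 'cp' via system(), this has no dependency on the user's PATH, and unlike a single 1GB read it keeps memory use bounded.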
Re: block-level incremental backup
Hi Jeevan, I have reviewed the backup part at the code level and am still looking into the restore (combine) and functional part of it. But, here are my comments so far: The patches need rebase. + if (!XLogRecPtrIsInvalid(previous_lsn)) + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n", +(uint32) (previous_lsn >> 32), (uint32) previous_lsn); Maybe we should rename to something like: "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP START LOCATION" to make it more intuitive? +typedef struct +{ + uint32 magic; + pg_crc32c checksum; + uint32 nblocks; + uint32 blocknumbers[FLEXIBLE_ARRAY_MEMBER]; +} partial_file_header; File header structure is defined in both the files basebackup.c and pg_combinebackup.c. I think it is better to move this to replication/basebackup.h. + bool isrelfile = false; I think we can avoid having the flag isrelfile in sendFile(). Something like this: if (startincrptr && OidIsValid(dboid) && looks_like_rel_name(filename)) { //include the code here that is under "if (isrelfile)" block. } else { _tarWriteHeader(tarfilename, NULL, statbuf, false); while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0) { ... } } Also, having isrelfile as part of the following condition: {code} + while (!isrelfile && + (cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0) {code} is confusing, because even the relation files in a full backup are going to be backed up by this loop only, but still, the condition reads '(!isrelfile &&...)'. verify_page_checksum() { while(1) { break; } } IMHO, while labels are not advisable in general, it may be better to use a label here rather than a while(1) loop, so that we can move to the label in case we want to retry once. I think here it opens doors for future bugs if someone happens to add code here, ending up adding some condition and then the break becomes conditional. That will leave us in an infinite loop. 
+/* magic number in incremental backup's .partial file */ +#define INCREMENTAL_BACKUP_MAGIC 0x494E4352 Similar to structure partial_file_header, I think the above macro can also be moved to basebackup.h instead of defining it twice. In sendFile(): + buf = (char *) malloc(RELSEG_SIZE * BLCKSZ); I think this is a huge memory request (1GB) and may fail on a busy/loaded server at times. We should check for failures of malloc, maybe throw some error on getting ENOMEM as errno. + /* Perform incremental backup stuff here. */ + if ((cnt = fread(buf, 1, Min(RELSEG_SIZE * BLCKSZ, statbuf->st_size), fp)) > 0) + { Here, shouldn't we expect statbuf->st_size < (RELSEG_SIZE * BLCKSZ), and it should be safe to read just statbuf->st_size always I guess? But, I am ok with having this extra guard here. In sendFile(), I am sorry if I am missing something, but I am not able to understand why 'cnt' and 'i' should have different values when they are being passed to verify_page_checksum(). I think passing only one of them should be sufficient. + XLogRecPtr pglsn; + + for (i = 0; i < cnt / BLCKSZ; i++) + { Maybe we should just have a variable no_of_blocks to store the number of blocks, rather than calculating this say RELSEG_SIZE (i.e. 131072) times in the worst case. + len += cnt; + throttle(cnt); + } Sorry if I am missing something, but, shouldn't it be just: len = cnt; As I said earlier in my previous email, we now do not need +decode_lsn_internal() as it is already taken care of by the introduction of function pg_lsn_in_internal(). Regards, Jeevan Ladhe
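[Editor's note: to make the .partial file layout under review concrete, here is a toy sketch of serializing and validating the header. The struct mirrors partial_file_header from the quoted patch, but the pg_crc32c checksum field is omitted, the flexible array member is replaced by a fixed-size array, and the buffer-based I/O is invented for illustration — treat it as a mock, not the patch's code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INCREMENTAL_BACKUP_MAGIC 0x494E4352     /* "INCR", per the patch */

/* Simplified stand-in for the patch's partial_file_header:
 * checksum elided, fixed-size block list instead of a flexible array. */
typedef struct
{
    uint32_t    magic;
    uint32_t    nblocks;
    uint32_t    blocknumbers[8];
} partial_file_header_mock;

/* Serialize a header into buf; returns the number of bytes written. */
static size_t
write_partial_header(char *buf, const uint32_t *blocks, uint32_t nblocks)
{
    partial_file_header_mock hdr;

    memset(&hdr, 0, sizeof(hdr));
    hdr.magic = INCREMENTAL_BACKUP_MAGIC;
    hdr.nblocks = nblocks;
    memcpy(hdr.blocknumbers, blocks, nblocks * sizeof(uint32_t));
    memcpy(buf, &hdr, sizeof(hdr));
    return sizeof(hdr);
}

/* Validate the magic and pull out the block list; returns 0 on success. */
static int
read_partial_header(const char *buf, uint32_t *blocks, uint32_t *nblocks)
{
    partial_file_header_mock hdr;

    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.magic != INCREMENTAL_BACKUP_MAGIC)
        return -1;              /* not a .partial file, or corrupted */
    *nblocks = hdr.nblocks;
    memcpy(blocks, hdr.blocknumbers, hdr.nblocks * sizeof(uint32_t));
    return 0;
}
```

Defining the struct and magic number in a single shared header (as the review suggests) would keep the writer in basebackup.c and the reader in pg_combinebackup.c from drifting apart.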
Re: block-level incremental backup
On Wed, Aug 7, 2019 at 2:47 PM Jeevan Chalke wrote: > > > On Mon, Aug 5, 2019 at 7:13 PM Robert Haas wrote: > >> On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: >> > + rc = system(copycmd); >> >> I don't think this patch should be calling system() in the first place. >> > > So, do you mean we should just do fread() and fwrite() for the whole file? > > I thought it is better if it was done by the OS itself instead of reading > 1GB > > into the memory and writing the same to the file. > > It is not necessary to read the whole 1GB into RAM. > >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> > > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Ibrar Ahmed
Re: block-level incremental backup
On Mon, Aug 5, 2019 at 7:13 PM Robert Haas wrote: > On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: > > + rc = system(copycmd); > > I don't think this patch should be calling system() in the first place. > So, do you mean we should just do fread() and fwrite() for the whole file? I thought it is better if it was done by the OS itself instead of reading 1GB into the memory and writing the same to the file. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Aug 6, 2019 at 11:31 PM Ibrar Ahmed wrote: > > I have not looked at the patch in detail, but just some nits from my side. > > On Fri, Aug 2, 2019 at 6:13 PM vignesh C wrote: > >> On Thu, Aug 1, 2019 at 5:06 PM Jeevan Chalke >> wrote: >> > >> > On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke < >> jeevan.cha...@enterprisedb.com> wrote: >> >> >> >> I am almost done writing the patch for pg_combinebackup and will post >> soon. >> > >> > >> > Attached patch which implements the pg_combinebackup utility used to >> combine >> > full basebackup with one or more incremental backups. >> > >> > I have tested it manually and it works for all best cases. >> > >> > Let me know if you have any inputs/suggestions/review comments? >> > >> Some comments: >> 1) There will be some link files created for tablespace, we might >> require some special handling for it >> >> 2) >> + while (numretries <= maxretries) >> + { >> + rc = system(copycmd); >> + if (rc == 0) >> + return; >> >> Use API to copy the file instead of "system", better to use the secure > copy. > Ah, it is a local copy, simple copy API is enough. > > >> + pg_log_info("could not copy, retrying after %d seconds", >> + sleeptime); >> + pg_usleep(numretries++ * sleeptime * 100L); >> + } >> Retry functionality is handled only for copying of full files, should >> we handle retry for copying of partial files >> >> The log and the sleep time does not match, you are multiplying sleeptime > with numretries++ and logging only "sleeptime" > > Why are we retrying here? Capture the proper copy error and act accordingly. > Blindly retrying does not make sense. 
> > 3) >> + maxretries = atoi(optarg); >> + if (maxretries < 0) >> + { >> + pg_log_error("invalid value for maxretries"); >> + fprintf(stderr, _("%s: -r maxretries must be >= 0\n"), progname); >> + exit(1); >> + } >> + break; >> + case 's': >> + sleeptime = atoi(optarg); >> + if (sleeptime <= 0 || sleeptime > 60) >> + { >> + pg_log_error("invalid value for sleeptime"); >> + fprintf(stderr, _("%s: -s sleeptime must be between 1 and 60\n"), >> progname); >> + exit(1); >> + } >> + break; >> we can have some range for maxretries similar to sleeptime >> >> 4) >> + fp = fopen(filename, "r"); >> + if (fp == NULL) >> + { >> + pg_log_error("could not read file \"%s\": %m", filename); >> + exit(1); >> + } >> + >> + labelfile = malloc(statbuf.st_size + 1); >> + if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size) >> + { >> + pg_log_error("corrupted file \"%s\": %m", filename); >> + free(labelfile); >> + exit(1); >> + } >> Should we check for malloc failure >> >> Use pg_malloc instead of malloc > > >> 5) Should we add display of progress as backup may take some time, >> this can be added as enhancement. We can get other's opinion on this. >> >> Yes, we should, but this is not the right time to do that. > > >> 6) >> + if (nIncrDir == MAX_INCR_BK_COUNT) >> + { >> + pg_log_error("too many incremental backups to combine"); >> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), >> progname); >> + exit(1); >> + } >> + >> + IncrDirs[nIncrDir] = optarg; >> + nIncrDir++; >> + break; >> >> If the backup count increases providing the input may be difficult, >> Shall user provide all the incremental backups from a parent folder >> and can we handle the ordering of incremental backup internally >> >> Why we have that limit at first place? 
> > >> 7) >> + if (isPartialFile) >> + { >> + if (verbose) >> + pg_log_info("combining partial file \"%s.partial\"", fn); >> + >> + combine_partial_files(fn, IncrDirs, nIncrDir, subdirpath, outfn); >> + } >> + else >> + copy_whole_file(infn, outfn); >> >> Add verbose for copying whole file >> >> 8) We can also check if approximate space is available in disk before >> starting combine backup, this can be added as enhancement. We can get >> other's opinion on this. >> >> 9) >> + printf(_(" -i, --incr-backup=DIRECTORY incremental backup directory >> (maximum %d)\n"), MAX_INCR_BK_COUNT); >> + printf(_(" -o, --output-dir=DIRECTORY combine backup into >> directory\n")); >> + printf(_("\nGeneral options:\n")); >> + printf(_(" -n, --no-clean do not clean up after >> errors\n")); >> >> Combine backup into directory can be combine backup directory >> >> 10) >> +/* Max number of incremental backups to be combined. */ >> +#define MAX_INCR_BK_COUNT 10 >> + >> +/* magic number in incremental backup's .partial file */ >> >> MAX_INCR_BK_COUNT can be increased little, some applications use 1 >> full backup at the beginning of the month and use 30 incremental >> backups rest of the days in the month >> >> Regards, >> Vignesh >> EnterpriseDB: http://www.enterprisedb.com >> >> >> > > -- > Ibrar Ahmed > -- Ibrar Ahmed
Re: block-level incremental backup
I have not looked at the patch in detail, but just some nits from my side. On Fri, Aug 2, 2019 at 6:13 PM vignesh C wrote: > On Thu, Aug 1, 2019 at 5:06 PM Jeevan Chalke > wrote: > > > > On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> > >> I am almost done writing the patch for pg_combinebackup and will post > soon. > > > > > > Attached patch which implements the pg_combinebackup utility used to > combine > > full basebackup with one or more incremental backups. > > > > I have tested it manually and it works for all best cases. > > > > Let me know if you have any inputs/suggestions/review comments? > > > Some comments: > 1) There will be some link files created for tablespace, we might > require some special handling for it > > 2) > + while (numretries <= maxretries) > + { > + rc = system(copycmd); > + if (rc == 0) > + return; > > Use API to copy the file instead of "system", better to use the secure copy. > + pg_log_info("could not copy, retrying after %d seconds", > + sleeptime); > + pg_usleep(numretries++ * sleeptime * 100L); > + } > Retry functionality is handled only for copying of full files, should > we handle retry for copying of partial files > > The log and the sleep time does not match, you are multiplying sleeptime with numretries++ and logging only "sleeptime" Why are we retrying here? Capture the proper copy error and act accordingly. Blindly retrying does not make sense. 
3) > + maxretries = atoi(optarg); > + if (maxretries < 0) > + { > + pg_log_error("invalid value for maxretries"); > + fprintf(stderr, _("%s: -r maxretries must be >= 0\n"), progname); > + exit(1); > + } > + break; > + case 's': > + sleeptime = atoi(optarg); > + if (sleeptime <= 0 || sleeptime > 60) > + { > + pg_log_error("invalid value for sleeptime"); > + fprintf(stderr, _("%s: -s sleeptime must be between 1 and 60\n"), > progname); > + exit(1); > + } > + break; > we can have some range for maxretries similar to sleeptime > > 4) > + fp = fopen(filename, "r"); > + if (fp == NULL) > + { > + pg_log_error("could not read file \"%s\": %m", filename); > + exit(1); > + } > + > + labelfile = malloc(statbuf.st_size + 1); > + if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size) > + { > + pg_log_error("corrupted file \"%s\": %m", filename); > + free(labelfile); > + exit(1); > + } > Should we check for malloc failure > > Use pg_malloc instead of malloc > 5) Should we add display of progress as backup may take some time, > this can be added as enhancement. We can get other's opinion on this. > > Yes, we should, but this is not the right time to do that. > 6) > + if (nIncrDir == MAX_INCR_BK_COUNT) > + { > + pg_log_error("too many incremental backups to combine"); > + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), > progname); > + exit(1); > + } > + > + IncrDirs[nIncrDir] = optarg; > + nIncrDir++; > + break; > > If the backup count increases providing the input may be difficult, > Shall user provide all the incremental backups from a parent folder > and can we handle the ordering of incremental backup internally > > Why we have that limit at first place? 
> 7) > + if (isPartialFile) > + { > + if (verbose) > + pg_log_info("combining partial file \"%s.partial\"", fn); > + > + combine_partial_files(fn, IncrDirs, nIncrDir, subdirpath, outfn); > + } > + else > + copy_whole_file(infn, outfn); > > Add verbose for copying whole file > > 8) We can also check if approximate space is available in disk before > starting combine backup, this can be added as enhancement. We can get > other's opinion on this. > > 9) > + printf(_(" -i, --incr-backup=DIRECTORY incremental backup directory > (maximum %d)\n"), MAX_INCR_BK_COUNT); > + printf(_(" -o, --output-dir=DIRECTORY combine backup into > directory\n")); > + printf(_("\nGeneral options:\n")); > + printf(_(" -n, --no-clean do not clean up after > errors\n")); > > Combine backup into directory can be combine backup directory > > 10) > +/* Max number of incremental backups to be combined. */ > +#define MAX_INCR_BK_COUNT 10 > + > +/* magic number in incremental backup's .partial file */ > > MAX_INCR_BK_COUNT can be increased little, some applications use 1 > full backup at the beginning of the month and use 30 incremental > backups rest of the days in the month > > Regards, > Vignesh > EnterpriseDB: http://www.enterprisedb.com > > > -- Ibrar Ahmed
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: > > + rc = system(copycmd); > > I don't think this patch should be calling system() in the first place. +1. Thanks, Stephen
Re: block-level incremental backup
On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: > + rc = system(copycmd); I don't think this patch should be calling system() in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Thu, Aug 1, 2019 at 5:06 PM Jeevan Chalke wrote: > > On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke > wrote: >> >> I am almost done writing the patch for pg_combinebackup and will post soon. > > > Attached patch which implements the pg_combinebackup utility used to combine > full basebackup with one or more incremental backups. > > I have tested it manually and it works for all best cases. > > Let me know if you have any inputs/suggestions/review comments? > Some comments: 1) There will be some link files created for tablespace, we might require some special handling for it 2) + while (numretries <= maxretries) + { + rc = system(copycmd); + if (rc == 0) + return; + + pg_log_info("could not copy, retrying after %d seconds", + sleeptime); + pg_usleep(numretries++ * sleeptime * 100L); + } Retry functionality is handled only for copying of full files, should we handle retry for copying of partial files 3) + maxretries = atoi(optarg); + if (maxretries < 0) + { + pg_log_error("invalid value for maxretries"); + fprintf(stderr, _("%s: -r maxretries must be >= 0\n"), progname); + exit(1); + } + break; + case 's': + sleeptime = atoi(optarg); + if (sleeptime <= 0 || sleeptime > 60) + { + pg_log_error("invalid value for sleeptime"); + fprintf(stderr, _("%s: -s sleeptime must be between 1 and 60\n"), progname); + exit(1); + } + break; we can have some range for maxretries similar to sleeptime 4) + fp = fopen(filename, "r"); + if (fp == NULL) + { + pg_log_error("could not read file \"%s\": %m", filename); + exit(1); + } + + labelfile = malloc(statbuf.st_size + 1); + if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size) + { + pg_log_error("corrupted file \"%s\": %m", filename); + free(labelfile); + exit(1); + } Should we check for malloc failure 5) Should we add display of progress as backup may take some time, this can be added as enhancement. We can get other's opinion on this. 
6) + if (nIncrDir == MAX_INCR_BK_COUNT) + { + pg_log_error("too many incremental backups to combine"); + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); + exit(1); + } + + IncrDirs[nIncrDir] = optarg; + nIncrDir++; + break; If the backup count increases providing the input may be difficult, Shall user provide all the incremental backups from a parent folder and can we handle the ordering of incremental backup internally 7) + if (isPartialFile) + { + if (verbose) + pg_log_info("combining partial file \"%s.partial\"", fn); + + combine_partial_files(fn, IncrDirs, nIncrDir, subdirpath, outfn); + } + else + copy_whole_file(infn, outfn); Add verbose for copying whole file 8) We can also check if approximate space is available in disk before starting combine backup, this can be added as enhancement. We can get other's opinion on this. 9) + printf(_(" -i, --incr-backup=DIRECTORY incremental backup directory (maximum %d)\n"), MAX_INCR_BK_COUNT); + printf(_(" -o, --output-dir=DIRECTORY combine backup into directory\n")); + printf(_("\nGeneral options:\n")); + printf(_(" -n, --no-clean do not clean up after errors\n")); Combine backup into directory can be combine backup directory 10) +/* Max number of incremental backups to be combined. */ +#define MAX_INCR_BK_COUNT 10 + +/* magic number in incremental backup's .partial file */ MAX_INCR_BK_COUNT can be increased little, some applications use 1 full backup at the beginning of the month and use 30 incremental backups rest of the days in the month Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
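[Editor's note: regarding the log/sleep mismatch flagged above — the quoted loop logs "sleeptime" seconds but sleeps numretries * sleeptime, and if I'm reading it right pg_usleep() takes microseconds, so the * 100L factor also looks off. A sketch of a retry driver where the reported wait and the actual wait are the same value follows. The function names are invented for illustration; the sleep itself is replaced by bookkeeping so the logic can be tested.]

```c
#include <assert.h>

/* Wait before retry attempt 'attempt' (1-based), in seconds.
 * Progressive backoff: attempt * base. This single value is both
 * logged and slept, so the message and the wait cannot disagree. */
static int
retry_wait_seconds(int attempt, int base_seconds)
{
    return attempt * base_seconds;
}

/* Hypothetical retry driver; do_copy() stands in for the real copy call.
 * Returns 0 on success, -1 after maxretries failed retries.
 * total_waited accumulates the backoff, in place of actually sleeping
 * (the real code would call pg_usleep(wait * 1000000L)). */
static int
copy_with_retries(int (*do_copy)(void), int maxretries, int base_seconds,
                  int *total_waited)
{
    int attempt;

    *total_waited = 0;
    for (attempt = 0; attempt <= maxretries; attempt++)
    {
        if (do_copy() == 0)
            return 0;
        if (attempt < maxretries)
        {
            int wait = retry_wait_seconds(attempt + 1, base_seconds);

            /* log "retrying after %d seconds" with 'wait' here */
            *total_waited += wait;
        }
    }
    return -1;
}

/* Test hook: a copy that fails 'fail_count' times, then succeeds. */
static int fail_count;
static int
flaky_copy(void)
{
    return (fail_count-- > 0) ? -1 : 0;
}
```

Capturing and reporting the actual copy error before deciding to retry, as Ibrar suggests, would slot in where the failed do_copy() result is examined.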
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > > I am almost done writing the patch for pg_combinebackup and will post soon. > Attached patch which implements the pg_combinebackup utility used to combine full basebackup with one or more incremental backups. I have tested it manually and it works for all best cases. Let me know if you have any inputs/suggestions/review comments? Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index 8d91f35..f3e90b6 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -200,6 +200,7 @@ Complete list of usable sgml source files in this directory. + diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 00782e0..92d9d13 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -415,7 +415,7 @@ PostgreSQL documentation which are modified after this given LSN will be backed up. The file which has these partial blocks has .partial as an extension. Backup taken in this manner has to be combined with the full backup with the -pg_combinebackup utility. + utility. diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml new file mode 100644 index 000..ed87931 --- /dev/null +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -0,0 +1,202 @@ + + + + + pg_combinebackup + + + + pg_combinebackup + 1 + Application + + + + pg_combinebackup + create a synthetic backup from a full backup and one or more incremental backups + + + + + pg_combinebackup + option + + + + + Description + + pg_combinebackup combines one or more incremental + backups with the full base-backup to generate a synthetic backup. 
+ + + + + Options + + +The following command-line options are available: + + + + -f directory + --full-backup=directory + + +Specifies the directory where the full backup is stored. + + + + + + -i directory + --incr-backup=directory + + +Specifies the directory where the incremental backup is stored. + + + + + + -o directory + --output-backup=directory + + +Specifies the output directory where the combined full synthetic backup +to be stored. + + + + + + -n + --no-clean + + +By default, when pg_combinebackup aborts with an +error, it removes the output data directories it might have created +before discovering that it cannot finish the job. This option inhibits +tidying-up and is thus useful for debugging. + + + + + + -r max-retries + + +Max number of retries on copy command, with progressive wait. +Default is 3. + + + + + + -s interval + + +Sleep interval before retry (in seconds). +Should be in between 1 and 60, default is 5. + + + + + + -v + --verbose + + +Enable verbose output. Lists all partial files processed and its +checksum status. + + + + + + -V + --version + + +Print the pg_combinebackup version and exit. + + + + + + -? + --help + + + Show help about pg_combinebackup command line + arguments, and exit. + + + + + + + + + Environment + + +PG_COLOR + + + Specifies whether to use color in diagnostics messages. Possible values + are always, auto, + never. + + + + + + + + Notes + + Output directory, full backup directory, and at-least one incremental backup + directory must be specified. + + + + PREVIOUS WAL LOCATION of the incremental backup must + match with the START WAL LOCATION of the previous full + or incremental backup in a given sequence. 
+ + + + + Examples + + + To combine a full backup with two incremental backups and store it in the + output directory: + +$ pg_combinebackup -f /data/full/data -i /data/incr/data1 -i /data/incr/data2 -o /data/full/fulldata + + + + + To combine a full backup with an incremental backup and store it in the + output directory along with various options like verbose, no-clean, and + maximum 3 retries: + +$ pg_combinebackup -v --no-clean -r 3 -f /data/full/data -i /data/incr/data1 -o /data/full/fulldata + + + + + + See Also + + + + + + + diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index cef09dd..3513ab4 100644 --- a/do
Re: block-level incremental backup
On Wed, Jul 31, 2019 at 1:59 PM vignesh C wrote: > I feel Robert's suggestion is good. > We can probably keep one meta file for each backup with some basic information > of all the files being backed up, this metadata file will be useful in the > below case: > Table dropped before incremental backup > Table truncated and Insert/Update/Delete operations before incremental backup There's really no need for this with the design I proposed. The files that should exist when you restore an incremental backup are exactly the set of files that exist in the final incremental backup, except that any .partial files need to be replaced with a correct reconstruction of the underlying file. You don't need to know what got dropped or truncated; you only need to know what's supposed to be there at the end. You may be thinking, as I once did, that restoring an incremental backup would consist of restoring the full backup first and then layering the incrementals over it, but if you read what I proposed, it actually works the other way around: you restore the files that are present in the incremental, and as needed, pull pieces of them from earlier incremental and/or full backups. I think this is a *much* better design than doing it the other way; it avoids any risk of getting the wrong answer due to truncations or drops, and it also is faster, because you only read older backups to the extent that you actually need their contents. I think it's a good idea to try to keep all the information about a single file being backed up in one place. It's just less confusing. If, for example, you have a metadata file that tells you which files are dropped - that is, which files you DON'T have - then what happens if one of those files is present in the data directory after all? Well, then you have inconsistent information and are confused, and maybe your code won't even notice the inconsistency. 
Similarly, if the metadata file is separate from the block data, then what happens if one file is missing, or isn't from the same backup as the other file? That shouldn't happen, of course, but if it does, you'll get confused. There's no perfect solution to these kinds of problems: if we suppose that the backup can be corrupted by having missing or extra files, why not also corruption within a single file? Still, on balance I tend to think that keeping related stuff together minimizes the surface area for bugs. I realize that's arguable, though. One consideration that goes the other way: if you have a manifest file that says what files are supposed to be present in the backup, then you can detect a disappearing file, which is impossible with the design I've proposed (and with the current full backup machinery). That might be worth fixing, but it's a separate feature that has little to do with incremental backup. > Probably it can also help us to decide which work the worker needs to do > if we are planning to backup in parallel. I don't think we need a manifest file for parallel backup. One process or thread can scan the directory tree, make a list of which files are present, and then hand individual files off to other processes or threads. In short, the directory listing serves as the manifest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
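[Editor's note: the restore direction Robert describes — start from the newest backup and reach back only for blocks it lacks — can be sketched in miniature as below. This is an in-memory mock with invented toy types: real backups are files on disk, the block list comes from the .partial header, and the chain is ordered newest-first.]

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ   4               /* toy block size */

/* Toy backup: a full backup has every block; an incremental has only
 * the blocks that changed since its reference LSN. */
typedef struct
{
    int     has_block[NBLOCKS];
    char    blocks[NBLOCKS][BLKSZ];
} toy_backup;

/* Reconstruct a file from a chain ordered newest-first: each block is
 * taken from the first (most recent) backup that has it, so older
 * backups are read only for blocks the newer ones lack. Returns 0 on
 * success, -1 if no backup in the chain supplies some block. */
static int
reconstruct(const toy_backup *chain, int nbackups, char out[NBLOCKS][BLKSZ])
{
    int blk, i;

    for (blk = 0; blk < NBLOCKS; blk++)
    {
        for (i = 0; i < nbackups; i++)
        {
            if (chain[i].has_block[blk])
            {
                memcpy(out[blk], chain[i].blocks[blk], BLKSZ);
                break;
            }
        }
        if (i == nbackups)
            return -1;          /* chain is broken: block found nowhere */
    }
    return 0;
}
```

Because the set of files (and here, blocks) to produce is defined by the newest backup, dropped or truncated relations simply never appear in the output — no drop/truncate metadata is needed, which is the point being argued above.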
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 1:58 AM Robert Haas wrote: > > On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova > wrote: > > In attachments, you can find a prototype of incremental pg_basebackup, > > which consists of 2 features: > > > > 1) To perform incremental backup one should call pg_basebackup with a > > new argument: > > > > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' > > > > where lsn is a start_lsn of parent backup (can be found in > > "backup_label" file) > > > > It calls BASE_BACKUP replication command with a new argument > > PREV_BACKUP_START_LSN 'lsn'. > > > > For datafiles, only pages with LSN > prev_backup_start_lsn will be > > included in the backup. >> One thought, if the file is not modified no need to check the lsn. >> > > They are saved into 'filename.partial' file, 'filename.blockmap' file > > contains an array of BlockNumbers. > > For example, if we backuped blocks 1,3,5, filename.partial will contain > > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}. > > I think it's better to keep both the information about changed blocks > and the contents of the changed blocks in a single file. The list of > changed blocks is probably quite short, and I don't really want to > double the number of files in the backup if there's no real need. I > suspect it's just overall a bit simpler to keep everything together. > I don't think this is a make-or-break thing, and welcome contrary > arguments, but that's my preference. > I feel Robert's suggestion is good. 
We can probably keep one meta file for each backup with some basic information of all the files being backed up; this metadata file will be useful in the below cases: Table dropped before incremental backup Table truncated and Insert/Update/Delete operations before incremental backup I feel if we have the metadata, we can add some optimization for the above scenarios, using the metadata information to identify file deletions and to avoid the write-and-delete work in pg_combinebackup which Jeevan has described in his previous mail. Probably it can also help us to decide which work each worker needs to do if we are planning to back up in parallel. Regards, vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 1:28 AM Robert Haas wrote: > On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova > wrote: > > In attachments, you can find a prototype of incremental pg_basebackup, > > which consists of 2 features: > > > > 1) To perform incremental backup one should call pg_basebackup with a > > new argument: > > > > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' > > > > where lsn is a start_lsn of parent backup (can be found in > > "backup_label" file) > > > > It calls BASE_BACKUP replication command with a new argument > > PREV_BACKUP_START_LSN 'lsn'. > > > > For datafiles, only pages with LSN > prev_backup_start_lsn will be > > included in the backup. > > They are saved into 'filename.partial' file, 'filename.blockmap' file > > contains an array of BlockNumbers. > > For example, if we backuped blocks 1,3,5, filename.partial will contain > > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}. > > I think it's better to keep both the information about changed blocks > and the contents of the changed blocks in a single file. The list of > changed blocks is probably quite short, and I don't really want to > double the number of files in the backup if there's no real need. I > suspect it's just overall a bit simpler to keep everything together. > I don't think this is a make-or-break thing, and welcome contrary > arguments, but that's my preference. > I have experience working on a similar product, and I agree with Robert: keeping the changed-block info and the changed blocks in a single file makes more sense. +1 > > > 2) To merge incremental backup into a full backup call > > > > pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' > > --merge-backups > > > > It will move all files from 'incremental_basedir' to 'basedir' handling > > '.partial' files correctly. > > This, to me, looks like it's much worse than the design that I > proposed originally. It means that: > > 1.
You can't take an incremental backup without having the full backup > available at the time you want to take the incremental backup. > > 2. You're always storing a full backup, which means that you need more > disk space, and potentially much more I/O while taking the backup. > You save on transfer bandwidth, but you add a lot of disk reads and > writes, costs which have to be paid even if the backup is never > restored. > > > 1) Whether we collect block maps using simple "read everything page by > > page" approach > > or WAL scanning or any other page tracking algorithm, we must choose a > > map format. > > I implemented the simplest one, while there are more ideas: > > I think we should start simple. > > I haven't had a chance to look at Jeevan's patch at all, or yours in > any detail, as yet, so these are just some very preliminary comments. > It will be good, however, if we can agree on who is going to do what > part of this as we try to drive this forward together. I'm sorry that > I didn't communicate EDB's plans to work on this more clearly; > duplicated effort serves nobody well. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- Ibrar Ahmed
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 1:58 AM Robert Haas wrote: > > I haven't had a chance to look at Jeevan's patch at all, or yours in > any detail, as yet, so these are just some very preliminary comments. > It will be good, however, if we can agree on who is going to do what > part of this as we try to drive this forward together. I'm sorry that > I didn't communicate EDB's plans to work on this more clearly; > duplicated effort serves nobody well. > I had a look over Anastasia's PoC patch to understand the approach she has taken, and here are my observations. 1. The patch first creates a .blockmap file for each relation file containing an array of all modified block numbers. This is done by reading all blocks from the file (in chunks of 4 blocks, 32kB in total, in a loop) and comparing each page's LSN with the given LSN. Later, to create the .partial file, the relation file is opened again and all blocks are read in chunks of 4 in a loop. If a block is found modified, it is copied into another buffer, and after scanning all 4 blocks, all copied blocks are sent to the .partial file. In this approach, each file is opened and read twice, which looks more expensive to me. Whereas in my patch, I do that just once. However, I read the entire file into memory to check which blocks are modified, while in Anastasia's design at most TAR_SEND_SIZE (32kB) is read at a time, in a loop. I need to do that because we want to know how heavily the file got modified, so that we can send the entire file if it was modified beyond the threshold (currently 90%). 2. Also, while sending modified blocks, they are copied into another buffer; instead, they could be sent directly from the already-read file contents (in BLCKSZ block size). Here, the .blockmap created earlier was not used. In my implementation, we send just a .partial file with a header containing all required details, like the number of blocks changed along with the block numbers and a CRC, followed by the blocks themselves. 3.
I tried compiling Anastasia's patch, but I am getting an error, so I could not see or test how it works. Also, like the normal backup option, the incremental backup option needs to verify the checksums if requested. 4. While combining full and incremental backups, files from the incremental backup are just copied into the full backup directory. In the design I posted earlier, we work the other way round, to avoid overwriting and the other issues I explained earlier. I am almost done writing the patch for pg_combinebackup and will post it soon. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
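The single-pass decision Jeevan describes above (scan once, collect the changed blocks, and fall back to sending the whole file once more than 90% of it has changed) can be sketched roughly as follows. Illustrative Python only: the function name and the in-memory list-of-pages representation are invented for this sketch, and real pages would be read in TAR_SEND_SIZE chunks rather than held as tuples.

```python
THRESHOLD = 0.90  # fraction of changed blocks above which the whole file is resent

def plan_backup(blocks, prev_lsn):
    """blocks: list of (page_lsn, page_data) for one relation file.
    Returns ("full", all_pages) when the file changed beyond the threshold,
    else ("partial", changed_blocknos, changed_pages)."""
    changed = [(i, data) for i, (lsn, data) in enumerate(blocks) if lsn > prev_lsn]
    if blocks and len(changed) / len(blocks) > THRESHOLD:
        # Heavily modified: cheaper to ship the file whole, as in a full backup.
        return ("full", [data for _, data in blocks])
    # Lightly modified: ship only the changed block numbers and their contents.
    return ("partial", [i for i, _ in changed], [d for _, d in changed])
```

The single pass both decides full-vs-partial and gathers the data to send, which is the efficiency point being argued over the two-pass .blockmap approach.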
Re: block-level incremental backup
Hi Jeevan, I reviewed the first two patches - 0001-Add-support-for-command-line-option-to-pass-LSN.patch and 0002-Add-TAP-test-to-test-LSN-option.patch from the set of incremental backup patches, and the changes look good to me. I had some concerns about the way we work around the fact that pg_lsn_in() accepts 0 as a valid lsn, which I think is itself contradictory to the definition of InvalidXLogRecPtr; I have started a separate new thread[1] for that. Also, I observe that commit 21f428eb has already moved the lsn decoding logic to a separate function pg_lsn_in_internal(), so the function decode_lsn_internal() from patch 0001 will go away and the dependent code needs to be modified. I shall review the rest of the patches and post comments. Regards, Jeevan Ladhe [1] https://www.postgresql.org/message-id/CAOgcT0NOM9oR0Hag_3VpyW0uF3iCU=bdufspfk9jrwxrcwq...@mail.gmail.com On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > Hi Anastasia, > > On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < > a.lubennik...@postgrespro.ru> wrote: > >> 23.04.2019 14:08, Anastasia Lubennikova wrote: >> > I'm volunteering to write a draft patch or, more likely, set of >> > patches, which >> > will allow us to discuss the subject in more detail. >> > And to do that I wish we agree on the API and data format (at least >> > broadly). >> > Looking forward to hearing your thoughts. >> >> Though the previous discussion stalled, >> I still hope that we could agree on basic points such as a map file >> format and protocol extension, >> which is necessary to start implementing the feature. >> > > It's great that you too come up with the PoC patch. I didn't look at your > changes in much details but we at EnterpriseDB too working on this feature > and started implementing it. > > Attached series of patches I had so far...
(which needed further > optimization and adjustments though) > > Here is the overall design (as proposed by Robert) we are trying to > implement: > > 1. Extend the BASE_BACKUP command that can be used with replication > connections. Add a new [ LSN 'lsn' ] option. > > 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send > the option added to the server in #1. > > Here are the implementation details when we have a valid LSN > > sendFile() in basebackup.c is the function which mostly does the thing for > us. If the filename looks like a relation file, then we'll need to consider > sending only a partial file. The way to do that is probably: > > A. Read the whole file into memory. > > B. Check the LSN of each block. Build a bitmap indicating which blocks > have an LSN greater than or equal to the threshold LSN. > > C. If more than 90% of the bits in the bitmap are set, send the whole file > just as if this were a full backup. This 90% is a constant now; we might > make it a GUC later. > > D. Otherwise, send a file with .partial added to the name. The .partial > file contains an indication of which blocks were changed at the beginning, > followed by the data blocks. It also includes a checksum/CRC. > Currently, a .partial file format looks like: > - start with a 4-byte magic number > - then store a 4-byte CRC covering the header > - then a 4-byte count of the number of blocks included in the file > - then the block numbers, each as a 4-byte quantity > - then the data blocks > > > We are also working on combining these incremental back-ups with the full > backup and for that, we are planning to add a new utility called > pg_combinebackup. Will post the details on that later once we have on the > same page for taking backup. > > Thanks > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > >
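The .partial layout quoted above (a 4-byte magic number, a 4-byte CRC covering the header, a 4-byte block count, the block numbers, then the data blocks) can be prototyped in a few lines. This is a sketch only: the actual magic value, byte order, and CRC algorithm are patch-defined; the placeholder magic, little-endian packing, and crc32 used here are assumptions.

```python
import struct
import zlib

MAGIC = 0x50474242   # placeholder magic; the real value is defined by the patch
BLCKSZ = 8192        # PostgreSQL default block size

def write_partial(blocknos, blocks):
    """Serialize a .partial file: magic, header CRC, block count,
    block numbers (4 bytes each), then the data blocks."""
    header = struct.pack("<I", len(blocknos)) + \
        b"".join(struct.pack("<I", b) for b in blocknos)
    crc = zlib.crc32(header)  # CRC covering the header, per the described layout
    return struct.pack("<II", MAGIC, crc) + header + b"".join(blocks)

def read_partial(buf):
    """Parse a .partial file back into (blocknos, blocks), verifying the CRC."""
    magic, crc = struct.unpack_from("<II", buf, 0)
    assert magic == MAGIC, "bad magic"
    (count,) = struct.unpack_from("<I", buf, 8)
    header_len = 4 + 4 * count
    assert zlib.crc32(buf[8:8 + header_len]) == crc, "header CRC mismatch"
    blocknos = list(struct.unpack_from("<%dI" % count, buf, 12))
    data = buf[8 + header_len:]
    return blocknos, [data[i * BLCKSZ:(i + 1) * BLCKSZ] for i in range(count)]
```

Keeping the block list in the header, immediately before the blocks, is what lets a later combine step learn everything it needs from the header alone.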
Re: block-level incremental backup
On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova wrote: > In attachments, you can find a prototype of incremental pg_basebackup, > which consists of 2 features: > > 1) To perform incremental backup one should call pg_basebackup with a > new argument: > > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' > > where lsn is a start_lsn of parent backup (can be found in > "backup_label" file) > > It calls BASE_BACKUP replication command with a new argument > PREV_BACKUP_START_LSN 'lsn'. > > For datafiles, only pages with LSN > prev_backup_start_lsn will be > included in the backup. > They are saved into 'filename.partial' file, 'filename.blockmap' file > contains an array of BlockNumbers. > For example, if we backuped blocks 1,3,5, filename.partial will contain > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}. I think it's better to keep both the information about changed blocks and the contents of the changed blocks in a single file. The list of changed blocks is probably quite short, and I don't really want to double the number of files in the backup if there's no real need. I suspect it's just overall a bit simpler to keep everything together. I don't think this is a make-or-break thing, and welcome contrary arguments, but that's my preference. > 2) To merge incremental backup into a full backup call > > pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' > --merge-backups > > It will move all files from 'incremental_basedir' to 'basedir' handling > '.partial' files correctly. This, to me, looks like it's much worse than the design that I proposed originally. It means that: 1. You can't take an incremental backup without having the full backup available at the time you want to take the incremental backup. 2. You're always storing a full backup, which means that you need more disk space, and potentially much more I/O while taking the backup. 
You save on transfer bandwidth, but you add a lot of disk reads and writes, costs which have to be paid even if the backup is never restored. > 1) Whether we collect block maps using simple "read everything page by > page" approach > or WAL scanning or any other page tracking algorithm, we must choose a > map format. > I implemented the simplest one, while there are more ideas: I think we should start simple. I haven't had a chance to look at Jeevan's patch at all, or yours in any detail, as yet, so these are just some very preliminary comments. It will be good, however, if we can agree on who is going to do what part of this as we try to drive this forward together. I'm sorry that I didn't communicate EDB's plans to work on this more clearly; duplicated effort serves nobody well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Jul 26, 2019 at 11:21 AM Jeevan Ladhe wrote: > Hi Vignesh, > > Please find my comments inline below: > > 1) If relation file has changed due to truncate or vacuum. >> During incremental backup the new files will be copied. >> There are chances that both the old file and new file >> will be present. I'm not sure if cleaning up of the >> old file is handled. >> > > When an incremental backup is taken it either copies the file in its > entirety if > a file is changed more than 90%, or writes .partial with changed blocks > bitmap > and actual data. For the files that are unchanged, it writes 0 bytes and > still > creates a .partial file for unchanged files too. This means there is a > .partitial > file for all the files that are to be looked up in full backup. > While composing a synthetic backup from incremental backup the > pg_combinebackup > tool will only look for those relation files in full(parent) backup which > are > having .partial files in the incremental backup. So, if vacuum/truncate > happened > between full and incremental backup, then the incremental backup image > will not > have a 0-length .partial file for that relation, and so the synthetic > backup > that is restored using pg_combinebackup will not have that file as well. > > Thanks Jeevan for the update, I feel this logic is good. It will handle the case of deleting the old relation files. > > >> 2) Just a small thought on building the bitmap, >> can the bitmap be built and maintained as >> and when the changes are happening in the system. >> If we are building the bitmap while doing the incremental backup, >> Scanning through each file might take more time. >> This can be a configurable parameter, the system can run >> without capturing this information by default, but if there are some >> of them who will be taking incremental backup frequently this >> configuration can be enabled which should track the modified blocks. > > > IIUC, this will need changes in the backend. 
Honestly, I think backup is a > maintenance task and hampering the backend for this does not look like a > good > idea. But, having said that even if we have to provide this as a switch > for some > of the users, it will need a different infrastructure than what we are > building > here for constructing bitmap, where we scan all the files one by one. > Maybe for > the initial version, we can go with the current proposal that Robert has > suggested, > and add this switch at a later point as an enhancement. > > That sounds fair to me. Regards, vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Hi Vignesh, Please find my comments inline below: 1) If relation file has changed due to truncate or vacuum. > During incremental backup the new files will be copied. > There are chances that both the old file and new file > will be present. I'm not sure if cleaning up of the > old file is handled. > When an incremental backup is taken, it either copies the file in its entirety if the file is changed more than 90%, or writes a .partial file with the changed-blocks bitmap and the actual data. For files that are unchanged, it still creates a .partial file, but writes 0 data bytes. This means there is a .partial file for every file that is to be looked up in the full backup. While composing a synthetic backup from an incremental backup, the pg_combinebackup tool will only look for those relation files in the full (parent) backup which have .partial files in the incremental backup. So, if a vacuum/truncate happened between the full and incremental backup, then the incremental backup image will not have a .partial file (not even a 0-length one) for that relation, and so the synthetic backup that is restored using pg_combinebackup will not have that file either. > 2) Just a small thought on building the bitmap, > can the bitmap be built and maintained as > and when the changes are happening in the system. > If we are building the bitmap while doing the incremental backup, > Scanning through each file might take more time. > This can be a configurable parameter, the system can run > without capturing this information by default, but if there are some > of them who will be taking incremental backup frequently this > configuration can be enabled which should track the modified blocks. IIUC, this will need changes in the backend. Honestly, I think backup is a maintenance task and hampering the backend for this does not look like a good idea.
But, having said that, even if we have to provide this as a switch for some users, it will need different infrastructure than what we are building here for constructing the bitmap, where we scan all the files one by one. Maybe for the initial version we can go with the current proposal that Robert has suggested, and add this switch at a later point as an enhancement. - My thoughts. Regards, Jeevan Ladhe
Re: block-level incremental backup
Thanks Jeevan. 1) If relation file has changed due to truncate or vacuum. During incremental backup the new files will be copied. There are chances that both the old file and new file will be present. I'm not sure if cleaning up of the old file is handled. 2) Just a small thought on building the bitmap, can the bitmap be built and maintained as and when the changes are happening in the system. If we are building the bitmap while doing the incremental backup, Scanning through each file might take more time. This can be a configurable parameter, the system can run without capturing this information by default, but if there are some of them who will be taking incremental backup frequently this configuration can be enabled which should track the modified blocks. What is your thought on this? -- Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com On Tue, Jul 23, 2019 at 11:19 PM Jeevan Ladhe wrote: > > Hi Vignesh, > > This backup technology is extending the pg_basebackup itself, which means we > can > still take online backups. This is internally done using pg_start_backup and > pg_stop_backup. pg_start_backup performs a checkpoint, and this checkpoint is > used in the recovery process while starting the cluster from a backup image. > What > incremental backup will just modify (as compared to traditional pg_basebackup) > is - After doing the checkpoint, instead of copying the entire relation files, > it takes an input LSN and scan all the blocks in all relation files, and store > the blocks having LSN >= InputLSN. This means it considers all the changes > that are already written into relation files including insert/update/delete > etc > up to the checkpoint performed by pg_start_backup internally, and as Jeevan > Chalke > mentioned upthread the incremental backup will also contain copy of WAL files. 
> Once this incremental backup is combined with the parent backup by means of > new > combine process (that will be introduced as part of this feature itself) > should > ideally look like a full pg_basebackup. Note that any changes done by these > insert/delete/update operations while the incremental backup was being taken > will be still available via WAL files and as normal restore process, will be > replayed from the checkpoint onwards up to a consistent point. > > My two cents! > > Regards, > Jeevan Ladhe > > On Sat, Jul 20, 2019 at 11:22 PM vignesh C wrote: >> >> Hi Jeevan, >> >> The idea is very nice. >> When Insert/update/delete and truncate/drop happens at various >> combinations, How the incremental backup handles the copying of the >> blocks? >> >> >> On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke >> wrote: >> > >> > >> > >> > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed wrote: >> >> >> >> >> >> >> >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke >> >> wrote: >> >>> >> >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed >> >>> wrote: >> >> >> At what stage you will apply the WAL generated in between the >> START/STOP backup. >> >>> >> >>> >> >>> In this design, we are not touching any WAL related code. The WAL files >> >>> will >> >>> get copied with each backup either full or incremental. And thus, the >> >>> last >> >>> incremental backup will have the final WAL files which will be copied >> >>> as-is >> >>> in the combined full-backup and they will get apply automatically if that >> >>> the data directory is used to start the server. >> >> >> >> >> >> Ok, so you keep all the WAL files since the first backup, right? >> > >> > >> > The WAL files will anyway be copied while taking a backup (full or >> > incremental), >> > but only last incremental backup's WAL files are copied to the combined >> > synthetic full backup. 
>> > >> >>> >> >> -- >> Ibrar Ahmed >> >>> >> >>> >> >>> -- >> >>> Jeevan Chalke >> >>> Technical Architect, Product Development >> >>> EnterpriseDB Corporation >> >>> >> >> >> >> >> >> -- >> >> Ibrar Ahmed >> > >> > >> > >> > -- >> > Jeevan Chalke >> > Technical Architect, Product Development >> > EnterpriseDB Corporation >> > >> >> >> -- >> Regards, >> vignesh >> >> >>
Re: block-level incremental backup
Hi Vignesh, This backup technology extends pg_basebackup itself, which means we can still take online backups. This is internally done using pg_start_backup and pg_stop_backup. pg_start_backup performs a checkpoint, and this checkpoint is used in the recovery process while starting the cluster from a backup image. What incremental backup modifies (as compared to traditional pg_basebackup) is this: after doing the checkpoint, instead of copying the entire relation files, it takes an input LSN, scans all the blocks in all relation files, and stores the blocks having LSN >= InputLSN. This means it considers all the changes that are already written into relation files, including insert/update/delete etc., up to the checkpoint performed by pg_start_backup internally, and as Jeevan Chalke mentioned upthread, the incremental backup will also contain a copy of the WAL files. Once this incremental backup is combined with the parent backup by means of the new combine process (which will be introduced as part of this feature itself), the result should ideally look like a full pg_basebackup. Note that any changes done by insert/delete/update operations while the incremental backup was being taken will still be available via the WAL files and, as in a normal restore process, will be replayed from the checkpoint onwards up to a consistent point. My two cents! Regards, Jeevan Ladhe On Sat, Jul 20, 2019 at 11:22 PM vignesh C wrote: > Hi Jeevan, > > The idea is very nice. > When Insert/update/delete and truncate/drop happens at various > combinations, How the incremental backup handles the copying of the > blocks?
> > > On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke > wrote: > > > > > > > > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed > wrote: > >> > >> > >> > >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >>> > >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed > wrote: > > > At what stage you will apply the WAL generated in between the > START/STOP backup. > >>> > >>> > >>> In this design, we are not touching any WAL related code. The WAL > files will > >>> get copied with each backup either full or incremental. And thus, the > last > >>> incremental backup will have the final WAL files which will be copied > as-is > >>> in the combined full-backup and they will get apply automatically if > that > >>> the data directory is used to start the server. > >> > >> > >> Ok, so you keep all the WAL files since the first backup, right? > > > > > > The WAL files will anyway be copied while taking a backup (full or > incremental), > > but only last incremental backup's WAL files are copied to the combined > > synthetic full backup. > > > >>> > > -- > Ibrar Ahmed > >>> > >>> > >>> -- > >>> Jeevan Chalke > >>> Technical Architect, Product Development > >>> EnterpriseDB Corporation > >>> > >> > >> > >> -- > >> Ibrar Ahmed > > > > > > > > -- > > Jeevan Chalke > > Technical Architect, Product Development > > EnterpriseDB Corporation > > > > > -- > Regards, > vignesh > EnterpriseDB: http://www.enterprisedb.com > > >
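The block-selection rule discussed above (keep pages whose LSN >= the input LSN) depends only on the page header: every PostgreSQL page begins with pd_lsn, stored as two 32-bit fields (xlogid, xrecoff). A minimal sketch, assuming little-endian storage and the default 8kB page size; the function names are invented here:

```python
import struct

BLCKSZ = 8192  # default PostgreSQL page size

def page_lsn(page):
    """Extract pd_lsn from the start of a PostgreSQL page. PageHeaderData
    begins with pd_lsn as two 32-bit fields (xlogid, xrecoff); the server's
    native byte order is assumed little-endian in this sketch."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def changed_blocks(data, threshold_lsn):
    """Yield block numbers whose page LSN is >= the given threshold,
    scanning the raw bytes of a relation file."""
    for blkno in range(len(data) // BLCKSZ):
        if page_lsn(data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]) >= threshold_lsn:
            yield blkno
```

Because the LSN sits at a fixed offset, deciding whether a page changed costs only a header read per page, which is what makes the scan-everything approach viable at all.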
Re: block-level incremental backup
Hi Jeevan, The idea is very nice. When Insert/update/delete and truncate/drop happens at various combinations, How the incremental backup handles the copying of the blocks? On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke wrote: > > > > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed wrote: >> >> >> >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke >> wrote: >>> >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed wrote: At what stage you will apply the WAL generated in between the START/STOP backup. >>> >>> >>> In this design, we are not touching any WAL related code. The WAL files will >>> get copied with each backup either full or incremental. And thus, the last >>> incremental backup will have the final WAL files which will be copied as-is >>> in the combined full-backup and they will get apply automatically if that >>> the data directory is used to start the server. >> >> >> Ok, so you keep all the WAL files since the first backup, right? > > > The WAL files will anyway be copied while taking a backup (full or > incremental), > but only last incremental backup's WAL files are copied to the combined > synthetic full backup. > >>> -- Ibrar Ahmed >>> >>> >>> -- >>> Jeevan Chalke >>> Technical Architect, Product Development >>> EnterpriseDB Corporation >>> >> >> >> -- >> Ibrar Ahmed > > > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > -- Regards, vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed wrote: > > > On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed >> wrote: >> >>> >>> At what stage you will apply the WAL generated in between the START/STOP >>> backup. >>> >> >> In this design, we are not touching any WAL related code. The WAL files >> will >> get copied with each backup either full or incremental. And thus, the last >> incremental backup will have the final WAL files which will be copied >> as-is >> in the combined full-backup and they will get apply automatically if that >> the data directory is used to start the server. >> > > Ok, so you keep all the WAL files since the first backup, right? > The WAL files will anyway be copied while taking a backup (full or incremental), but only last incremental backup's WAL files are copied to the combined synthetic full backup. >> >>> -- >>> Ibrar Ahmed >>> >> >> -- >> Jeevan Chalke >> Technical Architect, Product Development >> EnterpriseDB Corporation >> >> > > -- > Ibrar Ahmed > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed wrote: > >> >> At what stage you will apply the WAL generated in between the START/STOP >> backup. >> > > In this design, we are not touching any WAL related code. The WAL files > will > get copied with each backup either full or incremental. And thus, the last > incremental backup will have the final WAL files which will be copied as-is > in the combined full-backup and they will get apply automatically if that > the data directory is used to start the server. > Ok, so you keep all the WAL files since the first backup, right? > > >> -- >> Ibrar Ahmed >> > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > > -- Ibrar Ahmed
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed wrote: > > At what stage you will apply the WAL generated in between the START/STOP > backup. > In this design, we are not touching any WAL-related code. The WAL files will get copied with each backup, either full or incremental. Thus, the last incremental backup will have the final WAL files, which will be copied as-is into the combined full backup, and they will get applied automatically when that data directory is used to start the server. > -- > Ibrar Ahmed > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 10:22 AM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> Hi Anastasia, >> >> On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < >> a.lubennik...@postgrespro.ru> wrote: >> >>> 23.04.2019 14:08, Anastasia Lubennikova wrote: >>> > I'm volunteering to write a draft patch or, more likely, set of >>> > patches, which >>> > will allow us to discuss the subject in more detail. >>> > And to do that I wish we agree on the API and data format (at least >>> > broadly). >>> > Looking forward to hearing your thoughts. >>> >>> Though the previous discussion stalled, >>> I still hope that we could agree on basic points such as a map file >>> format and protocol extension, >>> which is necessary to start implementing the feature. >>> >> >> It's great that you too come up with the PoC patch. I didn't look at your >> changes in much details but we at EnterpriseDB too working on this feature >> and started implementing it. >> >> Attached series of patches I had so far... (which needed further >> optimization and adjustments though) >> >> Here is the overall design (as proposed by Robert) we are trying to >> implement: >> >> 1. Extend the BASE_BACKUP command that can be used with replication >> connections. Add a new [ LSN 'lsn' ] option. >> >> 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to >> send the option added to the server in #1. >> >> Here are the implementation details when we have a valid LSN >> >> sendFile() in basebackup.c is the function which mostly does the thing >> for us. If the filename looks like a relation file, then we'll need to >> consider sending only a partial file. The way to do that is probably: >> >> A. Read the whole file into memory. >> >> B. Check the LSN of each block. Build a bitmap indicating which blocks >> have an LSN greater than or equal to the threshold LSN. >> >> C. 
If more than 90% of the bits in the bitmap are set, send the whole >> file just as if this were a full backup. This 90% is a constant now; we >> might make it a GUC later. >> >> D. Otherwise, send a file with .partial added to the name. The .partial >> file contains an indication of which blocks were changed at the beginning, >> followed by the data blocks. It also includes a checksum/CRC. >> Currently, a .partial file format looks like: >> - start with a 4-byte magic number >> - then store a 4-byte CRC covering the header >> - then a 4-byte count of the number of blocks included in the file >> - then the block numbers, each as a 4-byte quantity >> - then the data blocks >> >> >> We are also working on combining these incremental back-ups with the full >> backup and for that, we are planning to add a new utility called >> pg_combinebackup. Will post the details on that later once we have on the >> same page for taking backup. >> > > For combining a full backup with one or more incremental backup, we are > adding > a new utility called pg_combinebackup in src/bin. > > Here is the overall design as proposed by Robert. > > pg_combinebackup starts from the LAST backup specified and work backward. > It > must NOT start with the full backup and work forward. This is important > both > for reasons of efficiency and of correctness. For example, if you start by > copying over the full backup and then later apply the incremental backups > on > top of it then you'll copy data and later end up overwriting it or removing > it. Any files that are leftover at the end that aren't in the final > incremental backup even as .partial files need to be removed, or the > result is > wrong. We should aim for a system where every block in the output > directory is > written exactly once and nothing ever has to be created and then removed. > > To make that work, we should start by examining the final incremental > backup. > We should proceed with one file at a time. For each file: > > 1. 
If the complete file is present in the incremental backup, then just > copy it > to the output directory - and move on to the next file. > > 2. Otherwise, we have a .partial file. Work backward through the backup > chain > until we find a complete version of the file. That might happen when we get > back to the full backup at the start of the chain, but it might also > happen > sooner - at which point we do not need to and should not look at earlier > backups for that file. During this phase, we should read only the HEADER of > each .partial file, building a map of which blocks we're ultimately going > to > need to read from each backup. We can also compute the offset within each > file > where that block is stored at this stage, again using the header > information. > > 3. Now, we can write the output file - reading each block in turn from the > correct backup and writing it to the output file, using the map we > constructed in the previous step. We should probably keep all of the input > files open over steps 2 and 3 and then close them at the end, because > repeatedly closing and opening them is going to be expensive.
Re: block-level incremental backup
On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > Hi Anastasia, > > On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < > a.lubennik...@postgrespro.ru> wrote: > >> 23.04.2019 14:08, Anastasia Lubennikova wrote: >> > I'm volunteering to write a draft patch or, more likely, set of >> > patches, which >> > will allow us to discuss the subject in more detail. >> > And to do that I wish we agree on the API and data format (at least >> > broadly). >> > Looking forward to hearing your thoughts. >> >> Though the previous discussion stalled, >> I still hope that we could agree on basic points such as a map file >> format and protocol extension, >> which is necessary to start implementing the feature. >> > > It's great that you too come up with the PoC patch. I didn't look at your > changes in much details but we at EnterpriseDB too working on this feature > and started implementing it. > > Attached series of patches I had so far... (which needed further > optimization and adjustments though) > > Here is the overall design (as proposed by Robert) we are trying to > implement: > > 1. Extend the BASE_BACKUP command that can be used with replication > connections. Add a new [ LSN 'lsn' ] option. > > 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send > the option added to the server in #1. > > Here are the implementation details when we have a valid LSN > > sendFile() in basebackup.c is the function which mostly does the thing for > us. If the filename looks like a relation file, then we'll need to consider > sending only a partial file. The way to do that is probably: > > A. Read the whole file into memory. > > B. Check the LSN of each block. Build a bitmap indicating which blocks > have an LSN greater than or equal to the threshold LSN. > > C. If more than 90% of the bits in the bitmap are set, send the whole file > just as if this were a full backup. This 90% is a constant now; we might > make it a GUC later. 
> > D. Otherwise, send a file with .partial added to the name. The .partial > file contains an indication of which blocks were changed at the beginning, > followed by the data blocks. It also includes a checksum/CRC. > Currently, a .partial file format looks like: > - start with a 4-byte magic number > - then store a 4-byte CRC covering the header > - then a 4-byte count of the number of blocks included in the file > - then the block numbers, each as a 4-byte quantity > - then the data blocks > > > We are also working on combining these incremental back-ups with the full > backup and for that, we are planning to add a new utility called > pg_combinebackup. Will post the details on that later once we have on the > same page for taking backup. > For combining a full backup with one or more incremental backups, we are adding a new utility called pg_combinebackup in src/bin. Here is the overall design as proposed by Robert. pg_combinebackup starts from the LAST backup specified and works backward. It must NOT start with the full backup and work forward. This is important both for reasons of efficiency and of correctness. For example, if you start by copying over the full backup and then later apply the incremental backups on top of it, then you'll copy data and later end up overwriting or removing it. Any files that are left over at the end that aren't in the final incremental backup even as .partial files need to be removed, or the result is wrong. We should aim for a system where every block in the output directory is written exactly once and nothing ever has to be created and then removed. To make that work, we should start by examining the final incremental backup. We should proceed with one file at a time. For each file: 1. If the complete file is present in the incremental backup, then just copy it to the output directory - and move on to the next file. 2. Otherwise, we have a .partial file.
Work backward through the backup chain until we find a complete version of the file. That might happen when we get back to the full backup at the start of the chain, but it might also happen sooner - at which point we do not need to and should not look at earlier backups for that file. During this phase, we should read only the HEADER of each .partial file, building a map of which blocks we're ultimately going to need to read from each backup. We can also compute the offset within each file where that block is stored at this stage, again using the header information. 3. Now, we can write the output file - reading each block in turn from the correct backup and writing it to the output file, using the map we constructed in the previous step. We should probably keep all of the input files open over steps 2 and 3 and then close them at the end, because repeatedly closing and opening them is going to be expensive. When that's done, go on to the next file and start over at step 1. We have already started working on this design. -- Jeevan Chalke
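The reconstruction order described above (start from the newest backup, stop at the first complete copy of each file, write every output block exactly once) can be sketched for a single file. The chain representation below is an illustrative assumption for the sketch, not pg_combinebackup's actual interface or file format:

```python
def combine_file(chain, block_size=8192):
    """Rebuild one file from a backup chain, newest backup first.

    Each chain entry is ("full", file_bytes) or ("partial", {blkno: block_bytes}).
    Every output block is resolved exactly once, from the newest backup
    that contains it, per the design described above.
    """
    kind, payload = chain[0]
    if kind == "full":
        # Step 1: the final incremental contains the complete file; just copy it.
        return payload

    blocks = {}   # blkno -> data; the first (newest) writer wins
    nblocks = 0
    for kind, payload in chain:
        if kind == "partial":
            # Step 2: take only blocks not already supplied by a newer backup.
            for blkno, data in payload.items():
                blocks.setdefault(blkno, data)
                nblocks = max(nblocks, blkno + 1)
        else:
            # First complete version found: fill the remaining blocks and stop.
            # Earlier backups must not be consulted for this file.
            for i in range(len(payload) // block_size):
                blocks.setdefault(i, payload[i * block_size:(i + 1) * block_size])
            nblocks = max(nblocks, len(payload) // block_size)
            break

    # Step 3: emit the output in block order using the resolved map.
    return b"".join(blocks[i] for i in range(nblocks))
```

Working backward this way is what makes the "written exactly once" property fall out naturally: a block from a newer incremental simply shadows the older copies before they are ever read.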
Re: block-level incremental backup
Hi Anastasia, On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < a.lubennik...@postgrespro.ru> wrote: > 23.04.2019 14:08, Anastasia Lubennikova wrote: > > I'm volunteering to write a draft patch or, more likely, set of > > patches, which > > will allow us to discuss the subject in more detail. > > And to do that I wish we agree on the API and data format (at least > > broadly). > > Looking forward to hearing your thoughts. > > Though the previous discussion stalled, > I still hope that we could agree on basic points such as a map file > format and protocol extension, > which is necessary to start implementing the feature. > It's great that you too came up with a PoC patch. I didn't look at your changes in much detail, but we at EnterpriseDB are also working on this feature and have started implementing it. Attached is the series of patches I have so far (which need further optimization and adjustment). Here is the overall design (as proposed by Robert) we are trying to implement: 1. Extend the BASE_BACKUP command that can be used with replication connections. Add a new [ LSN 'lsn' ] option. 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send the option added to the server in #1. Here are the implementation details when we have a valid LSN. sendFile() in basebackup.c is the function which does most of the work for us. If the filename looks like a relation file, then we'll need to consider sending only a partial file. The way to do that is probably: A. Read the whole file into memory. B. Check the LSN of each block. Build a bitmap indicating which blocks have an LSN greater than or equal to the threshold LSN. C. If more than 90% of the bits in the bitmap are set, send the whole file just as if this were a full backup. This 90% is a constant now; we might make it a GUC later. D. Otherwise, send a file with .partial added to the name.
The .partial file contains an indication of which blocks were changed at the beginning, followed by the data blocks. It also includes a checksum/CRC. Currently, a .partial file format looks like:
- start with a 4-byte magic number
- then store a 4-byte CRC covering the header
- then a 4-byte count of the number of blocks included in the file
- then the block numbers, each as a 4-byte quantity
- then the data blocks
We are also working on combining these incremental backups with the full backup and, for that, we are planning to add a new utility called pg_combinebackup. Will post the details on that later once we are on the same page about taking the backup. Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation IncrBackup.tar.gz Description: application/gzip
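Steps A through D and the .partial layout above can be sketched as follows. The magic value, the little-endian encoding, and the pd_lsn unpacking are illustrative assumptions for the sketch, not the patch's actual on-disk format (on a real page, pd_lsn is stored in the server's native byte order):

```python
import struct
import zlib

BLCKSZ = 8192
SEND_WHOLE_THRESHOLD = 0.90   # the 90% constant from step C
PARTIAL_MAGIC = 0x50424B50    # hypothetical magic number for the sketch


def page_lsn(page):
    # pd_lsn occupies the first 8 bytes of the page header, as two
    # 32-bit halves (xlogid, xrecoff); little-endian assumed here.
    hi, lo = struct.unpack_from("<II", page, 0)
    return (hi << 32) | lo


def changed_blocks(file_bytes, threshold_lsn):
    # Step B: bitmap of blocks whose LSN >= the threshold LSN.
    n = len(file_bytes) // BLCKSZ
    return [page_lsn(file_bytes[i * BLCKSZ:(i + 1) * BLCKSZ]) >= threshold_lsn
            for i in range(n)]


def make_partial(file_bytes, threshold_lsn):
    """Return None if the whole file should be sent (step C), else a
    .partial payload: magic, header CRC, block count, block numbers, blocks."""
    bitmap = changed_blocks(file_bytes, threshold_lsn)
    if bitmap and sum(bitmap) > SEND_WHOLE_THRESHOLD * len(bitmap):
        return None   # step C: send the file as in a full backup
    blknos = [i for i, changed in enumerate(bitmap) if changed]
    header = struct.pack("<I", len(blknos)) + struct.pack("<%dI" % len(blknos), *blknos)
    out = struct.pack("<II", PARTIAL_MAGIC, zlib.crc32(header)) + header
    for i in blknos:
        out += file_bytes[i * BLCKSZ:(i + 1) * BLCKSZ]
    return out
```

The fixed-size header (count plus block-number array) is what lets pg_combinebackup read only the header of each .partial file when planning, as step 2 of the combine design requires.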
Re: block-level incremental backup
23.04.2019 14:08, Anastasia Lubennikova wrote: I'm volunteering to write a draft patch or, more likely, a set of patches, which will allow us to discuss the subject in more detail. And to do that I wish we agree on the API and data format (at least broadly). Looking forward to hearing your thoughts. Though the previous discussion stalled, I still hope that we could agree on basic points such as a map file format and protocol extension, which are necessary to start implementing the feature. - Proof Of Concept patch - In the attachments, you can find a prototype of incremental pg_basebackup, which consists of 2 features: 1) To perform an incremental backup, one should call pg_basebackup with a new argument: pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' where lsn is the start_lsn of the parent backup (it can be found in the "backup_label" file). It calls the BASE_BACKUP replication command with a new argument, PREV_BACKUP_START_LSN 'lsn'. For datafiles, only pages with LSN > prev_backup_start_lsn will be included in the backup. They are saved into a 'filename.partial' file, and a 'filename.blockmap' file contains an array of BlockNumbers. For example, if we backed up blocks 1, 3, 5, then filename.partial will contain those 3 blocks, and 'filename.blockmap' will contain the array {1,3,5}. Non-datafiles use the same format as before. 2) To merge an incremental backup into a full backup, call pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' --merge-backups It will move all files from 'incremental_basedir' to 'basedir', handling '.partial' files correctly. - Questions to discuss - Please note that it is just a proof-of-concept patch and it can be optimized in many ways. Let's concentrate on issues that affect the protocol or data format. 1) Whether we collect block maps using the simple "read everything page by page" approach, WAL scanning, or any other page tracking algorithm, we must choose a map format.
I implemented the simplest one, while there are more ideas: - We can have a map not per file, but per relation or maybe per tablespace, which will make the implementation more complex, but probably more efficient. The only problem I see with the existing implementation is that even if only a few blocks changed, we still must pad the map to 512 bytes per tar format requirements. - We can save LSNs into the block map:

typedef struct BlockMapItem
{
    BlockNumber blkno;
    XLogRecPtr  lsn;
} BlockMapItem;

In my implementation, an invalid prev_backup_start_lsn means falling back to a regular basebackup without any block maps. Alternatively, we can define another meaning for this value and send a block map for all files. Backup utilities can use these maps to speed up backup merge or restore. 2) We can implement a BASE_BACKUP SEND_FILELIST replication command, which will return a list of filenames with file sizes, plus block maps if an lsn was provided. To avoid changing the format, we can simply send tar headers for each file: - tarHeader("filename.blockmap") followed by the blockmap, for relation files, if prev_backup_start_lsn is provided; - tarHeader("filename") without the actual file content, for non-relation files or for all files in a "FULL" backup. The caller can parse these messages and use them for any purpose, for example, to perform a parallel backup. Thoughts? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13e0d23..e757bba 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10459,7 +10459,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 			ti->oid = pstrdup(de->d_name);
 			ti->path = pstrdup(buflinkpath.data);
 			ti->rpath = relpath ? pstrdup(relpath) : NULL;
-			ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+			ti->size = infotbssize ? sendTablespace(fullpath, true, InvalidXLogRecPtr) : -1;
 
 			if (tablespaces)
 				*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2978a9..3560da1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
 
 typedef struct
@@ -52,13 +53,22 @@ typedef struct
 	bool		includewal;
 	uint32		maxrate;
 	bool		sendtblspcmapfile;
+	XLogRecPtr	prev_backup_start_lsn;
 } basebackup_options;
 
 static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
-					 List *tablespaces, bool sendtblspclinks);
+					 List *tablespaces, bool sendtblspclinks, XLogRecPtr prev_backup_start_lsn);
 static bool sendFile(const char *readfilename, const char *tarfilename,
-					 struct stat *statbuf, bool missing_ok, Oid dboid
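The two map shapes discussed above — the PoC's plain array of BlockNumbers in 'filename.blockmap' and the alternative BlockMapItem array carrying an LSN per block — could be encoded along these lines. The fixed little-endian encoding is an assumption for illustration, not the PoC's actual byte layout:

```python
import struct

BLOCKNO = struct.Struct("<I")         # plain BlockNumber, 4 bytes
BLOCKMAP_ITEM = struct.Struct("<IQ")  # BlockMapItem: blkno + 8-byte LSN


def encode_blockmap(blknos):
    # 'filename.blockmap' for backed-up blocks 1, 3, 5 is just the
    # packed array {1,3,5}, parallel to the blocks in filename.partial.
    return b"".join(BLOCKNO.pack(b) for b in blknos)


def decode_blockmap(data):
    return [BLOCKNO.unpack_from(data, off)[0]
            for off in range(0, len(data), BLOCKNO.size)]


def encode_lsn_map(items):
    # items: iterable of (blkno, lsn) pairs, per the BlockMapItem variant;
    # carrying the LSN lets backup utilities skip unchanged blocks at
    # merge or restore time.
    return b"".join(BLOCKMAP_ITEM.pack(blkno, lsn) for blkno, lsn in items)


def decode_lsn_map(data):
    return [BLOCKMAP_ITEM.unpack_from(data, off)
            for off in range(0, len(data), BLOCKMAP_ITEM.size)]
```

The per-block LSN variant costs 12 bytes per entry instead of 4, which matters less than the 512-byte tar padding floor mentioned above for small maps.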
Re: block-level incremental backup
On Wed, Apr 24, 2019 at 12:57 PM Stephen Frost wrote: > So, I had a thought about that when I was composing the last email and > while I'm still unsure about it, maybe it'd be useful to mention it > here- do we really need a list of every *file*, or could we reduce that > down to a list of relations + forks for the main data directory, and > then always include whatever other directories/files are appropriate? I'm not quite sure what the difference is here. I agree that we could try to compact the list of file names by saying 16384 (24 segments) instead of 16384, 16384.1, ..., 16384.23, but I doubt that saves anything meaningful. I don't see how we can leave anything out altogether. If there's a filename called boaty.mcboatface in the server directory, I think we've got to back it up, and that won't happen unless the client knows that it is there, and it won't know unless we include it in a list. > When it comes to operating in chunks, well, if we're getting a list of > relations instead of files, we do have this thing called cursors.. Sure... but they don't work for replication commands and I am definitely not volunteering to change that. > I would think the client would be able to just ask for the list of > modified files, when it comes to building up the list of files to ask > for, which could potentially be done based on mtime instead of by WAL > scanning or by scanning the files themselves. Don't get me wrong, I'd > prefer that we work based on the WAL, since I have more confidence in > that, but certainly quite a few of the tools do work off mtime these > days and while it's not perfect, the risk/reward there is pretty > palatable to a lot of people. That approach, as with a few others that have been suggested, requires that the client have access to the previous backup, which makes me uninterested in implementing it. I want a version of incremental backup where the client needs to know the LSN of the previous backup and nothing else. 
That way, if you store your actual backups on a tape drive in an airless vault at the bottom of the Pacific Ocean, you can still take incremental backups against them, as long as you remember to note the LSNs before you ship the backups to the vault. Woohoo! It also allows for the wire protocol to be very simple and the client to be very simple; neither of those things is essential, but both are nice. Also, I think using mtimes is just asking to get burned. Yeah, almost nobody will, but an LSN-based approach is more granular (block level) and more reliable (can't be fooled by resetting a clock backward, or by a filesystem being careless with file metadata), so I think it makes sense to focus on getting that to work. It's worth keeping in mind that there may be somewhat different expectations for an external tool vs. a core feature. Stupid as it may sound, I think people using an external tool are more likely to do things like read the directions, and those directions can say things like "use a reasonable filesystem and don't set your clock backward." When stuff goes into core, people assume that they should be able to run it on any filesystem on any hardware where they can get it to work and it should just work. And you also get a lot more users, so even if the percentage of people not reading the directions were to stay constant, the actual number of such people will go up a lot. So picking what we seem to both agree to be the most robust way of detecting changes seems like the way to go from here. > I suspect some of that's driven by how they get solved and if we decide > we have to solve all of them. With things like MAX_RATE + incremental > backups, I wonder how that's going to end up working, when you have the > option to apply the limit to the network, or to the disk I/O.
You might > have addressed that elsewhere, I've not looked, and I'm not too > particular about it personally either, but a definition could be "max > rate at which we'll read the file you asked for on this connection" and > that would be pretty straight-forward, I'd think. I mean, it's just so people can tell pg_basebackup what rate they want via a command-line option and have it happen like that. They don't care about the rates for individual files. > > Issue #1: If you manually add files to your backup, remove files from > > your backup, or change files in your backup, bad things will happen. > > There is fundamentally nothing we can do to prevent this completely, > > but it may be possible to make the system more resilient against > > ham-handed modifications, at least to the extent of detecting them. > > That's maybe a topic for another thread, but it's an interesting one: > > Andres and I were brainstorming about it at some point. > > I'd certainly be interested in hearing about ways we can improve on > that. I'm alright with it being on another thread as it's a broader > concern than just what we're talking about here. Might be a good topic to chat about at PGCon. > >
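The LSN to "note" before shipping a backup off to the vault is its start LSN, which, as mentioned elsewhere in the thread, can be read from the backup's backup_label file (its first line looks like `START WAL LOCATION: 0/2000028 (file 000000010000000000000002)`). A sketch of extracting it, with the parsing simplified for illustration:

```python
import re


def start_lsn_from_backup_label(label_text):
    """Parse the START WAL LOCATION line of a backup_label file and
    return the LSN as a 64-bit integer (high/low hex halves joined)."""
    m = re.search(r"^START WAL LOCATION:\s*([0-9A-Fa-f]+)/([0-9A-Fa-f]+)",
                  label_text, re.MULTILINE)
    if m is None:
        raise ValueError("backup_label has no START WAL LOCATION line")
    hi, lo = int(m.group(1), 16), int(m.group(2), 16)
    return (hi << 32) | lo
```

That integer is what a later incremental request would pass as its threshold (e.g. via the proposed --lsn or --prev-backup-start-lsn option), which is why recording it is the only state the client needs to keep.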
Re: block-level incremental backup
e thing once you > add incrementals to the picture you might easily remove a full backup > upon which a newer incremental depends. I see the need for good tools > to manage this kind of complexity, but have no plan as part of this > project to provide them. I think that just requires too many > assumptions about where those backups are being stored and how they > are being catalogued and managed; I don't believe I currently am > knowledgeable enough to design something that would be good enough to > meet core standards for inclusion, and I don't want to waste energy > trying. If someone else wants to try, that's OK with me, but I think > it's probably better to let this be a thing that people experiment > with outside of core for a while until we see what ends up being a > winner. I realize that this is a debatable position, but as I'm sure > you realize by now, I have a strong desire to limit the scope of this > project in such a way that I can get it done, 'cuz a bird in the hand > is worth two in the bush. Even if what we're talking about here is really only "differentials", or backups where the incremental contains all the changes from a prior full backup, if the only check is "full LSN is greater than or equal to the incremental backup LSN", then you have a potential problem that's larger than just the incrementals no longer being valid because you removed the full backup on which they were taken- you might think that an *earlier* full backup is the one for a given incremental and perform a restore with the wrong full/incremental matchup and end up with a corrupted database. These are exactly the kind of issues that make me really wonder if this is the right natural progression for pg_basebackup or any backup tool to go in. Maybe there's some additional things we can do to make it harder for someone to end up with a corrupted database when they restore, but it's really hard to get things like expiration correct. 
We see users already ending up with problems because they don't manage expiration of their WAL correctly, and now we're adding another level of serious complication to the expiration requirements that, as we've seen even on this thread, some users are just not going to ever feel comfortable with doing on their own. Perhaps it's not relevant and I get that you want to build this cool incremental backup capability into pg_basebackup and I'm not going to stop you from doing it, but if I was going to build a backup tool, adding support for block-level incremental backup wouldn't be where I'd start, and, in fact, I might not even get to it even after investing over 5 years in the project and even after building in proper backup management. The idea of implementing block-level incrementals while pushing the backup management, expiration, and dependency between incrementals and fulls on to the user to figure out just strikes me as entirely backwards and, frankly, to be gratuitously 'itch scratching' at the expense of what users really want and need here. One of the great things about pg_basebackup is its simplicity and ability to be a one-time "give me a snapshot of the database" and this is building in a complicated feature to it that *requires* users to build their own basic capabilities externally in order to be able to use it. I've tried to avoid getting into that here and I won't go on about it, since it's your time to do with as you feel appropriate, but I do worry that it makes us, as a project, look a bit more cavalier about what users are asking for vs. what cool new thing we want to play with than I, at least, would like us to be (so, I'll caveat that with "in this area anyway", since I suspect saying this will probably come back to bite me in some other discussion later ;). Thanks, Stephen signature.asc Description: PGP signature
Re: block-level incremental backup
On Wed, Apr 24, 2019 at 9:28 AM Stephen Frost wrote: > Looking at it from what I'm sitting, I brought up two ways that we > could extend the protocol to "request from the server only those blocks > where LSN >= threshold_value" with one being the modification to > BASE_BACKUP and the other being a new set of commands that could be > parallelized. If I had assumed that you'd be thinking the same way I am > about extending the backup protocol, I wouldn't have said anything now > and then would have complained after you wrote a patch that just > extended the BASE_BACKUP command, at which point I likely would have > been told that it's now been done and that I should have mentioned it > earlier. Fair enough. > At least in part then it seems like we're viewing the level of effort > around what I'm talking about quite differently, and I feel like that's > largely because every time I mention parallel anything there's this > assumption that I'm asking you to parallelize pg_basebackup or write a > whole bunch more code to provide a fully optimized server-side parallel > implementation for backups. That really wasn't what I was going for. I > was thinking it would be a modest amount of additional work add > incremental backup via a few new commands, instead of through the > BASE_BACKUP protocol command, that would make parallelization possible. I'm not sure about that. It doesn't seem crazy difficult, but there are a few wrinkles. One is that if the client is requesting files one at a time, it's got to have a list of all the files that it needs to request, and that means that it has to ask the server to make a preparatory pass over the whole PGDATA directory to get a list of all the files that exist. That overhead is not otherwise needed. 
Another is that the list of files might be really large, and that means that the client would either use a lot of memory to hold that great big list, or need to deal with spilling the list to a spool file someplace, or else have a server protocol that lets the list be fetched incrementally in chunks. A third is that, as you mention further on, it means that the client has to care a lot more about exactly how the server is figuring out which blocks have been modified. If it just says BASE_BACKUP ..., the server can be internally reading each block and checking the LSN, or using WAL-scanning or ptrack or whatever, and the client doesn't need to know or care. But if the client is asking for a list of modified files or blocks, then that presumes the information is available, and not too expensively, without actually reading the files. Fourth, MAX_RATE probably won't actually limit to the correct rate overall if the limit is applied separately to each file. I'd be afraid that a patch that tried to handle all that as part of this project would get rejected on the grounds that it was trying to solve too many unrelated problems. Also, though not everybody has to agree on what constitutes a "modest amount of additional work," I would not describe solving all of those problems as a modest effort, but rather a pretty substantial one. > There's a tangent on all of this that's pretty key though, which is the > question around just how the blocks are identified. If the WAL scanning > is done to figure out the blocks, then that's quite a bit different from > the other idea of "open this relation and scan it, but only give me the > blocks after this LSN". It's the latter case that I've been mostly > thinking about in this thread, which is part of why I was thinking it'd > be a modest amount of work to have protocol commands that accepted a > file (or perhaps a relation..)
to scan and return blocks from instead of > baking this into BASE_BACKUP which by definition just serially scans the > data directory and returns things as it finds them. For the case where > we have WAL scanning happening and modfiles which are being read and > used to figure out the blocks to send, it seems like it might be more > complicated and therefore potentially quite a bit more work to have a > parallel version of that. Yeah. I don't entirely agree that the first one is simple, as per the above, but I definitely agree that the second one is more complicated than the first one. > > Well, one thing you might want to do is have a tool that connects to > > the server, enters backup mode, requests information on what blocks > > have changed, copies those blocks via direct filesystem access, and > > then exits backup mode. Such a tool would really benefit from a > > START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP > > command language, because it would just skip ever issuing the > > SEND_FILE_CONTENTS command in favor of doing that part of the work via > > other means. On the other hand, a START_PARALLEL_BACKUP LSN '1/234' > > command is useless to such a tool. > > That's true, but I hardly ever hear people talking about how wonderful > it is that pgBackRest uses SSH to grab the data
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 22, 2019 at 2:26 PM Stephen Frost wrote: > > There was basically zero discussion about what things would look like at > > a protocol level (I went back and skimmed over the thread before sending > > my last email to specifically see if I was going to get this response > > back..). I get the idea behind the diff file, the contents of which I > > wasn't getting into above. > > Well, I wrote: > > "There should be a way to tell pg_basebackup to request from the > server only those blocks where LSN >= threshold_value." > > I guess I assumed that people would interested in the details take > that to mean "and therefore the protocol would grow an option for this > type of request in whatever way is the most straightforward possible > extension of the current functionality is," which is indeed how you > eventually interpreted it when you said we could "extend BASE_BACKUP > is by adding LSN as an optional parameter." Looking at it from where I'm sitting, I brought up two ways that we could extend the protocol to "request from the server only those blocks where LSN >= threshold_value" with one being the modification to BASE_BACKUP and the other being a new set of commands that could be parallelized. If I had assumed that you'd be thinking the same way I am about extending the backup protocol, I wouldn't have said anything now and then would have complained after you wrote a patch that just extended the BASE_BACKUP command, at which point I likely would have been told that it's now been done and that I should have mentioned it earlier. > > external tools to leverage that. It sounds like what you're suggesting > > now is that you're happy to implement the backend code, expose it in a > > way that works just for pg_basebackup, and that if someone else wants to > > add things to the protocol to make it easier for external tools to > > leverage, great.
> > Yep, that's more or less it, although I am potentially willing to do > some modest amount of that other work along the way. I just don't > want to prioritize it higher than getting the actual thing I want to > build built, which I think is a pretty fair position for me to take. At least in part then it seems like we're viewing the level of effort around what I'm talking about quite differently, and I feel like that's largely because every time I mention parallel anything there's this assumption that I'm asking you to parallelize pg_basebackup or write a whole bunch more code to provide a fully optimized server-side parallel implementation for backups. That really wasn't what I was going for. I was thinking it would be a modest amount of additional work to add incremental backup via a few new commands, instead of through the BASE_BACKUP protocol command, that would make parallelization possible. Now, through this discussion, you've brought up some really good points about how the initial thoughts I had around how we could add some relatively simple commands, as part of this work, to make it easier for someone to later add parallel support to pg_basebackup (either full or incremental), or for external tools to leverage, might not be the best solution when it comes to having parallel backup in core, and therefore wouldn't actually end up being useful towards that end. That's certainly a fair point and possibly enough to justify not spending even the modest time I was thinking it'd need, but I'm not convinced. Now, that said, if you are convinced that's the case, and you're doing the work, then it's certainly your prerogative to go in the direction you're convinced of. I don't mean any of this discussion to imply that I'd object to a commit that extended BASE_BACKUP in the way outlined above, but I understood the question to be "what do people think of this idea?"
and to that I'm still of the opinion that spending a modest amount of time to provide a way to parallelize an incremental backup is worth it, even if it isn't optimal and isn't the direct goal of this effort. There's a tangent on all of this that's pretty key though, which is the question around just how the blocks are identified. If the WAL scanning is done to figure out the blocks, then that's quite a bit different from the other idea of "open this relation and scan it, but only give me the blocks after this LSN". It's the latter case that I've been mostly thinking about in this thread, which is part of why I was thinking it'd be a modest amount of work to have protocol commands that accepted a file (or perhaps a relation..) to scan and return blocks from instead of baking this into BASE_BACKUP which by definition just serially scans the data directory and returns things as it finds them. For the case where we have WAL scanning happening and modfiles which are being read and used to figure out the blocks to send, it seems like it might be more complicated and therefore potentially quite a bit more work to have a parallel ve
Re: block-level incremental backup
I hope it's alright to throw in my $0.02 as a user. I've been following this (and the other thread on reading WAL to find modified blocks, prefaulting, whatever else) since the start with great excitement and would love to see the built-in backup capabilities in Postgres greatly improved. I know this is not completely on-topic for just incremental backups, so I apologize in advance. It just seemed like the most apt place to chime in. Just to preface where I am coming from, I have been using pgBackRest for the past couple years and used wal-e prior to that. I am not a big *nix user other than on my servers; I do all my development on Windows and use primarily Java. The command line is not where I feel most comfortable despite my best efforts over the last 5-6 years. Prior to Postgres, I used SQL Server for quite a few years at previous companies but had more of a junior / intermediate skill set back then. I just wanted to put that out there so you can see where my biases are. With all that said, I would not be comfortable using pg_basebackup as my main backup tool simply because I’d have to cobble together numerous tools to get backups stored in a safe (not on the same server) location, I’d have to manage expiring backups and the WAL which is no longer needed, along with the rest of the stuff that makes these backup management tools useful. The command line scares me, and even if I was able to get all that working, I would not feel warm and fuzzy I didn’t mess something up horribly and I may hit an edge case which destroys backups, silently corrupts data, etc. I love that there are tools that manage all of it; backups, wal archiving, remote storage, integrate with cloud storage (S3 and the like), manages the retention of these backups with all their dependencies for me, and has all the restore options necessary built in as well. Block level incremental backup would be amazing for my use case. 
I have small updates / deletes that happen to data all over some of my largest tables. With pgBackRest, since the diff/incremental backups are at the file level, I can have a single update / delete which touches a random spot in a table and now requires that whole 1GB file to be backed up again. That said, even if pg_basebackup were the only tool that did incremental block-level backup tomorrow, I still wouldn’t start using it directly. I went into the issues I’d have to deal with if I used pg_basebackup above, and incremental backups without a management tool make me think using it correctly would be much harder. I know this thread is just about incremental backup, and that pretty much everything in core is built up from small features into larger more complex ones. I understand that and am not trying to dump on any efforts, I am super excited to see work being done in this area! I just wanted to share my perspective on how crucial good backup management is to me (and I’m sure a few others may share my sentiment considering how popular all the external tools are). I would never put a system in production unless I have some backup management in place. If core builds a backup management tool which uses pg_basebackup as building blocks for its solution…awesome! That may be something I’d use. If pg_basebackup can be improved so it can be used as the basis that most external backup management tools can build on top of, that’s also great. All the external tools which practically every Postgres company has built show that it’s obviously a need for a lot of users. Core will never solve every single problem for all users, I know that. It would just be great to see some of the fundamental features of backup management baked into core in an extensible way. 
With that, there could be a recommended way to set up backups (full/incremental, parallel, compressed), point in time recovery, backup retention, and perform restores (to a point in time, on a replica server, etc) with just the tooling within core with a nice and simple user interface, and great performance. If those features core supports in the internal tooling are built in an extensible way (as has been discussed), there could be much less duplication of work implementing the same base features over and over for each external tool. Those companies can focus on more value-added features to their own products that core would never support, or on improving the tooling/performance/features core provides. Well, this is way longer and a lot less coherent than I was hoping, so I apologize for that. Hopefully my stream of thoughts made a little bit of sense to someone. -Adam
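Adam's point about file-level granularity is easy to quantify. The sketch below is purely illustrative (the segment and page sizes match PostgreSQL's defaults, but the file names and block numbers are made up): one dirtied page per 1GB segment forces a file-level diff to re-copy the whole segment, while a block-level diff copies only the changed pages.

```python
# Illustrative only: PostgreSQL default sizes, hypothetical file names.
SEGMENT_SIZE = 1024 * 1024 * 1024   # relations are split into 1GB segment files
BLOCK_SIZE = 8192                   # default page size

def file_level_cost(touched_segments):
    """A file-level diff re-copies every segment file containing any change."""
    return len(touched_segments) * SEGMENT_SIZE

def block_level_cost(touched_blocks):
    """A block-level diff copies only the changed pages."""
    return len(touched_blocks) * BLOCK_SIZE

# One small update dirtying a single page in each of three 1GB segments:
segments = {"16384", "16384.1", "16384.2"}                     # hypothetical segment files
blocks = {("16384", 17), ("16384.1", 9000), ("16384.2", 42)}   # one page per segment

print(file_level_cost(segments))    # 3221225472 bytes (3 GiB re-copied)
print(block_level_cost(blocks))     # 24576 bytes (24 kB copied)
```

Three changed pages out of three gigabytes is roughly a 130,000x difference in backup volume, which is the gap block-level incremental backup is meant to close.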
Re: block-level incremental backup
22.04.2019 2:02, Robert Haas wrote: I think we're getting closer to a meeting of the minds here, but I don't think it's intrinsically necessary to rewrite the whole method of operation of pg_basebackup to implement incremental backup in a sensible way. One could instead just do a straightforward extension to the existing BASE_BACKUP command to enable incremental backup. Then, to enable parallel full backup and all sorts of out-of-core hacking, one could expand the command language to allow tools to access individual steps: START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes for an appealing project, but I do not think there is a technical reason why it has to be done first. Or for that matter why it has to be done second. As I keep saying, incremental backup and full backup are separate projects and I believe it's completely reasonable for whoever is doing the work to decide on the order in which they would like to do the work. Having said that, I'm curious what people other than Stephen (and other pgbackrest hackers) think about the relative value of parallel backup vs. incremental backup. Stephen appears quite convinced that parallel backup is full of win and incremental backup is a bit of a yawn by comparison, and while I certainly would not want to discount the value of his experience in this area, it sometimes happens on this mailing list that [ drum roll please ] not everybody agrees about everything. So, what do other people think? Personally, I believe that incremental backups are more useful to implement first since they benefit both backup speed and the space taken by a backup. Frankly speaking, I'm a bit surprised that the discussion of parallel backups took so much of this thread. Of course, we must keep it in mind, while designing the API to avoid introducing any architectural obstacles, but any further discussion of parallelism is a subject of another topic. 
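To make the command-language idea quoted above a little more concrete, here is a toy model of how a monolithic BASE_BACKUP could decompose into the granular steps Robert names. Only the command names (START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP) come from the thread; the class, LSN values, and file paths are invented for illustration.

```python
# Toy model of the granular backup command language; semantics are invented.
class ToyBackupServer:
    def __init__(self, files):
        self.files = files           # {path: contents}
        self.in_backup = False

    def start_backup(self):
        self.in_backup = True
        return "0/1000000"           # pretend backup-start LSN

    def send_file_list(self):
        assert self.in_backup
        return sorted(self.files)

    def send_file_contents(self, path):
        assert self.in_backup
        return self.files[path]

    def stop_backup(self):
        self.in_backup = False
        return "0/1000100"           # pretend backup-stop LSN

srv = ToyBackupServer({"base/1/16384": b"page data"})

# A pg_basebackup-style client walks through every step:
srv.start_backup()
copied = {p: srv.send_file_contents(p) for p in srv.send_file_list()}
srv.stop_backup()
print(copied)

# A filesystem-snapshot tool would issue only start_backup()/stop_backup();
# a tool with its own change tracking could skip send_file_list().
```

The point of the decomposition is exactly that different clients can use different subsets of the steps, which a single monolithic command cannot offer.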
I understand Stephen's concerns about the difficulties of incremental backup management. Even with an assumption that the user is ready to manage backup chains, retention, and other stuff, we must consider the format of backup metadata that will allow us to perform some primitive commands: 1) Tell whether this backup is full or incremental. 2) Tell what backup is a parent of this incremental backup. Probably, we can limit it to just returning "start_lsn", which later can be compared to the "stop_lsn" of the parent backup. 3) Take an incremental backup based on this backup. Here we must help a backup manager to retrieve the LSN to pass it to pg_basebackup. 4) Restore an incremental backup into a directory (on top of an already restored full backup). One may use it to perform "merge" or "restore" of the incremental backup, depending on the destination directory. I wonder if it is possible to integrate it into any existing tool, or whether we end up with something like pg_basebackup/pg_baserestore as in the case of pg_dump/pg_restore. Have you designed these? I may only recall "pg_combinebackup" from the very first message in this thread, which looks more like a sketch to explain the idea, rather than a thought-out feature design. I also found a page https://wiki.postgresql.org/wiki/Incremental_backup that raises the same questions. I'm volunteering to write a draft patch or, more likely, a set of patches, which will allow us to discuss the subject in more detail. And to do that I wish we could agree on the API and data format (at least broadly). Looking forward to hearing your thoughts. As I see it, ideally the backup management tools should concentrate more on managing multiple backups, while all the logic of taking a single backup (of any kind) should be integrated into the core. It means that any out-of-core client won't have to walk the PGDATA directory and care about all the postgres specific knowledge of data files consisting of blocks with headers and LSNs and so on. 
It simply requests data and gets it. Understandably, it won't be implemented in one take and, quite probably, is not fully reachable at all. Still, it will be great to do our best to provide such tools (both existing and future) with conveniently formatted data and an API to get it. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
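For what it's worth, the four metadata primitives listed above can be sketched against a hypothetical manifest. Nothing here is a proposed on-disk design, just an illustration of how little metadata (a parent pointer plus start/stop LSNs per backup) the backup manager actually needs to answer all four questions.

```python
# Hypothetical backup manifest -- names and LSN values are made up.
backups = {
    "full-1": {"parent": None,     "start_lsn": "0/1000000", "stop_lsn": "0/2000000"},
    "incr-1": {"parent": "full-1", "start_lsn": "0/2000000", "stop_lsn": "0/3000000"},
    "incr-2": {"parent": "incr-1", "start_lsn": "0/3000000", "stop_lsn": "0/4000000"},
}

def is_incremental(name):            # primitive 1: full or incremental?
    return backups[name]["parent"] is not None

def parent_of(name):                 # primitive 2: parent backup
    return backups[name]["parent"]

def lsn_for_next_incremental(name):  # primitive 3: threshold LSN for the next backup
    return backups[name]["stop_lsn"]

def restore_order(name):             # primitive 4: full backup first, then its children
    chain = []
    while name is not None:
        chain.append(name)
        name = backups[name]["parent"]
    return chain[::-1]

print(restore_order("incr-2"))       # ['full-1', 'incr-1', 'incr-2']
```

Note that primitive 2 falls out of comparing a backup's start_lsn with its parent's stop_lsn, as suggested above: in a valid chain each child's start_lsn equals the parent's stop_lsn.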
Re: block-level incremental backup
On Mon, Apr 22, 2019 at 2:26 PM Stephen Frost wrote: > There was basically zero discussion about what things would look like at > a protocol level (I went back and skimmed over the thread before sending > my last email to specifically see if I was going to get this response > back..). I get the idea behind the diff file, the contents of which I > wasn't getting into above. Well, I wrote: "There should be a way to tell pg_basebackup to request from the server only those blocks where LSN >= threshold_value." I guess I assumed that people interested in the details would take that to mean "and therefore the protocol would grow an option for this type of request in whatever way is the most straightforward possible extension of the current functionality," which is indeed how you eventually interpreted it when you said we could "extend BASE_BACKUP is by adding LSN as an optional parameter." I could have been more explicit, but sometimes people tell me that my emails are too long. > external tools to leverage that. It sounds like what you're suggesting > now is that you're happy to implement the backend code, expose it in a > way that works just for pg_basebackup, and that if someone else wants to > add things to the protocol to make it easier for external tools to > leverage, great. Yep, that's more or less it, although I am potentially willing to do some modest amount of that other work along the way. I just don't want to prioritize it higher than getting the actual thing I want to build built, which I think is a pretty fair position for me to take. > All I can say is that that's basically how we ended up > in the situation we're in today where pg_basebackup doesn't support > parallel backup but a bunch of external tools do and they don't go > through the backend to get there, even though they'd probably prefer to. 
I certainly agree that core should try to do things in a way that is useful to external tools when that can be done without undue effort, but only if it can actually be done without undue effort. Let's see whether that's the case here: - Anastasia wants a command added that dumps out whatever the server knows about what files have changed, which I already agreed was a reasonable extension of my initial proposal. - You said that for this to be useful to pgbackrest, it'd have to use a whole different mechanism that includes commands to request individual files and blocks within those files, which would be a significant rewrite of pg_basebackup that you agreed is more closely related to parallel backup than to the project under discussion on this thread. And that even then pgbackrest probably wouldn't use it because it also does server-side compression and encryption which are not included in this proposal. It seems to me that the first one falls into the category of reasonable additional effort and the second one falls into the category of lots of extra and unrelated work that wouldn't even get used. > Thanks for sharing your thoughts on that, certainly having the backend > able to be more intelligent about streaming files to avoid latency is > good and possibly the best approach. Another alternative to reducing > the latency would be to have a way for the client to request a set of > files, but I don't know that it'd be better. I don't know either. This is an area that needs more thought, I think, although as discussed, it's more related to parallel backup than $SUBJECT. > I'm not really sure why the above is extremely inconvenient for > third-party tools, beyond just that they've already been written to work > with an assumption that the server-side of things isn't as intelligent > as PG is. 
Well, one thing you might want to do is have a tool that connects to the server, enters backup mode, requests information on what blocks have changed, copies those blocks via direct filesystem access, and then exits backup mode. Such a tool would really benefit from a START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP command language, because it would just skip ever issuing the SEND_FILE_CONTENTS command in favor of doing that part of the work via other means. On the other hand, a START_PARALLEL_BACKUP LSN '1/234' command is useless to such a tool. Contrariwise, a tool that has its own magic - perhaps based on WAL-scanning or something like ptrack - to know which files currently exist and which blocks are modified could use SEND_FILE_CONTENTS but not SEND_FILE_LIST. And a filesystem-snapshot based technique might use START_BACKUP and STOP_BACKUP but nothing else. In short, providing granular commands like this lets the client be really intelligent even if the server isn't, and lets the client have fine-grained control of the process. This is very good if you're an out-of-core tool maintainer and your tool is trying to be smarter than - or even just differently-designed than - core. But if what you really want is just a maximally-efficient parallel backup, you don't nee
Re: block-level incremental backup
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2019-04-22 14:26:40 -0400, Stephen Frost wrote: > > I'm disappointed that the concerns about the trouble that end users are > > likely to have with this didn't garner more discussion. > > My impression is that end users are having a lot more trouble due to > important backup/restore features not being in core/pg_basebackup, than > due to external tools having a harder time to implement certain > features. I had been referring specifically to the concern I raised about incremental block-level backups being added to pg_basebackup and how that'll make using pg_basebackup more complicated and therefore more difficult for end users to get right, particularly if the end user is having to handle management of the association between the full backup and the incremental backups. I wasn't referring to anything regarding external tools. > Focusing on external tools being able to provide all those > features, because core hasn't yet, is imo entirely the wrong thing to > concentrate upon. And it's not like things largely haven't been > implemented in pg_basebackup for fundamental architectural reasons. > It's because we've built like 5 different external tools with randomly > differing featuresets and licenses. There are a few challenges when it comes to adding backup features to core. One of the reasons is that core naturally moves slower when it comes to development than external projects do, as was discussed earlier on this thread. Another is that, when it comes to backup, specifically, people want to back up their *existing* systems, which means that they need a backup tool that's going to work with whatever version of PG they've currently got deployed and that's often a few years old already. Certainly when I've thought about features that we'd like to see and considered if there's something that could be implemented in core vs. 
implemented outside of core, the answer often ends up being "well, if we do it ourselves then we can make it work for PG 9.2 and above, and have it working for existing users, but if we work it in as part of core, it won't be available until next year and only for version 12 and above, and users can only use it once they've upgraded.." Thanks, Stephen
Re: block-level incremental backup
Hi, On 2019-04-22 14:26:40 -0400, Stephen Frost wrote: > I'm disappointed that the concerns about the trouble that end users are > likely to have with this didn't garner more discussion. My impression is that end users are having a lot more trouble due to important backup/restore features not being in core/pg_basebackup, than due to external tools having a harder time to implement certain features. Focusing on external tools being able to provide all those features, because core hasn't yet, is imo entirely the wrong thing to concentrate upon. And it's not like things largely haven't been implemented in pg_basebackup for fundamental architectural reasons. It's because we've built like 5 different external tools with randomly differing featuresets and licenses. Greetings, Andres Freund
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 22, 2019 at 1:08 PM Stephen Frost wrote: > > > One could instead just do a straightforward extension > > > to the existing BASE_BACKUP command to enable incremental backup. > > > > Ok, how do you envision that? As I mentioned up-thread, I am concerned > > that we're talking too high-level here and it's making the discussion > > more difficult than it would be if we were to put together specific > > ideas and then discuss them. > > > > One way I can imagine to extend BASE_BACKUP is by adding LSN as an > > optional parameter and then having the database server scan the entire > > cluster and send a tarball which contains essentially a 'diff' file of > > some kind for each file where we can construct a diff based on the LSN, > > and then the complete contents of the file for everything else that > > needs to be in the backup. > > /me scratches head. Isn't that pretty much what I described in my > original post? I even described what that "'diff' file of some kind" > would look like in some detail in the paragraph of that emailed > numbered "2.", and I described the reasons for that choice at length > in > http://postgr.es/m/ca+tgmozrqdv-tb8ny9p+1pqlqkxp5f1afghuohh5qt6ewdk...@mail.gmail.com > > I can't figure out how I'm managing to be so unclear about things > about which I thought I'd been rather explicit. There was basically zero discussion about what things would look like at a protocol level (I went back and skimmed over the thread before sending my last email to specifically see if I was going to get this response back..). I get the idea behind the diff file, the contents of which I wasn't getting into above. > > So, sure, that would work, but it wouldn't be able to be parallelized > > and I don't think it'd end up being very exciting for the external tools > > because of that, but it would be fine for pg_basebackup. > > Stop being such a pessimist. 
Yes, if we only add the option to the > BASE_BACKUP command, it won't directly be very exciting for external > tools, but a lot of the work that is needed to do things that ARE > exciting for external tools will have been done. For instance, if the > work to figure out which blocks have been modified via WAL-scanning > gets done, and initially that's only exposed via BASE_BACKUP, it won't > be much work for somebody to write code that exposes > that information directly through some new replication command. > There's a difference between something that's going in the wrong > direction and something that's going in the right direction but not as > far or as fast as you'd like. And I'm 99% sure that everything I'm > proposing here falls in the latter category rather than the former. I didn't mean to imply that you're going in the wrong direction here and I thought I said somewhere in my last email more-or-less exactly the same, that a great deal of the work needed for block-level incremental backup would be done, but specifically that this proposal wouldn't allow external tools to leverage that. It sounds like what you're suggesting now is that you're happy to implement the backend code, expose it in a way that works just for pg_basebackup, and that if someone else wants to add things to the protocol to make it easier for external tools to leverage, great. All I can say is that that's basically how we ended up in the situation we're in today where pg_basebackup doesn't support parallel backup but a bunch of external tools do and they don't go through the backend to get there, even though they'd probably prefer to. 
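As an aside, the "only those blocks where LSN >= threshold_value" behavior that the proposed BASE_BACKUP extension would implement is straightforward to sketch. The model below fakes pages as (lsn, payload) pairs rather than parsing real 8kB pages with their header LSNs, so it illustrates only the filtering logic, not any actual server-side implementation:

```python
# Pages are modeled as (page_lsn, payload) pairs; a real page is an 8kB
# block with the LSN of its last modification in the page header, but the
# selection logic is the same either way.
def incremental_blocks(pages, threshold_lsn):
    """Yield (block_no, payload) for every page changed at or after the threshold."""
    for block_no, (page_lsn, payload) in enumerate(pages):
        if page_lsn >= threshold_lsn:
            yield block_no, payload

# A four-page relation; pages 1 and 3 were touched after LSN 200.
relation = [(100, b"a"), (250, b"b"), (90, b"c"), (300, b"d")]
diff = list(incremental_blocks(relation, 200))
print(diff)    # [(1, b'b'), (3, b'd')]
```

The "diff file" discussed in the thread would essentially be a serialization of these (block number, page contents) pairs per relation file, which is what makes reconstructing a full backup from a prior full plus an incremental possible.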
> > On the other hand, if you added new commands for 'list of files changed > > since this LSN' and 'give me this file' and 'give me this file with the > > changes in it since this LSN', then pg_basebackup could work with that > > pretty easily in a single-threaded model (maybe with two connections to > > the backend, but still in a single process, or maybe just by slurping up > > the file list and then asking for each one) and the external tools could > > leverage those new capabilities too for their backups, both full backups > > and incremental ones. This also wouldn't have to change how > > pg_basebackup does full backups today one bit, so what we're really > > talking about here is the direction to take the new code that's being > > written, not about rewriting existing code. I agree that it'd be a bit > > more work... but hopefully not *that* much more, and i
Re: block-level incremental backup
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2019-04-19 20:04:41 -0400, Stephen Frost wrote: > > I agree that we don't want another implementation and that there's a lot > > that we want to do to improve replay performance. We've already got > > frontend tools which work with multiple execution threads, so I'm not > > sure I get the "not easily feasible" bit, and the argument about the > > checkpointer seems largely related to that (as in- if we didn't have > > multiple threads/processes then things would perform quite badly... but > > we can and do have multiple threads/processes in frontend tools today, > > even in pg_basebackup). > > You need not just multiple execution threads, but basically a new > implementation of shared buffers, locking, process monitoring, with most > of the related infrastructure. You're literally talking about > reimplementing a very substantial portion of the backend. I'm not sure > I can transport in written words - via a public medium - how bad an idea > it would be to go there. Yes, there'd be some need for locking and process monitoring, though if we aren't supporting ongoing read queries at the same time, there's a whole bunch of things that we don't need from the existing backend. > > > Which I think is entirely reasonable. With the 'consistent' and LSN > > > recovery targets one already can get most of what's needed from such a > > > tool, anyway. I'd argue the biggest issue there is that there's no > > > equivalent to starting postgres with a private socket directory on > > > windows, and perhaps an option or two making it easier to start postgres > > > in a "private" mode for things like this. > > > > This would mean building in a way to do parallel WAL replay into the > > server binary though, as discussed above, and it seems like making that > > work in a way that allows us to still be available as a read-only > > standby would be quite a bit more difficult. 
We could possibly support > > parallel WAL replay only when we aren't a replica but from the same > > binary. > > I'm doubtful that we should try to implement parallel WAL apply that > can't support HS - a substantial portion of the logic to avoid > issues around relfilenode reuse, consistency etc is going to be > necessary for non-HS aware apply anyway. But if somebody had a concrete > proposal for something that's fundamentally only doable without HS, I > could be convinced. I'd certainly prefer that we support parallel WAL replay *with* HS; that just seems like a much larger problem, but I'd be quite happy to be told that it wouldn't be that much harder. > > A lot of this part of the discussion feels like a tangent though, unless > > I'm missing something. > > I'm replying to: > > On 2019-04-17 18:43:10 -0400, Stephen Frost wrote: > > Wow. I have to admit that I feel completely opposite of that- I'd > > *love* to have an independent tool (which ideally uses the same code > > through the common library, or similar) that can be run to apply WAL. > > And I'm basically saying that anything that starts from this premise is > fatally flawed (in the ex falso quodlibet kind of sense ;)). I'd just say that it'd be... difficult. :) > > The "WAL compression" tool contemplated > > previously would be much simpler and not the full-blown WAL replay > > capability, which would be left to the server, unless you're suggesting > > that even that should be exclusively the purview of the backend? Though > > that ship's already sailed, given that external projects have > > implemented it. > > I'm extremely doubtful of such tools (but it's not what I was responding > to, see above). I'd be extremely surprised if even one of them came > close to being correct. The old FPI removal tool had data corrupting > bugs left and right. 
I have concerns about it myself, which is why I'd actually really like to see something in core that does it, and does it the right way, that other projects could then leverage (ideally by just linking into the library without having to rewrite what's in core, though that might not be an option for things like WAL-G that are in Go and possibly don't want to link in some C library). > > Having a library to provide that which external > > projects could leverage would be nicer than having everyone write their > > own version. > > No, I don't think that's necessarily true. Something complicated that's > hard to get right doesn't have to be provided by core. Even if other > projects decide that their risk/reward assessment is different than core > postgres'. We don't have to take on all kinds of work and complexity for > external tools. No, it doesn't have to be provided by core, but I sure would like it to be and I'd be much more comfortable if it was because then we'd also take care to not break whatever assumptions are made (or to do so in a way that can be detected and/or handled) as new code is written. As discussed above, as long as it isn't provided by core
Re: block-level incremental backup
BACKUP/SEND_FILE_LIST/SEND_FILE_CONTENTS/STOP_BACKUP type interface for use by external tools. On the other hand, maybe the additional overhead caused by managing the list of files to be fetched on the client side is negligible. It'd be interesting to see, though, how busy the server is when running an incremental backup managed by an external tool like BART or pgbackrest on a cluster with a gazillion little-tiny relations. I wonder if we'd find that it spends most of its time waiting for the client. > What I'm afraid will be lackluster is adding block-level incremental > backup support to pg_basebackup without any support for managing > backups or anything else. I'm also concerned that it's going to mean > that people who want to use incremental backup with pg_basebackup are > going to have to write a lot of their own management code (probably in > shell scripts and such...) around that and if they get anything wrong > there then people are going to end up with bad backups that they can't > restore from, or they'll have corrupted clusters if they do manage to > get them restored. I think that this is another complaint that basically falls into the category of saying that this proposal might not fix everything for everybody, but that complaint could be levied against any reasonable development proposal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
Hi, On 2019-04-19 20:04:41 -0400, Stephen Frost wrote: > I agree that we don't want another implementation and that there's a lot > that we want to do to improve replay performance. We've already got > frontend tools which work with multiple execution threads, so I'm not > sure I get the "not easily feasible" bit, and the argument about the > checkpointer seems largely related to that (as in- if we didn't have > multiple threads/processes then things would perform quite badly... but > we can and do have multiple threads/processes in frontend tools today, > even in pg_basebackup). You need not just multiple execution threads, but basically a new implementation of shared buffers, locking, process monitoring, with most of the related infrastructure. You're literally talking about reimplementing a very substantial portion of the backend. I'm not sure I can transport in written words - via a public medium - how bad an idea it would be to go there. > You certainly bring up some good concerns though and they make me think > of other bits that would seem like they'd possibly be larger issues for > a frontend tool- like having a large pool of memory for caching (aka > shared buffers) the changes. If what we're talking about here is *just* > replay though, without having the system available for reads, I wonder > if we might want a different solution there. No. > > Which I think is entirely reasonable. With the 'consistent' and LSN > > recovery targets one already can get most of what's needed from such a > > tool, anyway. I'd argue the biggest issue there is that there's no > > equivalent to starting postgres with a private socket directory on > > windows, and perhaps an option or two making it easier to start postgres > > in a "private" mode for things like this. 
> > This would mean building in a way to do parallel WAL replay into the > server binary though, as discussed above, and it seems like making that > work in a way that allows us to still be available as a read-only > standby would be quite a bit more difficult. We could possibly support > parallel WAL replay only when we aren't a replica but from the same > binary. I'm doubtful that we should try to implement parallel WAL apply that can't support HS - a substantial portion of the logic to avoid issues around relfilenode reuse, consistency etc is going to be necessary for non-HS aware apply anyway. But if somebody had a concrete proposal for something that's fundamentally only doable without HS, I could be convinced. > The concerns mentioned about making it easier to start PG in a > private mode don't seem too bad but I am not entirely sure that the > tools which want to leverage that kind of capability would want to have > to exec out to the PG binary to use it. Tough luck. But even leaving infeasibility aside, it seems like a quite bad idea to do this in-process inside a tool that manages backup & recovery. Creating threads / sub-processes with complicated needs (like any pared-down version of pg to do just recovery would have) from within a library has substantial complications. So you'd not want to do this in-process anyway. > A lot of this part of the discussion feels like a tangent though, unless > I'm missing something. I'm replying to: On 2019-04-17 18:43:10 -0400, Stephen Frost wrote: > Wow. I have to admit that I feel completely opposite of that- I'd > *love* to have an independent tool (which ideally uses the same code > through the common library, or similar) that can be run to apply WAL. And I'm basically saying that anything that starts from this premise is fatally flawed (in the ex falso quodlibet kind of sense ;)). 
> The "WAL compression" tool contemplated > previously would be much simpler and not the full-blown WAL replay > capability, which would be left to the server, unless you're suggesting > that even that should be exclusively the purview of the backend? Though > that ship's already sailed, given that external projects have > implemented it. I'm extremely doubtful of such tools (but it's not what I was responding to, see above). I'd be extremely surprised if even one of them came close to being correct. The old FPI removal tool had data-corrupting bugs left and right. > Having a library to provide that which external > projects could leverage would be nicer than having everyone write their > own version. No, I don't think that's necessarily true. Something complicated that's hard to get right doesn't have to be provided by core, even if other projects decide that their risk/reward assessment is different from core postgres'. We don't have to take on all kinds of work and complexity for external tools. Greetings, Andres Freund
Re: block-level incremental backup
es my thoughts here. I'd also like to hear what others think, particularly those who have been working in this area. > think about the relative value of parallel > backup vs. incremental backup. Stephen appears quite convinced that > parallel backup is full of win and incremental backup is a bit of a > yawn by comparison, and while I certainly would not want to discount > the value of his experience in this area, it sometimes happens on this > mailing list that [ drum roll please ] not everybody agrees about > everything. So, what do other people think? I'm afraid this is painting my position here with an extremely broad brush and so I'd like to clarify a bit: I'm *all* for incremental backups. Incremental and differential backups were supported by pgBackRest very early on and are used extensively. Today's pgBackRest does that at a file level, but I would very much like to get to a block level shortly after we finish rewriting it into C and porting it to Windows (and probably the other platforms PG runs on today), which isn't very far off now. I'd like to make sure that whatever core ends up with as an incremental backup solution also matches very closely what we do with pgBackRest too, but everything that's been discussed here seems pretty reasonable when it comes to the bits around how the blocks are detected and the files get stitched back together, so I don't expect there to be too much of an issue there. What I'm afraid will be lackluster is adding block-level incremental backup support to pg_basebackup without any support for managing backups or anything else. I'm also concerned that it's going to mean that people who want to use incremental backup with pg_basebackup are going to have to write a lot of their own management code (probably in shell scripts and such...) 
around that and if they get anything wrong there then people are going to end up with bad backups that they can't restore from, or they'll have corrupted clusters if they do manage to get them restored. It'd also be nice to have as much exposed through the common library as possible when it comes to, well, everything being discussed, so that the external tools could leverage that code and avoid having to write their own. This would probably apply more to the WAL-scanning discussion, but figured I'd mention it here too. If the protocol was implemented in a way that we could leverage it from external tools in a parallel fashion then I'd be more excited about the overall body of work. Although, thinking about it a bit more, I have to admit that I'm not sure that pgBackRest would end up using it in any case, no matter how it's implemented, since it wouldn't support compression or encryption, both of which we support doing in-stream before the data leaves the server. The external tools which don't support those options likely would find the parallel option more appealing. Thanks, Stephen
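The failure mode Stephen describes for hand-rolled management scripts is concrete: if a full backup is expired while incrementals still depend on it, every one of those incrementals silently becomes unrestorable. A minimal sketch of the chain validation such tooling would need; the manifest fields here are invented for illustration, not any proposed pg_basebackup format:

```python
# Hypothetical sketch: before restoring (or expiring) anything, walk an
# incremental backup's parent links back to a full backup and fail fast
# if the chain is broken. The manifest layout is made up for this example.

def restore_chain(backups, target_id):
    """Return the list of backups to apply, oldest first, or raise."""
    by_id = {b["id"]: b for b in backups}
    chain = []
    current = by_id.get(target_id)
    if current is None:
        raise ValueError("unknown backup: %s" % target_id)
    while current is not None:
        chain.append(current)
        if current["type"] == "full":
            return list(reversed(chain))  # apply the full first, then deltas
        current = by_id.get(current.get("parent"))
    raise ValueError("chain for %s does not reach a full backup" % target_id)
```

Walking the parent links up front catches a broken chain at expiration time instead of at restore time, which is exactly the kind of logic shell-script wrappers tend to get wrong.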
Re: block-level incremental backup
On 22.04.2019 2:02, Robert Haas wrote: On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost wrote: Having been around for a while working on backup-related things, if I was to implement the protocol for pg_basebackup today, I'd definitely implement "give me a list" and "give me this file" rather than the tar-based approach, because I've learned that people want to be able to do parallel backups and that's a decent way to do that. I wouldn't set out and implement something new that there's just no hope of making parallel. Maybe the first write of pg_basebackup would still be simple and serial since it's certainly more work to make a frontend tool like that work in parallel, but at least the protocol would be ready to support a parallel option being added later without being rewritten. And that's really what I was trying to get at here- if we've got the choice now to decide what this is going to look like from a protocol level, it'd be great if we could make it able to support being used in a parallel fashion, even if pg_basebackup is still single-threaded. I think we're getting closer to a meeting of the minds here, but I don't think it's intrinsically necessary to rewrite the whole method of operation of pg_basebackup to implement incremental backup in a sensible way. One could instead just do a straightforward extension to the existing BASE_BACKUP command to enable incremental backup. Then, to enable parallel full backup and all sorts of out-of-core hacking, one could expand the command language to allow tools to access individual steps: START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes for an appealing project, but I do not think there is a technical reason why it has to be done first. Or for that matter why it has to be done second.
As I keep saying, incremental backup and full backup are separate projects and I believe it's completely reasonable for whoever is doing the work to decide on the order in which they would like to do the work. Having said that, I'm curious what people other than Stephen (and other pgbackrest hackers) think about the relative value of parallel backup vs. incremental backup. Stephen appears quite convinced that parallel backup is full of win and incremental backup is a bit of a yawn by comparison, and while I certainly would not want to discount the value of his experience in this area, it sometimes happens on this mailing list that [ drum roll please ] not everybody agrees about everything. So, what do other people think? Based on the experience of pg_probackup users I can say that there is no 100% winner; depending on the use case, either parallel or incremental backups are preferable. - If the database is not too large and the update rate is high enough, then parallel backup within one data center is definitely the more efficient solution. - If the database is very large and rarely updated, or mostly append-only, then incremental backup is preferable. - Some customers need to collect, at a central server, backups of databases installed at many nodes with slow and unreliable connections (imagine a DBMS installed on locomotives). Parallelism definitely cannot help here, unlike incremental backup. - Parallel backup consumes system resources more aggressively, interfering with the normal work of the application, so it may cause significant degradation of application performance. pg_probackup supports both parallel and incremental backups, and it is up to the user to decide how to use them most efficiently for a particular configuration. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: block-level incremental backup
On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost wrote: > Having been around for a while working on backup-related things, if I > was to implement the protocol for pg_basebackup today, I'd definitely > implement "give me a list" and "give me this file" rather than the > tar-based approach, because I've learned that people want to be > able to do parallel backups and that's a decent way to do that. I > wouldn't set out and implement something new that there's just no hope > of making parallel. Maybe the first write of pg_basebackup would still > be simple and serial since it's certainly more work to make a frontend > tool like that work in parallel, but at least the protocol would be > ready to support a parallel option being added later without being > rewritten. > > And that's really what I was trying to get at here- if we've got the > choice now to decide what this is going to look like from a protocol > level, it'd be great if we could make it able to support being used in a > parallel fashion, even if pg_basebackup is still single-threaded. I think we're getting closer to a meeting of the minds here, but I don't think it's intrinsically necessary to rewrite the whole method of operation of pg_basebackup to implement incremental backup in a sensible way. One could instead just do a straightforward extension to the existing BASE_BACKUP command to enable incremental backup. Then, to enable parallel full backup and all sorts of out-of-core hacking, one could expand the command language to allow tools to access individual steps: START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes for an appealing project, but I do not think there is a technical reason why it has to be done first. Or for that matter why it has to be done second.
As I keep saying, incremental backup and full backup are separate projects and I believe it's completely reasonable for whoever is doing the work to decide on the order in which they would like to do the work. Having said that, I'm curious what people other than Stephen (and other pgbackrest hackers) think about the relative value of parallel backup vs. incremental backup. Stephen appears quite convinced that parallel backup is full of win and incremental backup is a bit of a yawn by comparison, and while I certainly would not want to discount the value of his experience in this area, it sometimes happens on this mailing list that [ drum roll please ] not everybody agrees about everything. So, what do other people think? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
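If the command language were split up as Robert sketches, a parallel client becomes a thin loop over it: one control connection brackets the backup, and workers fetch files over their own connections. A rough illustration follows; the command names are the ones proposed above, but FakeServer and everything else here is invented, and a real client would open separate replication connections per worker:

```python
# Toy model of a parallel client driving the proposed command set
# (START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP).
# None of these commands exist yet; FakeServer stands in for a connection.

from concurrent.futures import ThreadPoolExecutor

class FakeServer:
    def __init__(self, files):
        self.files = files            # path -> bytes
    def start_backup(self):
        return "backup-label"         # placeholder for a real backup label
    def send_file_list(self):
        return sorted(self.files)
    def send_file_contents(self, path):
        return self.files[path]
    def stop_backup(self):
        return "stop-lsn"             # placeholder for the end-of-backup LSN

def parallel_backup(server_factory, workers=4):
    ctl = server_factory()            # one control connection for the bracket
    ctl.start_backup()
    paths = ctl.send_file_list()
    def fetch(path):
        conn = server_factory()       # each worker uses its own connection
        return path, conn.send_file_contents(path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result = dict(pool.map(fetch, paths))
    ctl.stop_backup()
    return result
```

The point of the sketch is that pg_basebackup could stay single-threaded over the same commands, while out-of-core tools widen the middle loop.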
Re: block-level incremental backup
> On 21 Apr 2019, at 1:13, Robert Haas wrote: > > On Sat, Apr 20, 2019 at 12:44 PM Andrey Borodin wrote: >> Incremental backup of 1Tb DB made with distance of few minutes (small change >> set) is few Gbs. All of this size is made of FSM (no LSN) and VM (hard to >> use LSN). >> Sure, this overhead size is fine if we make daily backup. But at some >> frequency of backups it will be too much. > > It seems like if the backups are only a few minutes apart, PITR might > be a better choice than super-frequent incremental backups. What do > you think about that? PITR is painfully slow on heavily loaded clusters. I have observed restores where replaying 5 seconds' worth of WAL took 4 seconds. The standby was only a few hours behind the primary node but could catch up only at night. And during this process only one of 56 CPU cores was used, and the SSD RAID throughput was not 100% utilized. Block-level delta backups can be restored very efficiently: if we apply steps from newest to oldest, we write no more than the cluster size as of the last backup. >> I think that problem of incrementing FSM and VM is too distant now. >> But if I had to implement it right now I'd choose following way: do not >> backup FSM and VM, recreate it during restore. Looks like it is possible, >> but too much AM-specific. > > Interesting idea - that's worth some more thought. Core routines to recreate VM and FSM would be cool :) But this needs to be done without extra IO, which is not an easy trick. >> There are 53 mentions of "parallel backup" here. I want to note that there may be >> parallel read from disk and parallel network transmission. Things between >> these two are negligible and can be single-threaded. From my POV, it's not >> about threads, it's about saturated IO controllers. >> Also I think parallel restore matters more than parallel backup. Backups >> themselves can be slow; on many clusters we even throttle disk IO. But users >> may want parallel backup to catch up a standby.
> > I'm not sure I entirely understand your point here -- are you saying > that parallel backup is important, or that it's not important, or > something in between? Do you think it's more or less important than > incremental backup? I think that there is no such thing as "parallel backup" per se. Backup creation is a composite process of many subprocesses. In my experience, parallel network transmission is cool and very important; it makes upload 3 times faster. But my experience is limited to cloud storage. Would this hold if the storage backend is a local FS? I have no idea. Parallel reading from disk has the same effect. Compression and encryption can be single-threaded; I don't think they will be the bottleneck (unless one uses lzma's neighborhood of the Pareto frontier). For me, the most important thing is incremental backups (with parallel merge of steps), and then parallel backup. But there is a huge fraction of users who can benefit from parallel backup and do not need incremental backup at all. Best regards, Andrey Borodin.
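Andrey's efficiency claim for block-level delta restore is easy to make concrete: applying backups from newest to oldest, each block is written only the first time it is seen, so total writes are bounded by the cluster size as of the last backup. A toy sketch, with backups modeled as plain dicts of block number to contents rather than any real backup format:

```python
# Sketch of newest-first incremental restore: every block is written at
# most once, because a block present in a newer delta shadows all older
# copies. "writes" counts physical block writes to the restore target.

def restore_newest_first(full, increments):
    """full: base backup; increments: deltas, listed oldest first."""
    restored = {}
    writes = 0
    # Walk from the newest increment back to the full backup.
    for delta in reversed([full] + increments):
        for block, data in delta.items():
            if block not in restored:   # skip if a newer version was written
                restored[block] = data
                writes += 1
    return restored, writes
```

Restoring oldest-first would instead write every version of every changed block, which is the inefficiency the newest-first ordering avoids.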