Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Sep 17, 2019 at 12:09 PM Stephen Frost wrote: > > We need to be sure that we can detect if the WAL level has ever been set > > to minimal between a full and an incremental and, if so, either refuse > > to run the incremental, or promote it to a full, or make it a > > checksum-based incremental instead of trusting the WAL stream. > > Sure. What about checksum collisions? Certainly possible, of course, but a sha256 of each file is at least somewhat better than, say, our page-level checksums. I do agree that having the option to just say "promote it to a full", or "do a byte-by-byte comparison against the prior backed up file" would be useful for those who are concerned about sha256 collision probabilities. Having a cross-check of "does this X% of files that we decided not to back up due to whatever really still match what we think is in the backup?" is definitely a valuable feature and one which I'd hope we get to at some point. > > Ok.. I can understand that but I don't get how these changes to > > pg_basebackup will help facilitate that. If they don't and what you're > > talking about here is independent, then great, that clarifies things, > > but if you're saying that these changes to pg_basebackup are to help > > with backing up directly into those Enterprise systems then I'm just > > asking for some help in understanding how- what's the use-case here that > > we're adding to pg_basebackup that makes it work with these Enterprise > > systems? > > > > I'm not trying to be difficult here, I'm just trying to understand. > > Man, I feel like we're totally drifting off into the weeds here. I'm > not arguing that these changes to pg_basebackup will help enterprise > users except insofar as those users want incremental backup. 
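The per-file sha256 cross-check described above ("does this X% of files that we decided not to back up really still match what we think is in the backup?") could be sketched roughly like this; this is an illustrative Python sketch only, and the manifest layout and function names are hypothetical, not part of any patch:

```python
import hashlib

def file_sha256(path, bufsize=1 << 20):
    """Stream a file through SHA-256 in chunks, so 1GB relation
    segments never have to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def cross_check(manifest, paths):
    """Re-verify a sample of files that were skipped during an
    incremental: their current checksum must still match the one
    recorded in the (hypothetical) manifest dict."""
    return [p for p in paths if file_sha256(p) != manifest[p]]
```

Streaming in chunks matters because, as noted later in the thread, on larger systems most relation files are full 1GB segments.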
All of > this discussion started with this comment from you: > > "Having a system of keeping track of which backups are full and which > are differential in an overall system also gives you the ability to do > things like expiration in a sensible way, including handling WAL > expiration." > > All I was doing was saying that for an enterprise user, the overall > system might be something entirely outside of our control, like > NetBackup or Tivoli. Therefore, whatever functionality we provide to > do that kind of thing should be able to be used in such contexts. That > hardly seems like a controversial proposition.

And all I was trying to understand was how what pg_basebackup does in this context is really different from what can be done with pgbackrest, since you brought it up: "True, but I'm not sure that functionality belongs in core. It certainly needs to be possible for out-of-core code to do this part of the work if desired, because people want to integrate with enterprise backup systems, and we can't come in and say, well, you back up everything else using Netbackup or Tivoli, but for PostgreSQL you have to use pg_backrest. I mean, maybe you can win that argument, but I know I can't."

What it sounds like you're arguing here is that what pg_basebackup "has" going for it is that it specifically doesn't include any kind of expiration management, and that's somehow helpful to people who want to use Enterprise backup solutions. Maybe that's what you were getting at, in which case, I'm sorry for misunderstanding and dragging it out, and thanks for helping me understand.

> > How would that tool work, if it's to be able to work regardless of where > > the WAL is actually stored..? Today, pg_archivecleanup just works > > against a POSIX filesystem- are you thinking that the tool would have a > > pluggable storage system, so that it could work with, say, a POSIX > > filesystem, or a CIFS mount, or a s3-like system?
> > Again, I was making a general statement about design goals -- "we > should try to work nicely with enterprise backup products" -- not > proposing a specific design for a specific thing. I don't think the > idea of some pluggability in that area is a bad one, but it's not even > slightly what this thread is about. Well, I agree with you, as I said up-thread, that this seemed to be going in a different and perhaps not entirely relevant direction. > > Provided the WAL level is at the level that you need it to be that will > > be true for things which are actually supported with PITR, replication > > to standby servers, et al. I can see how it might come across as an > > overreaction but this strikes me as a pretty glaring issue and I worry > > that if it was overlooked until now that there'll be other more subtle > > issues, and backups are just plain complicated to get right, just to > > begin with already, something that I don't think people appreciate until > > they've been dealing with them for quite a while. > > Permit me to be unpersuaded. If it was such a glaring issue, and if > experience is the key to spotting such issues, then why didn't YOU > spot it? I'm not designing the f
Re: block-level incremental backup
On Tue, Sep 17, 2019 at 12:09 PM Stephen Frost wrote: > We need to be sure that we can detect if the WAL level has ever been set > to minimal between a full and an incremental and, if so, either refuse > to run the incremental, or promote it to a full, or make it a > checksum-based incremental instead of trusting the WAL stream. Sure. What about checksum collisions? > Just to be clear, I see your points and I like the general idea of > finding solutions, but it seems like the issues are likely to be pretty > complex and I'm not sure that's being appreciated very well. Definitely possible, but it's more helpful if you can point out the actual issues. > Ok.. I can understand that but I don't get how these changes to > pg_basebackup will help facilitate that. If they don't and what you're > talking about here is independent, then great, that clarifies things, > but if you're saying that these changes to pg_basebackup are to help > with backing up directly into those Enterprise systems then I'm just > asking for some help in understanding how- what's the use-case here that > we're adding to pg_basebackup that makes it work with these Enterprise > systems? > > I'm not trying to be difficult here, I'm just trying to understand. Man, I feel like we're totally drifting off into the weeds here. I'm not arguing that these changes to pg_basebackup will help enterprise users except insofar as those users want incremental backup. All of this discussion started with this comment from you: "Having a system of keeping track of which backups are full and which are differential in an overall system also gives you the ability to do things like expiration in a sensible way, including handling WAL expiration." All I was doing was saying that for an enterprise user, the overall system might be something entirely outside of our control, like NetBackup or Tivoli. Therefore, whatever functionality we provide to do that kind of thing should be able to be used in such contexts. 
That hardly seems like a controversial proposition. > How would that tool work, if it's to be able to work regardless of where > the WAL is actually stored..? Today, pg_archivecleanup just works > against a POSIX filesystem- are you thinking that the tool would have a > pluggable storage system, so that it could work with, say, a POSIX > filesystem, or a CIFS mount, or a s3-like system? Again, I was making a general statement about design goals -- "we should try to work nicely with enterprise backup products" -- not proposing a specific design for a specific thing. I don't think the idea of some pluggability in that area is a bad one, but it's not even slightly what this thread is about. > Provided the WAL level is at the level that you need it to be that will > be true for things which are actually supported with PITR, replication > to standby servers, et al. I can see how it might come across as an > overreaction but this strikes me as a pretty glaring issue and I worry > that if it was overlooked until now that there'll be other more subtle > issues, and backups are just plain complicated to get right, just to > begin with already, something that I don't think people appreciate until > they've been dealing with them for quite a while. Permit me to be unpersuaded. If it was such a glaring issue, and if experience is the key to spotting such issues, then why didn't YOU spot it? I'm not arguing that this stuff isn't hard. It is. Nor am I arguing that I didn't screw up. I did. But designs need to be accepted or rejected based on facts, not FUD. You've raised some good technical points and if you've got more concerns, I'd like to hear them, but I don't think arguing vaguely that a certain approach will probably run into trouble gets us anywhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
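The decision quoted at the top of this exchange (refuse the incremental, promote it to a full, or fall back to a checksum-based incremental whenever wal_level has ever been minimal since the prior full) can be condensed into a small policy check. This is a hypothetical Python illustration, not the server-side C code, and it assumes we can observe the wal_level recorded at each checkpoint since the prior full backup (as noted in the thread, wal_level can only change at a checkpoint):

```python
from enum import Enum

class WalLevel(Enum):
    MINIMAL = 0
    REPLICA = 1
    LOGICAL = 2

def plan_incremental(levels_since_full, fallback="promote-to-full"):
    """Decide how to take a backup given the wal_level observed at each
    checkpoint since the prior full backup.  If it was ever 'minimal',
    the WAL stream cannot be trusted, so apply the caller's fallback
    policy: 'refuse', 'promote-to-full', or 'checksum-incremental'."""
    if any(lvl is WalLevel.MINIMAL for lvl in levels_since_full):
        if fallback == "refuse":
            raise RuntimeError("wal_level was 'minimal' since the last "
                               "full backup; incremental is unsafe")
        return fallback
    return "wal-incremental"
```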
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost wrote: > > As discussed nearby, not everything that needs to be included in the > > backup is actually going to be in the WAL though, right? How would that > > ever be able to handle the case where someone starts the server under > > wal_level = logical, takes a full backup, then restarts with wal_level = > > minimal, writes out a bunch of new data, and then restarts back to > > wal_level = logical and takes an incremental? > > Fair point. I think the WAL-scanning approach can only work if > wal_level > minimal. But, I also think that few people run with > wal_level = minimal in this era where the default has been changed to > replica; and I think we can detect the WAL level in use while scanning > WAL. It can only change at a checkpoint. We need to be sure that we can detect if the WAL level has ever been set to minimal between a full and an incremental and, if so, either refuse to run the incremental, or promote it to a full, or make it a checksum-based incremental instead of trusting the WAL stream. I'm also glad that we ended up changing the default though and I do hope that there's relatively few people running with minimal and that there's even fewer who play around with flipping it back and forth. > > On larger systems, so many of the files are 1GB in size that checking > > the file size is quite close to meaningless. Yes, having to checksum > > all of the files definitely adds to the cost of taking the backup, but > > to avoid it we need strong assurances that a given file hasn't been > > changed since our last full backup. WAL, today at least, isn't quite > > that, and timestamps can possibly be fooled with, so if you'd like to be > > particularly careful, there doesn't seem to be a lot of alternatives. > > I see your points, but it feels like you're trying to talk down the > WAL-based approach over what seem to me to be fairly manageable corner > cases. 
Just to be clear, I see your points and I like the general idea of finding solutions, but it seems like the issues are likely to be pretty complex and I'm not sure that's being appreciated very well. > > I'm not asking you to be an expert on those systems, just to help me > > understand the statements you're making. How is backing up to a > > pgbackrest repo different than running a pg_basebackup in the context of > > using some other Enterprise backup system? In both cases, you'll have a > > full copy of the backup (presumably compressed) somewhere out on a disk > > or filesystem which is then backed up by the Enterprise tool. > > Well, I think that what people really want is to be able to backup > straight into the enterprise tool, without an intermediate step. Ok.. I can understand that but I don't get how these changes to pg_basebackup will help facilitate that. If they don't and what you're talking about here is independent, then great, that clarifies things, but if you're saying that these changes to pg_basebackup are to help with backing up directly into those Enterprise systems then I'm just asking for some help in understanding how- what's the use-case here that we're adding to pg_basebackup that makes it work with these Enterprise systems? I'm not trying to be difficult here, I'm just trying to understand. > My basic point here is: As with practically all PostgreSQL > development, I think we should try to expose capabilities and avoid > making policy on behalf of users. > > I'm not objecting to the idea of having tools that can help users > figure out how much WAL they need to retain -- but insofar as we can > do it, such tools should work regardless of where that WAL is actually > stored. How would that tool work, if it's to be able to work regardless of where the WAL is actually stored..? 
Today, pg_archivecleanup just works against a POSIX filesystem- are you thinking that the tool would have a pluggable storage system, so that it could work with, say, a POSIX filesystem, or a CIFS mount, or a s3-like system? > I dislike the idea that PostgreSQL would provide something > akin to a "pgbackrest repository" in core, or at least I think it > would be important that we're careful about how much functionality > gets tied to the presence and use of such a thing, because, at least > based on my experience working at EnterpriseDB, larger customers often > don't want to do it that way.

This seems largely independent of the above discussion, but since we're discussing it, I've certainly had various experiences in this area too- some larger customers would like to use an s3-like store (which pgbackrest already supports and will be supporting others going forward as it has a pluggable storage mechanism for the repo...), and then there's customers who would like to point their Enterprise backup solution at a directory on disk to back it up (which pgbackrest also supports, as mentioned previously), and lastly there's cus
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost wrote: > As discussed nearby, not everything that needs to be included in the > backup is actually going to be in the WAL though, right? How would that > ever be able to handle the case where someone starts the server under > wal_level = logical, takes a full backup, then restarts with wal_level = > minimal, writes out a bunch of new data, and then restarts back to > wal_level = logical and takes an incremental? Fair point. I think the WAL-scanning approach can only work if wal_level > minimal. But, I also think that few people run with wal_level = minimal in this era where the default has been changed to replica; and I think we can detect the WAL level in use while scanning WAL. It can only change at a checkpoint. > On larger systems, so many of the files are 1GB in size that checking > the file size is quite close to meaningless. Yes, having to checksum > all of the files definitely adds to the cost of taking the backup, but > to avoid it we need strong assurances that a given file hasn't been > changed since our last full backup. WAL, today at least, isn't quite > that, and timestamps can possibly be fooled with, so if you'd like to be > particularly careful, there doesn't seem to be a lot of alternatives. I see your points, but it feels like you're trying to talk down the WAL-based approach over what seem to me to be fairly manageable corner cases. > I'm not asking you to be an expert on those systems, just to help me > understand the statements you're making. How is backing up to a > pgbackrest repo different than running a pg_basebackup in the context of > using some other Enterprise backup system? In both cases, you'll have a > full copy of the backup (presumably compressed) somewhere out on a disk > or filesystem which is then backed up by the Enterprise tool. Well, I think that what people really want is to be able to backup straight into the enterprise tool, without an intermediate step. 
My basic point here is: As with practically all PostgreSQL development, I think we should try to expose capabilities and avoid making policy on behalf of users. I'm not objecting to the idea of having tools that can help users figure out how much WAL they need to retain -- but insofar as we can do it, such tools should work regardless of where that WAL is actually stored. I dislike the idea that PostgreSQL would provide something akin to a "pgbackrest repository" in core, or at least I think it would be important that we're careful about how much functionality gets tied to the presence and use of such a thing, because, at least based on my experience working at EnterpriseDB, larger customers often don't want to do it that way.

> That's not great, of course, which is why there are trade-offs to be > made, one of which typically involves using timestamps, but doing so > quite carefully, to perform the file exclusion. Other ideas are great > but it seems like WAL isn't really a great idea unless we make some > changes there and we, as in PG, haven't got a robust "we know this file > changed as of this point" to work from. I worry that we're putting too > much faith into a system to do something independent of what it was > actually built and designed to do, and thinking that because we could > trust it for X, we can trust it for Y.

That seems like a considerable overreaction to me based on the problems reported thus far. The fact is, WAL was originally intended for crash recovery and has subsequently been generalized to be usable for point-in-time recovery, standby servers, and logical decoding. It's clearly established at this point as the canonical way that you know what in the database has changed, which is the same need that we have for incremental backup. At any rate, the same criticism can be leveled - IMHO with a lot more validity - at timestamps.
Last-modification timestamps are completely outside of our control; they are owned by the OS and various operating systems can and do have varying behavior. They can go backwards when things have changed; they can go forwards when things have not changed. They were clearly not intended to meet this kind of requirement. Indeed, they were intended for that purpose far less than WAL, which was actually designed for a requirement in this general ballpark, if not this thing precisely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
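The failure mode described above is easy to demonstrate: the OS will happily let a file's mtime be rewound after its contents change, so a purely timestamp-based incremental would silently skip a modified file. A small self-contained sketch (the helper names are hypothetical and unrelated to any patch):

```python
import os
import tempfile

def mtime_says_unchanged(path, recorded_mtime_ns):
    """What a naive timestamp-based incremental would check."""
    return os.stat(path).st_mtime_ns == recorded_mtime_ns

def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"original")
    st = os.stat(path)
    recorded = st.st_mtime_ns             # what the prior backup saw
    with open(path, "wb") as f:
        f.write(b"modified")              # the file really changes...
    os.utime(path, ns=(st.st_atime_ns, recorded))  # ...but mtime is rewound
    skipped = mtime_says_unchanged(path, recorded)
    with open(path, "rb") as f:
        really_changed = f.read() != b"original"
    os.unlink(path)
    return skipped, really_changed
```

Both return values come back True: the file changed, yet the timestamp check would exclude it from the backup.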
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 7:00 PM Robert Haas wrote: > > On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila wrote: > > This seems to be a blocking problem for the LSN based design. > > Well, only the simplest version of it, I think. > > > Can we think of using creation time for the file? Basically, if the file > > creation time is later than the backup label's "START TIME:", then include > > that file entirely. I think one big point against this is clock skew, > > like what if somebody tinkers with the clock. And also, this can > > cover cases like > > what Jeevan has pointed out but might not cover other cases which we found > > problematic. > > Well, that would mean, for example, that if you copied the data > directory from one machine to another, the next "incremental" backup > would turn into a full backup. That sucks. And in other situations, > like resetting the clock, it could mean that you end up with a corrupt > backup without any real ability for PostgreSQL to detect it. I'm not > saying that it is impossible to create a practically useful system > based on file time stamps, but I really don't like it.

> > I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will > > have similar problems. > > I'm not sure quite what you mean by that. Can you elaborate? It > appears to me that the XLR_SPECIAL_REL_UPDATE operations are all > things that create files, remove files, or truncate files, and the > sketch in my previous email would handle the first two of those cases > correctly. See below for the third.

> > One related point is how do incremental backups handle the case where > > vacuum truncates the relation partially? Basically, with the current > > patch/design, it doesn't appear that such information can be passed > > via incremental backup. I am not sure if this is a problem, but it > > would be good if we can somehow handle this. > > As to this, if you're taking a full backup of a particular file, > there's no problem.
If you're taking a partial backup of a particular > file, you need to include the current length of the file and the > identity and contents of each modified block. Then you're fine. > Right, this should address that point. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
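The partial-file entry described above (the file's current length plus the identity and contents of each modified block) might look like the following sketch. The layout is purely illustrative, not the patch's actual wire or on-disk format; recording the current length is what lets restore handle a vacuum truncation:

```python
BLOCK = 8192  # PostgreSQL's default block size

def incremental_entry(current_length, changed):
    """Build a partial-file backup entry: the file's current length
    plus (block_number, data) pairs for each modified block."""
    return {
        "truncate_to": current_length,
        "blocks": {blkno: data for blkno, data in changed},
    }

def apply_entry(old, entry):
    """Rebuild a file from the prior full copy plus the entry:
    truncate (or zero-extend) to the recorded length, then overlay
    each modified block at its offset."""
    length = entry["truncate_to"]
    new = bytearray(old[:length].ljust(length, b"\0"))
    for blkno, data in entry["blocks"].items():
        new[blkno * BLOCK : blkno * BLOCK + len(data)] = data
    return bytes(new)
```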
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 11:09 PM Stephen Frost wrote: > > Greetings, > > * Robert Haas (robertmh...@gmail.com) wrote: > > On Mon, Sep 16, 2019 at 9:30 AM Robert Haas wrote: > > > > Aren't there some operations where at the end we directly call heap_sync > > > > without writing WAL that will have a similar problem as well? > > > > > > Maybe. Can you give an example? > > > > Looking through the code, I found two cases where we do this. One is > > a bulk insert operation with wal_level = minimal, and the other is > > CLUSTER or VACUUM FULL with wal_level = minimal. In both of these > > cases we are generating new blocks whose LSNs will be 0/0. So, I think > > we need a rule that if the server is asked to back up all blocks in a > > file with LSNs > some threshold LSN, it must also include any blocks > > whose LSN is 0/0. Those blocks are either uninitialized or are > > populated without WAL logging, so they always need to be copied. > > I'm not sure I see a way around it but this seems pretty unfortunate- > every single incremental backup will have all of those included even > though the full backup likely also does >

Yeah, this is quite unfortunate. One more thing to note is that the same is true for other operations like 'create index' (ex. nbtree bypasses the buffer manager while creating the index, doesn't write WAL for wal_level=minimal and then syncs at the end). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 1:10 PM Stephen Frost wrote: > > I disagree with this on a couple of levels. The first is pretty simple- > > we don't have all of the information. The user may have some reason to > > believe that timestamp-based is a bad idea, for example, and therefore > > having an option to perform a checksum-based backup makes sense. rsync > > is a pretty good tool in my view and it has a very similar option- > > because there are trade-offs to be made. LSN is great, if you don't > > mind reading every file of your database start-to-finish every time, but > > in a running system which hasn't suffered from clock skew or other odd > > issues (some of which we can also detect), it's pretty painful to scan > > absolutely everything like that for an incremental. > > There's a separate thread on using WAL-scanning to avoid having to > scan all the data every time. I pointed it out to you early in this > thread, too. As discussed nearby, not everything that needs to be included in the backup is actually going to be in the WAL though, right? How would that ever be able to handle the case where someone starts the server under wal_level = logical, takes a full backup, then restarts with wal_level = minimal, writes out a bunch of new data, and then restarts back to wal_level = logical and takes an incremental? How would we even detect that such a thing happened? > > If you track the checksum of the file in the manifest then it's a pretty > > strong validation that the backup repo hasn't been corrupted between the > > backup and the restore. Of course, the database could have been > > corrupted at the source, and perhaps that's what you were getting at > > with your 'limited extent' but that isn't what I was referring to. > > Yeah, that all seems fair. 
Without the checksum, you can only validate > that you have the right files and that they are the right sizes, which > is not bad, but the checksums certainly make it stronger. But, > wouldn't having to checksum all of the files add significantly to the > cost of taking the backup? If so, I can imagine that some people might > want to pay that cost but others might not. If it's basically free to > checksum the data while we have it in memory anyway, then I guess > there's little to be lost. On larger systems, so many of the files are 1GB in size that checking the file size is quite close to meaningless. Yes, having to checksum all of the files definitely adds to the cost of taking the backup, but to avoid it we need strong assurances that a given file hasn't been changed since our last full backup. WAL, today at least, isn't quite that, and timestamps can possibly be fooled with, so if you'd like to be particularly careful, there doesn't seem to be a lot of alternatives. > > I'm pretty baffled by this argument, particularly in this context. We > > already have tooling around trying to manage WAL archives in core- see > > pg_archivecleanup. Further, we're talking about pg_basebackup here, not > > about Netbackup or Tivoli, and the results of a pg_basebackup (that is, > > a set of tar files, or a data directory) could happily be backed up > > using whatever Enterprise tool folks want to use- in much the same way > > that a pgbackrest repo is also able to be backed up using whatever > > Enterprise tools someone wishes to use. We designed it quite carefully > > to work with exactly that use-case, so the distinction here is quite > > lost on me. Perhaps you could clarify what use-case these changes to > > pg_basebackup solve, when working with a Netbackup or Tivoli system, > > that pgbackrest doesn't, since you bring it up here? 
> > I'm not an expert on any of those systems, but I doubt that > everybody's OK with backing everything up to a pgbackrest repository > and then separately backing up that repository to some other system. > That sounds like a pretty large storage cost. I'm not asking you to be an expert on those systems, just to help me understand the statements you're making. How is backing up to a pgbackrest repo different than running a pg_basebackup in the context of using some other Enterprise backup system? In both cases, you'll have a full copy of the backup (presumably compressed) somewhere out on a disk or filesystem which is then backed up by the Enterprise tool. > > As for if we should be sending more to the server, or asking the server > > to send more to us, I don't really have a good feel for what's "best". > > At least one implementation I'm familiar with builds a manifest on the > > PG server side and then compares the results of that to the manifest > > stored with the backup (where that comparison is actually done is on > > whatever system the "backup" was started from, typically a backup > > server). Perhaps there's an argument for sending the manifest from the > > backup repository to PostgreSQL for it to then compare against the data > > directory bu
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 1:10 PM Stephen Frost wrote: > I disagree with this on a couple of levels. The first is pretty simple- > we don't have all of the information. The user may have some reason to > believe that timestamp-based is a bad idea, for example, and therefore > having an option to perform a checksum-based backup makes sense. rsync > is a pretty good tool in my view and it has a very similar option- > because there are trade-offs to be made. LSN is great, if you don't > mind reading every file of your database start-to-finish every time, but > in a running system which hasn't suffered from clock skew or other odd > issues (some of which we can also detect), it's pretty painful to scan > absolutely everything like that for an incremental. There's a separate thread on using WAL-scanning to avoid having to scan all the data every time. I pointed it out to you early in this thread, too. > If you track the checksum of the file in the manifest then it's a pretty > strong validation that the backup repo hasn't been corrupted between the > backup and the restore. Of course, the database could have been > corrupted at the source, and perhaps that's what you were getting at > with your 'limited extent' but that isn't what I was referring to. Yeah, that all seems fair. Without the checksum, you can only validate that you have the right files and that they are the right sizes, which is not bad, but the checksums certainly make it stronger. But, wouldn't having to checksum all of the files add significantly to the cost of taking the backup? If so, I can imagine that some people might want to pay that cost but others might not. If it's basically free to checksum the data while we have it in memory anyway, then I guess there's little to be lost. > I'm pretty baffled by this argument, particularly in this context. We > already have tooling around trying to manage WAL archives in core- see > pg_archivecleanup. 
Further, we're talking about pg_basebackup here, not > about Netbackup or Tivoli, and the results of a pg_basebackup (that is, > a set of tar files, or a data directory) could happily be backed up > using whatever Enterprise tool folks want to use- in much the same way > that a pgbackrest repo is also able to be backed up using whatever > Enterprise tools someone wishes to use. We designed it quite carefully > to work with exactly that use-case, so the distinction here is quite > lost on me. Perhaps you could clarify what use-case these changes to > pg_basebackup solve, when working with a Netbackup or Tivoli system, > that pgbackrest doesn't, since you bring it up here? I'm not an expert on any of those systems, but I doubt that everybody's OK with backing everything up to a pgbackrest repository and then separately backing up that repository to some other system. That sounds like a pretty large storage cost. > As for if we should be sending more to the server, or asking the server > to send more to us, I don't really have a good feel for what's "best". > At least one implementation I'm familiar with builds a manifest on the > PG server side and then compares the results of that to the manifest > stored with the backup (where that comparison is actually done is on > whatever system the "backup" was started from, typically a backup > server). Perhaps there's an argument for sending the manifest from the > backup repository to PostgreSQL for it to then compare against the data > directory but I'm not really sure how it could possibly do that more > efficiently and that's moving work to the PG server that it doesn't > really need to do. I agree with all that, but... if the server builds a manifest on the PG server that is to be compared with the backup's manifest, the one the PG server builds can't really include checksums, I think. To get the checksums, it would have to read the entire cluster while building the manifest, which sounds insane. 
Presumably it would have to build a checksum-free version of the manifest, and then the client could checksum the files as they're streamed down and write out a revised manifest that adds the checksums. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
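The flow sketched above - server sends the file along with a checksum-free manifest, and the client checksums each file as it streams down, writing out a revised manifest that adds the checksums - avoids a second read pass over the cluster. A minimal client-side sketch with hypothetical names (the real transport would be the replication protocol, not a Python iterable):

```python
import hashlib

def stream_to_disk(chunks, out_path):
    """Write a streamed file to disk while folding each chunk into a
    running SHA-256, so the checksum comes 'for free' with the copy.
    Returns a revised-manifest entry (layout is hypothetical)."""
    h = hashlib.sha256()
    size = 0
    with open(out_path, "wb") as out:
        for chunk in chunks:
            h.update(chunk)   # checksum the data we already hold in memory
            out.write(chunk)
            size += len(chunk)
    return {"path": out_path, "size": size, "sha256": h.hexdigest()}
```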
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 9:30 AM Robert Haas wrote: > > > Aren't there some operations where at the end we directly call heap_sync > > > without writing WAL that will have a similar problem as well? > > > > Maybe. Can you give an example? > > Looking through the code, I found two cases where we do this. One is > a bulk insert operation with wal_level = minimal, and the other is > CLUSTER or VACUUM FULL with wal_level = minimal. In both of these > cases we are generating new blocks whose LSNs will be 0/0. So, I think > we need a rule that if the server is asked to back up all blocks in a > file with LSNs > some threshold LSN, it must also include any blocks > whose LSN is 0/0. Those blocks are either uninitialized or are > populated without WAL logging, so they always need to be copied.

I'm not sure I see a way around it but this seems pretty unfortunate- every single incremental backup will have all of those included even though the full backup likely also does (I say likely since someone could do a full backup, set the WAL to minimal, load a bunch of data, and then restart back to a WAL level where we can do a new backup, and then do an incremental, so we don't *know* that the full includes those blocks unless we also track a block-level checksum or similar). Then again, doing these kinds of server bounces to change the WAL level around is, hopefully, relatively rare..

> Outside of unlogged and temporary tables, I don't know of any case > where we make a critical modification to an already-existing block > without bumping the LSN. I hope there is no such case.

I believe we all do. :) Thanks, Stephen
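The rule discussed above reduces to a simple per-block predicate: copy a block if its LSN is past the threshold, or if its LSN is 0/0. A sketch, treating LSNs as plain integers with 0 standing in for 0/0 (in reality they are 64-bit WAL positions, and the actual decision lives in the server's C code):

```python
def block_needs_copy(page_lsn, threshold_lsn):
    """Copy the block if it changed after the threshold LSN, or if its
    LSN is 0/0 -- such pages are uninitialized or were populated
    without WAL logging (e.g. bulk inserts or CLUSTER/VACUUM FULL at
    wal_level = minimal) and must always be included."""
    return page_lsn == 0 or page_lsn > threshold_lsn
```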
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 10:38 AM Stephen Frost wrote: > > In a number of cases, trying to make sure that on a failover or copy of > > the backup the next 'incremental' is really an 'incremental' is > > dangerous. A better strategy to address this, and the other issues > > realized on this thread recently, is to: > > > > - Have a manifest of every file in each backup > > - Always back up new files that weren't in the prior backup > > - Keep a checksum of each file > > - Track the timestamp of each file as of when it was backed up > > - Track the file size of each file > > - Track the starting timestamp of each backup > > - Always include files with a modification time after the starting > > timestamp of the prior backup, or if the file size has changed > > - In the event of any anomalies (which includes things like a timeline > > switch), use checksum matching (aka 'delta checksum backup') to > > perform the backup instead of using timestamps (or just always do that > > if you want to be particularly careful- having an option for it is > > great) > > - Probably other things I'm not thinking of off-hand, but this is at > > least a good start. Make sure to checksum this information too. > > I agree with some of these ideas but not all of them. I think having > a backup manifest is a good idea; that would allow taking a new > incremental backup to work from the manifest rather than the data > directory, which could be extremely useful, because it might be a lot > faster and the manifest could also be copied to a machine other than > the one where the entire backup is stored. If the backup itself has > been pushed off to S3 or whatever, you can't access it quickly, but > you could keep the manifest around. Yes, those are also good reasons for having a manifest. > I also agree that backing up all files that weren't in the previous > backup is a good strategy. 
I proposed that fairly explicitly a few > emails back; but also, the contrary is obviously nonsense. And I also > agree with, and proposed, that we record the size along with the file. Sure, I didn't mean to imply that there was something wrong with that. Including the checksum and other metadata is also valuable, both for helping to identify corruption in the backup archive and for forensics, if not for other reasons. > I don't really agree with your comments about checksums and > timestamps. I think that, if possible, there should be ONE method of > determining whether a block has changed in some important way, and I > think if we can make LSN work, that would be for the best. If you use > multiple methods of detecting changes without any clearly-defined > reason for so doing, maybe what you're saying is that you don't really > believe that any of the methods are reliable but if we throw the > kitchen sink at the problem it should come out OK. Any bugs in one > mechanism are likely to be masked by one of the others, but that's not > as good as one method that is known to be altogether reliable. I disagree with this on a couple of levels. The first is pretty simple- we don't have all of the information. The user may have some reason to believe that timestamp-based is a bad idea, for example, and therefore having an option to perform a checksum-based backup makes sense. rsync is a pretty good tool in my view and it has a very similar option- because there are trade-offs to be made. LSN is great, if you don't mind reading every file of your database start-to-finish every time, but in a running system which hasn't suffered from clock skew or other odd issues (some of which we can also detect), it's pretty painful to scan absolutely everything like that for an incremental. Perhaps the discussion has already moved on to having some way of our own to track if a given file has changed without having to scan all of it- if so, that's a discussion I'd be interested in. 
I'm not against other approaches here besides timestamps if there's a solid reason why they're better and they're also able to avoid scanning the entire database. > > By having a manifest for each backed up file for each backup, you also > > gain the ability to validate that a backup in the repository hasn't been > > corrupted post-backup, a feature that at least some other database > > backup and restore systems have (referring specifically to the big O in > > this particular case, but I bet others do too). > > Agreed. The manifest only lets you validate to a limited extent, but > that's still useful. If you track the checksum of the file in the manifest then it's a pretty strong validation that the backup repo hasn't been corrupted between the backup and the restore. Of course, the database could have been corrupted at the source, and perhaps that's what you were getting at with your 'limited extent' but that isn't what I was referring to. Claiming that the backup has been 'validated' by only looking at
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 10:38 AM Stephen Frost wrote: > In a number of cases, trying to make sure that on a failover or copy of > the backup the next 'incremental' is really an 'incremental' is > dangerous. A better strategy to address this, and the other issues > realized on this thread recently, is to: > > - Have a manifest of every file in each backup > - Always back up new files that weren't in the prior backup > - Keep a checksum of each file > - Track the timestamp of each file as of when it was backed up > - Track the file size of each file > - Track the starting timestamp of each backup > - Always include files with a modification time after the starting > timestamp of the prior backup, or if the file size has changed > - In the event of any anomalies (which includes things like a timeline > switch), use checksum matching (aka 'delta checksum backup') to > perform the backup instead of using timestamps (or just always do that > if you want to be particularly careful- having an option for it is > great) > - Probably other things I'm not thinking of off-hand, but this is at > least a good start. Make sure to checksum this information too. I agree with some of these ideas but not all of them. I think having a backup manifest is a good idea; that would allow taking a new incremental backup to work from the manifest rather than the data directory, which could be extremely useful, because it might be a lot faster and the manifest could also be copied to a machine other than the one where the entire backup is stored. If the backup itself has been pushed off to S3 or whatever, you can't access it quickly, but you could keep the manifest around. I also agree that backing up all files that weren't in the previous backup is a good strategy. I proposed that fairly explicitly a few emails back; but also, the contrary is obviously nonsense. And I also agree with, and proposed, that we record the size along with the file. 
I don't really agree with your comments about checksums and timestamps. I think that, if possible, there should be ONE method of determining whether a block has changed in some important way, and I think if we can make LSN work, that would be for the best. If you use multiple methods of detecting changes without any clearly-defined reason for so doing, maybe what you're saying is that you don't really believe that any of the methods are reliable but if we throw the kitchen sink at the problem it should come out OK. Any bugs in one mechanism are likely to be masked by one of the others, but that's not as good as one method that is known to be altogether reliable. > By having a manifest for each backed up file for each backup, you also > gain the ability to validate that a backup in the repository hasn't been > corrupted post-backup, a feature that at least some other database > backup and restore systems have (referring specifically to the big O in > this particular case, but I bet others do too). Agreed. The manifest only lets you validate to a limited extent, but that's still useful. > Having a system of keeping track of which backups are full and which are > differential in an overall system also gives you the ability to do > things like expiration in a sensible way, including handling WAL > expiration. True, but I'm not sure that functionality belongs in core. It certainly needs to be possible for out-of-core code to do this part of the work if desired, because people want to integrate with enterprise backup systems, and we can't come in and say, well, you back up everything else using Netbackup or Tivoli, but for PostgreSQL you have to use pg_backrest. I mean, maybe you can win that argument, but I know I can't. 
> I'd like to clarify that while I would like to have an easier way to > parallelize backups, that's a relatively minor complaint- the much > bigger issue that I have with this feature is that trying to address > everything correctly while having only the amount of information that > could be passed on the command-line about the prior full/incremental is > going to be extremely difficult, complicated, and likely to lead to > subtle bugs in the actual code, and probably less than subtle bugs in > how users end up using it, since they'll have to implement the > expiration and tracking of information between backups themselves > (unless something's changed in that part during this discussion- I admit > that I've not read every email in this thread). Well, the evidence seems to show that you are right, at least to some extent. I consider it a positive good if the client needs to give the server only a limited amount of information. After all, you could always take an incremental backup by shipping every byte of the previous backup to the server, having it compare everything to the current contents, and having it then send you back the stuff that is new or different. But that would be dumb, because most of the point of an incremental backup is to save on send
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 9:30 AM Robert Haas wrote: > > Won't some operations where, at the end, we directly call heap_sync > > without writing WAL have a similar problem as well? > > Maybe. Can you give an example? Looking through the code, I found two cases where we do this. One is a bulk insert operation with wal_level = minimal, and the other is CLUSTER or VACUUM FULL with wal_level = minimal. In both of these cases we are generating new blocks whose LSNs will be 0/0. So, I think we need a rule that if the server is asked to back up all blocks in a file with LSNs > some threshold LSN, it must also include any blocks whose LSN is 0/0. Those blocks are either uninitialized or are populated without WAL logging, so they always need to be copied. Outside of unlogged and temporary tables, I don't know of any case where we make a critical modification to an already-existing block without bumping the LSN. I hope there is no such case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
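[The block-selection rule described above can be sketched as follows. This is a minimal illustration only; the type and function names are hypothetical stand-ins, not PostgreSQL's actual backup code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr; 0 represents LSN 0/0. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * The rule sketched in the email above: when asked to back up all blocks
 * with LSNs greater than a threshold, also include any block whose LSN is
 * 0/0, since such blocks are either uninitialized or were populated
 * without WAL logging (bulk inserts, CLUSTER/VACUUM FULL at
 * wal_level = minimal) and so can never be skipped safely.
 */
static bool
block_needs_backup(XLogRecPtr page_lsn, XLogRecPtr threshold_lsn)
{
    if (page_lsn == InvalidXLogRecPtr)
        return true;            /* no trustworthy LSN: always copy */
    return page_lsn > threshold_lsn;
}
```

Note that a block whose LSN exactly equals the threshold is not copied; only strictly newer LSNs, plus the 0/0 special case, qualify.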
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila wrote: > > Can we think of using the creation time of the file? Basically, if the file > > creation time is later than the backup label's "START TIME:", then include > > that file entirely. I think one big point against this is clock skew, > > like what if somebody tinkers with the clock. And also, this can > > cover cases like > > what Jeevan has pointed out but might not cover other cases which we found > > problematic. > > Well that would mean, for example, that if you copied the data > directory from one machine to another, the next "incremental" backup > would turn into a full backup. That sucks. And in other situations, > like resetting the clock, it could mean that you end up with a corrupt > backup without any real ability for PostgreSQL to detect it. I'm not > saying that it is impossible to create a practically useful system > based on file time stamps, but I really don't like it. Trying to make sure that the next 'incremental' is really an 'incremental' after a failover or a copy of the backup is dangerous in a number of cases. 
A better strategy to address this, and the other issues realized on this thread recently, is to: - Have a manifest of every file in each backup - Always back up new files that weren't in the prior backup - Keep a checksum of each file - Track the timestamp of each file as of when it was backed up - Track the file size of each file - Track the starting timestamp of each backup - Always include files with a modification time after the starting timestamp of the prior backup, or if the file size has changed - In the event of any anomalies (which includes things like a timeline switch), use checksum matching (aka 'delta checksum backup') to perform the backup instead of using timestamps (or just always do that if you want to be particularly careful- having an option for it is great) - Probably other things I'm not thinking of off-hand, but this is at least a good start. Make sure to checksum this information too. I agree entirely that it is dangerous to simply rely on creation time as compared to some other time, or to rely on modification time of a given file across multiple backups (which has been shown to reliably cause corruption, at least with rsync and its 1-second granularity on modification time). By having a manifest for each backed up file for each backup, you also gain the ability to validate that a backup in the repository hasn't been corrupted post-backup, a feature that at least some other database backup and restore systems have (referring specifically to the big O in this particular case, but I bet others do too). Having a system of keeping track of which backups are full and which are differential in an overall system also gives you the ability to do things like expiration in a sensible way, including handling WAL expiration. As also mentioned up-thread, this likely also allows you to have a simpler approach to parallelizing the overall backup. 
I'd like to clarify that while I would like to have an easier way to parallelize backups, that's a relatively minor complaint- the much bigger issue that I have with this feature is that trying to address everything correctly while having only the amount of information that could be passed on the command-line about the prior full/incremental is going to be extremely difficult, complicated, and likely to lead to subtle bugs in the actual code, and probably less than subtle bugs in how users end up using it, since they'll have to implement the expiration and tracking of information between backups themselves (unless something's changed in that part during this discussion- I admit that I've not read every email in this thread). > > One related point is how do incremental backups handle the case where > > vacuum truncates the relation partially? Basically, with current > > patch/design, it doesn't appear that such information can be passed > > via incremental backup. I am not sure if this is a problem, but it > > would be good if we can somehow handle this. > > As to this, if you're taking a full backup of a particular file, > there's no problem. If you're taking a partial backup of a particular > file, you need to include the current length of the file and the > identity and contents of each modified block. Then you're fine. I would also expect this to be fine but if there's an example of where this is an issue, please share. The only issue that I can think of off-hand is orphaned-file risk, whereby you have something like CREATE DATABASE or perhaps ALTER TABLE .. SET TABLESPACE or such, take a backup while that's happening, but that doesn't complete during the backup (or recovery, or perhaps even in some other scenarios, it's unfortunately quite complicated). This orphaned file risk isn't newly discovered but fixing it is pretty complicated- would love to discuss ideas around how to handle it. > > Isn't so
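[The manifest fields and the timestamp/size rule proposed earlier in this message could be modeled roughly as below. Struct and function names here are hypothetical, for illustration only; no real manifest format is being described.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-file manifest entry holding the proposed metadata. */
typedef struct ManifestEntry
{
    const char *path;           /* file name relative to the data directory */
    long        size;           /* file size as of the backup */
    time_t      mtime;          /* file modification time as of the backup */
    char        sha256[65];     /* hex checksum of the file contents */
} ManifestEntry;

/*
 * The timestamp/size rule from the list: a file is included if it is new,
 * if it was modified at or after the prior backup's start time, or if its
 * size changed. A 'delta checksum backup' would compare checksums
 * instead, at the cost of reading every file.
 */
static bool
file_needs_backup(const ManifestEntry *prior, time_t prior_backup_start,
                  long cur_size, time_t cur_mtime)
{
    if (prior == NULL)
        return true;            /* not in the prior backup: always copy */
    if (cur_mtime >= prior_backup_start)
        return true;            /* modified since the prior backup began */
    return cur_size != prior->size;
}
```

The comparison against the prior backup's *start* time (not its end) is deliberate: a file modified while the prior backup was running may or may not have been captured, so it must be re-copied.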
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila wrote: > This seems to be a blocking problem for the LSN based design. Well, only the simplest version of it, I think. > Can we think of using the creation time of the file? Basically, if the file > creation time is later than the backup label's "START TIME:", then include > that file entirely. I think one big point against this is clock skew, > like what if somebody tinkers with the clock. And also, this can > cover cases like > what Jeevan has pointed out but might not cover other cases which we found > problematic. Well that would mean, for example, that if you copied the data directory from one machine to another, the next "incremental" backup would turn into a full backup. That sucks. And in other situations, like resetting the clock, it could mean that you end up with a corrupt backup without any real ability for PostgreSQL to detect it. I'm not saying that it is impossible to create a practically useful system based on file time stamps, but I really don't like it. > I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will > have similar problems. I'm not sure quite what you mean by that. Can you elaborate? It appears to me that the XLR_SPECIAL_REL_UPDATE operations are all things that create files, remove files, or truncate files, and the sketch in my previous email would handle the first two of those cases correctly. See below for the third. > One related point is how do incremental backups handle the case where > vacuum truncates the relation partially? Basically, with current > patch/design, it doesn't appear that such information can be passed > via incremental backup. I am not sure if this is a problem, but it > would be good if we can somehow handle this. As to this, if you're taking a full backup of a particular file, there's no problem. If you're taking a partial backup of a particular file, you need to include the current length of the file and the identity and contents of each modified block. 
Then you're fine. > Won't some operations where, at the end, we directly call heap_sync > without writing WAL have a similar problem as well? Maybe. Can you give an example? > Similarly, > it is not very clear if unlogged relations are handled in some way; if > not, the same could be documented. I think that we don't need to back up the contents of unlogged relations at all, right? Restoration from an online backup always involves running recovery, and so unlogged relations will get zapped anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
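[The "current length plus modified blocks" answer above, which handles a concurrent vacuum truncation, could look something like this. The names are hypothetical and only sketch the idea; this is not the format of any actual patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * A sketch of a partial-file entry: it records the file's current length
 * along with the modified blocks that follow, so a truncation done by
 * vacuum between backups is reproduced at restore time by honoring the
 * recorded length rather than the full backup's old length.
 */
typedef struct PartialFileHeader
{
    int64_t current_length;     /* file length when the backup was taken */
    int     nblocks;            /* number of modified blocks that follow */
} PartialFileHeader;

/*
 * A recorded block is only applied if it lies entirely within the
 * recorded length; anything beyond it was truncated away.
 */
static bool
block_within_file(const PartialFileHeader *hdr, int blkno)
{
    return (int64_t) blkno * BLCKSZ < hdr->current_length;
}
```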
Re: block-level incremental backup
On Mon, Sep 16, 2019 at 7:22 AM Robert Haas wrote: > > On Thu, Sep 12, 2019 at 9:13 AM Jeevan Chalke > wrote: > > I had a look over this issue and observed that when the new database is > > created, the catalog files are copied as-is into the new directory > > corresponding to a newly created database. And as they are just copied, > > the LSNs on those pages are not changed. Due to this, incremental backup > > thinks that it's an existing file and thus does not copy the blocks from > > these new files, leading to the failure. > > *facepalm* > > Well, this shoots a pretty big hole in my design for this feature. I > don't know why I didn't think of this when I wrote out that design > originally. Ugh. > > Unless we change the way that CREATE DATABASE and any similar > operations work so that they always stamp pages with new LSNs, I think > we have to give up on the idea of being able to take an incremental > backup by just specifying an LSN. > This seems to be a blocking problem for the LSN based design. Can we think of using the creation time of the file? Basically, if the file creation time is later than the backup label's "START TIME:", then include that file entirely. I think one big point against this is clock skew, like what if somebody tinkers with the clock. And also, this can cover cases like what Jeevan has pointed out but might not cover other cases which we found problematic. > We'll instead need to get a list of > files from the server first, and then request the entirety of any that > we don't have, plus the changed blocks from the ones that we do have. > I guess that will make Stephen happy, since it's more like the design > he wanted originally (and should generalize more simply to parallel > backup). > > One question I have is: is there any scenario in which an existing > page gets modified after the full backup and before the incremental > backup but does not end up with an LSN that follows the full backup's > start LSN? 
I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will have similar problems. One related point is how do incremental backups handle the case where vacuum truncates the relation partially? Basically, with the current patch/design, it doesn't appear that such information can be passed via incremental backup. I am not sure if this is a problem, but it would be good if we can somehow handle this. Won't some operations where, at the end, we directly call heap_sync without writing WAL have a similar problem as well? Similarly, it is not very clear if unlogged relations are handled in some way; if not, the same could be documented. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
On Thu, Sep 12, 2019 at 9:13 AM Jeevan Chalke wrote: > I had a look over this issue and observed that when the new database is > created, the catalog files are copied as-is into the new directory > corresponding to a newly created database. And as they are just copied, > the LSNs on those pages are not changed. Due to this, incremental backup > thinks that it's an existing file and thus does not copy the blocks from > these new files, leading to the failure. *facepalm* Well, this shoots a pretty big hole in my design for this feature. I don't know why I didn't think of this when I wrote out that design originally. Ugh. Unless we change the way that CREATE DATABASE and any similar operations work so that they always stamp pages with new LSNs, I think we have to give up on the idea of being able to take an incremental backup by just specifying an LSN. We'll instead need to get a list of files from the server first, and then request the entirety of any that we don't have, plus the changed blocks from the ones that we do have. I guess that will make Stephen happy, since it's more like the design he wanted originally (and should generalize more simply to parallel backup). One question I have is: is there any scenario in which an existing page gets modified after the full backup and before the incremental backup but does not end up with an LSN that follows the full backup's start LSN? If there is, then the whole concept of using LSNs to tell which blocks have been modified doesn't really work. I can't think of a way that can happen off-hand, but then, I thought my last design was good, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
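[The revised design above — fetch the server's file list, then request missing files in full and only the changed blocks of known files — reduces to a membership test against the prior backup's file list. A minimal sketch, with hypothetical names and the manifest modeled as a plain array of paths (a real implementation would use a sorted list or hash table):]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Decide how to fetch one file from the server's list: a file absent
 * from the prior backup's manifest must be requested in its entirety;
 * a file present in the manifest only needs its changed blocks.
 */
static bool
fetch_whole_file(const char *path, const char *const *manifest, int nmanifest)
{
    for (int i = 0; i < nmanifest; i++)
        if (strcmp(manifest[i], path) == 0)
            return false;       /* already backed up: changed blocks only */
    return true;                /* new file: request it in full */
}
```

This sidesteps the CREATE DATABASE problem: a freshly copied catalog file is new to the manifest, so it is fetched whole regardless of its (stale) page LSNs.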
Re: block-level incremental backup
On Fri, Sep 13, 2019 at 1:08 PM vignesh C wrote: > Instead of reading the whole file here, we can read the file page by page. > There is a possibility of data inconsistency if data is not read page by > page, data will be consistent if read page by page as full page write will be > enabled at this time. I think you are confused about what "full page writes" means. It has to do with what gets written to the write-ahead log, not the way that the pages themselves are written. There is no portable way to ensure that an 8kB read or write is atomic, and generally it isn't. It shouldn't matter whether the file is read all at once, page by page, or byte by byte, except for performance. Recovery is going to run when that backup is restored, and any inconsistencies should get fixed up at that time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Mon, Sep 9, 2019 at 4:51 PM Jeevan Chalke wrote: > > > > On Tue, Aug 27, 2019 at 4:46 PM vignesh C wrote: >> >> Few comments: >> Comment: >> + buf = (char *) malloc(statbuf->st_size); >> + if (buf == NULL) >> + ereport(ERROR, >> + (errcode(ERRCODE_OUT_OF_MEMORY), >> + errmsg("out of memory"))); >> + >> + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) >> + { >> + Bitmapset *mod_blocks = NULL; >> + int nmodblocks = 0; >> + >> + if (cnt % BLCKSZ != 0) >> + { >> >> We can use the same size as the full page size. >> After pg_start_backup, full page writes will be enabled. >> We can use the same file size to maintain data consistency. > > > Can you please explain which size? > The aim here is to read the entire file in memory and thus statbuf->st_size is used. > Instead of reading the whole file here, we can read the file page by page. There is a possibility of data inconsistency if data is not read page by page; data will be consistent if read page by page, as full page writes will be enabled at this time. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Hi, One of my colleagues at EDB, Rajkumar Raghuwanshi, reported an issue while testing this feature. He reported that if a full base backup is taken, then a database is created, and then an incremental backup is taken, combining the full backup with the incremental backup fails. I had a look at this issue and observed that when the new database is created, the catalog files are copied as-is into the new directory corresponding to the newly created database. And as they are just copied, the LSNs on those pages are not changed. Due to this, incremental backup thinks that it's an existing file and thus does not copy the blocks from these new files, leading to the failure. I was surprised to learn that even though we are creating new files from old files, we keep the LSN unmodified. I didn't see any other parameter in basebackup which indicates that this is a new file since the last LSN or some such. I tried looking for any other DDL doing similar stuff, i.e. creating a new page with an existing LSN, but I could not find any commands other than CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE. Suggestions/thoughts? -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 30, 2019 at 6:52 PM Jeevan Ladhe wrote: > Here are some comments: > Or maybe we can just say: > "cannot verify checksum in file \"%s\"" if checksum requested, disable the > checksum and leave it to the following message: > > + ereport(WARNING, > + (errmsg("file size (%d) not in multiple of page size > (%d), sending whole file", > + (int) cnt, BLCKSZ))); > > Opted for the above suggestion. > > I think we should give the user a hint from where he should be reading the > input > lsn for incremental backup in the --help option as well as documentation? > Something like - "To take an incremental backup, please provide the value of > "--lsn" > as the "START WAL LOCATION" of the previously taken full backup or incremental > backup from the backup_label file. > Added this in the documentation. In --help, it would be too crowded. > pg_combinebackup: > > +static bool made_new_outputdata = false; > +static bool found_existing_outputdata = false; > > Both of these are global, I understand that we need them global so that > they are > accessible in cleanup_directories_atexit(). But they are passed to > verify_dir_is_empty_or_create() as parameters, which I think is not needed. > Instead verify_dir_is_empty_or_create() can directly change the globals. > After adding support for a tablespace, these two parameters take different values depending upon the context. > The current logic assumes the incremental backup directories are to be provided > as input in the serial order the backups were taken. This is a bit confusing > unless clarified in pg_combinebackup help menu or documentation. I think we > should clarify it at both places. > Added in doc. > > I think scan_directory() should be rather renamed as do_combinebackup(). > I am not sure about this renaming. scan_directory() is called recursively to scan each subdirectory too. If we rename it, then it would imply that combining a backup is itself recursive, whereas combining a backup is a single whole process. 
-- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 12:11 PM Dilip Kumar wrote: > On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke > wrote: > > 0003: +/* > + * When to send the whole file, % blocks modified (90%) > + */ > +#define WHOLE_FILE_THRESHOLD 0.9 > > How is this threshold selected? Is it by some test? > Currently, it is set arbitrarily. If required, we will make it a GUC. > > - magic number, currently 0 (4 bytes) > I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC > 0x494E4352) as a magic number, not 0 > Yes. Robert reported this too. Updated the commit message. > > Can we break this function down into 2-3 functions? At least creating a > file map can directly go to a separate function. > Separated out the filemap changes into a separate function. The rest is kept as-is to allow an easy follow-up. > > I have read the 0003 and 0004 patches and there are a few cosmetic comments. > Can you please post those too? Other comments are fixed. > > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
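[The WHOLE_FILE_THRESHOLD constant being discussed above amounts to a simple ratio check. A minimal sketch, with a hypothetical helper name; the 90% figure is, as noted, currently arbitrary:]

```c
#include <assert.h>
#include <stdbool.h>

/* Send the whole file once 90% or more of its blocks were modified. */
#define WHOLE_FILE_THRESHOLD 0.9

static bool
send_whole_file(int nblocks_total, int nblocks_modified)
{
    if (nblocks_total == 0)
        return true;            /* empty file: nothing saved by a delta */
    return (double) nblocks_modified / nblocks_total >= WHOLE_FILE_THRESHOLD;
}
```

Sending the whole file past this point avoids the per-block header overhead of an incremental payload that would barely be smaller than the file itself.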
Re: block-level incremental backup
On Wed, Sep 4, 2019 at 5:21 PM Dilip Kumar wrote: > > I have not yet completed the review for 0004, but I have few more > comments. Tomorrow I will try to complete the review and some testing > as well. > > 1. It seems that the output full backup generated with > pg_combinebackup also contains the "INCREMENTAL BACKUP REFERENCE WAL > LOCATION". It seems confusing > because now this is a full backup, not the incremental backup. > Yes, that was remaining and was in my TODO. Done in the new patchset. Also, taking --label as an input like pg_basebackup. > > 2. > + memset(outblocks, 0, sizeof(FileOffset) * RELSEG_SIZE); > > I don't think you need to memset this explicitly as you can initialize > the array itself no? > FileOffset outblocks[RELSEG_SIZE] = {{0}} > I didn't see any issue with memset either but changed this per your suggestion. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Aug 27, 2019 at 11:59 PM Robert Haas wrote: > On Fri, Aug 16, 2019 at 6:23 AM Jeevan Chalke > wrote: > > [ patches ] > > Reviewing 0002 and 0003: > > - Commit message for 0003 claims magic number and checksum are 0, but > that (fortunately) doesn't seem to be the case. > Oops, updated the commit message. > > - looks_like_rel_name actually checks whether it looks like a > *non-temporary* relation name; suggest adjusting the function name. > > - The names do_full_backup and do_incremental_backup are quite > confusing because you're really talking about what to do with one > file. I suggest sendCompleteFile() and sendPartialFile(). > Changed function names. > > - Is there any good reason to have 'refptr' as a global variable, or > could we just pass the LSN around via function arguments? I know it's > just mimicking startptr, but storing startptr in a global variable > doesn't seem like a great idea either, so if it's not too annoying, > let's pass it down via function arguments instead. Also, refptr is a > crappy name (even worse than startptr); whether we end up with a > global variable or a bunch of local variables, let's make the name(s) > clear and unambiguous, like incremental_reference_lsn. Yeah, I know > that's long, but I still think it's better than being unclear. > Renamed the variable. However, I have kept it as a global, as otherwise many functions would need their signatures changed, e.g. sendFile(), sendDir(), sendTablespace(), etc. > - do_incremental_backup looks like it can never report an error from > fread(), which is bad. But I see that this is just copied from the > existing code which has the same problem, so I started a separate > thread about that. > > - I think that passing cnt and blkindex to verify_page_checksum() > doesn't look very good from an abstraction point of view. Granted, > the existing code isn't great either, but I think this makes the > problem worse. 
I suggest passing "int backup_distance" to this > function, computed as cnt - BLCKSZ * blkindex. Then, you can > fseek(-backup_distance), fread(BLCKSZ), and then fseek(backup_distance > - BLCKSZ). > Yep. Done these changes in the refactoring patch. > > - While I generally support the use of while and for loops rather than > goto for flow control, a while (1) loop that ends with a break is > functionally a goto anyway. I think there are several ways this could > be revised. The most obvious one is probably to use goto, but I vote > for inverting the sense of the test: if (PageIsNew(page) || > PageGetLSN(page) >= startptr) break; This approach also saves a level > of indentation for more than half of the function. > I have used this new inverted condition, but we still need a while(1) loop. > - I am not sure that it's a good idea for sendwholefile = true to > result in dumping the entire file onto the wire in a single CopyData > message. I don't know of a concrete problem in typical > configurations, but someone who increases RELSEG_SIZE might be able to > overflow CopyData's length word. At 2GB the length word would be > negative, which might break, and at 4GB it would wrap around, which > would certainly break. See CopyData in > https://www.postgresql.org/docs/12/protocol-message-formats.html To > avoid this issue, and maybe some others, I suggest defining a > reasonably large chunk size, say 1MB as a constant in this file > someplace, and sending the data as a series of chunks of that size. > OK. Done as per the suggestions. > > - I don't think that the way concurrent truncation is handled is > correct for partial files. Right now it just falls through to code > which appends blocks of zeroes in either the complete-file or > partial-file case. I think that logic should be moved into the > function that handles the complete-file case. In the partial-file > case, the blocks that we actually send need to match the list of block > numbers we promised to send. 
We can't just send the promised blocks > and then tack a bunch of zero-filled blocks onto the end that the file > header doesn't know about. > Well, in partial file case we won't end up inside that block. So we are never sending zeroes at the end in case of partial file. > - For reviewer convenience, please use the -v option to git > format-patch when posting and reposting a patch series. Using -v2, > -v3, etc. on successive versions really helps. > Sure. Thanks for letting me know about this option. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
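The chunked sending that Robert suggests above (to keep any single CopyData message well under the 2GB limit of its signed 32-bit length word) reduces to a small read loop. The sketch below is illustrative only, not the patch's actual code: SEND_CHUNK_SIZE is a made-up name for the 1MB constant he proposes, and writing to an output stream stands in for emitting one CopyData message per iteration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name for the 1MB chunk size suggested in the review. */
#define SEND_CHUNK_SIZE (1024 * 1024)

/*
 * Read 'in' from the current position to EOF and forward it in chunks of
 * at most SEND_CHUNK_SIZE bytes.  In the server, each iteration would be
 * one CopyData message, so no single message length can approach 2GB even
 * if RELSEG_SIZE is raised.  Returns the number of chunks sent, or -1 on
 * a read or allocation error.
 */
static long
send_file_in_chunks(FILE *in, FILE *out)
{
	char	   *buf = malloc(SEND_CHUNK_SIZE);
	long		nchunks = 0;
	size_t		cnt;

	if (buf == NULL)
		return -1;

	while ((cnt = fread(buf, 1, SEND_CHUNK_SIZE, in)) > 0)
	{
		/* Stand-in for "emit one CopyData message of length cnt". */
		fwrite(buf, 1, cnt, out);
		nchunks++;
	}

	free(buf);
	return ferror(in) ? -1 : nchunks;
}
```

A 3MB-plus-5-byte file, for example, would go out as four messages: three full chunks and one 5-byte tail.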
Re: block-level incremental backup
On Tue, Aug 27, 2019 at 4:46 PM vignesh C wrote: > Few comments: > Comment: > + buf = (char *) malloc(statbuf->st_size); > + if (buf == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of memory"))); > + > + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) > + { > + Bitmapset *mod_blocks = NULL; > + int nmodblocks = 0; > + > + if (cnt % BLCKSZ != 0) > + { > > We can use same size as full page size. > After pg start backup full page write will be enabled. > We can use the same file size to maintain data consistency. > Can you please explain which size? The aim here is to read the entire file in memory, and thus statbuf->st_size is used. Comment: > Should we check if it is same timeline as the system's timeline. > At the time of taking the incremental backup, we can't check that. However, while combining, I made sure that the timeline is the same for all backups. > > Comment: > > Should we support compression formats supported by pg_basebackup. > This can be an enhancement after the functionality is completed. > For the incremental backup, it just works out of the box. For combining backups, as discussed up-thread, the user has to uncompress first, combine them, and then recompress if required. > Comment: > We should provide some mechanism to validate the backup. To identify > if some backup is corrupt or some file is missing(deleted) in a > backup. > Maybe, but not for the first version. > Comment: > +/* > + * When to send the whole file, % blocks modified (90%) > + */ > +#define WHOLE_FILE_THRESHOLD 0.9 > + > This can be user configured value. > This can be an enhancement after the functionality is completed. > Yes. > Comment: > We can add a readme file with all the details regarding incremental > backup and combine backup. > Will have a look. > > Regards, > Vignesh > EnterpriseDB: http://www.enterprisedb.com > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
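For reference, the 90% rule quoted above (send the whole file once most of its blocks changed, since the per-block headers of a partial file stop paying off) reduces to a few lines. This is a hedged sketch: send_whole_file and its signature are invented for illustration, not the patch's actual code.

```c
#include <stdbool.h>

/* Mirror of the patch's constant: send the whole file at >= 90% modified. */
#define WHOLE_FILE_THRESHOLD 0.9

/*
 * Decide between a complete-file and a partial-file send.  'nblocks' is
 * the number of blocks in the file, 'nmodblocks' how many of them carry
 * a page LSN at or past the reference LSN.
 */
static bool
send_whole_file(int nblocks, int nmodblocks)
{
	if (nblocks == 0)
		return true;			/* nothing to save on an empty file */
	return (double) nmodblocks / nblocks >= WHOLE_FILE_THRESHOLD;
}
```

Making the 0.9 a user-configurable value, as Vignesh suggests, would only change where the constant comes from, not this comparison.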
Re: block-level incremental backup
Hi,

Attached is a new set of patches adding support for tablespace handling. This patchset also fixes the issues reported by Vignesh, Robert, Jeevan Ladhe, and Dilip Kumar. Please have a look and let me know if I have missed accounting for any comments.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

From 7cc2fb70188676406ca883055467aff602438fcd Mon Sep 17 00:00:00 2001
From: Jeevan Chalke
Date: Mon, 9 Sep 2019 10:38:01 +0530
Subject: [PATCH v2 1/4] Add support for command line option to pass LSN.

This adds [ LSN 'lsn' ] to BASE_BACKUP command and --lsn=LSN to the
pg_basebackup binary.

Also, add small tests.
---
 src/backend/replication/basebackup.c         | 20
 src/backend/replication/repl_gram.y          |  6 ++
 src/backend/replication/repl_scanner.l       |  1 +
 src/bin/pg_basebackup/pg_basebackup.c        | 15 +--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 12 +++-
 5 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6aab8d7..e72bf8e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -38,6 +38,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -52,6 +53,7 @@ typedef struct
 	bool		includewal;
 	uint32		maxrate;
 	bool		sendtblspcmapfile;
+	XLogRecPtr	lsn;
 } basebackup_options;

@@ -652,6 +654,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
 	bool		o_noverify_checksums = false;
+	bool		o_lsn = false;

 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -740,6 +743,23 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "lsn") == 0)
+		{
+			bool		have_error = false;
+
+			if (o_lsn)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			o_lsn = true;
+
+			/* Validate given LSN and convert it into XLogRecPtr. */
+			opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+			if (XLogRecPtrIsInvalid(opt->lsn))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid value for LSN")));
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc..c24d319 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
+%token K_LSN

 %type <node> command
 %type <node> base_backup start_replication start_logical_replication
@@ -214,6 +215,11 @@ base_backup_opt:
 					$$ = makeDefElem("noverify_checksums",
 									 (Node *)makeInteger(true), -1);
 				}
+			| K_LSN SCONST
+				{
+					$$ = makeDefElem("lsn",
+									 (Node *)makeString($2), -1);
+				}
 			;

 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb..77b5af4 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,7 @@
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
 WAIT				{ return K_WAIT; }
+LSN					{ return K_LSN; }

 ","					{ return ','; }
 ";"					{ return ';'; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7986872..1791853 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -105,6 +105,7 @@ static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
 static bool verify_checksums = true;
+static char *lsn = NULL;

 static bool success = false;
 static bool made_new_pgdata = false;
@@ -341,6 +342,7 @@ usage(void)
 			 "                         include required WAL files with specified method\n"));
 	printf(_("  -z, --gzip             compress tar output\n"));
 	printf(_("  -Z, --compress=0-9     compress tar output with given compression level\n"));
+	printf(_("      --lsn=LSN          incremental backup, using LSN as threshold\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 			 "                         set fast or spread checkpointing\n"));
@@ -1805,6 +1807,7 @@ BaseBackup(void)
 				maxServerMajor;
 	int			serverVersion,
 				serverMajor;
+	char	   *lsn_clause = NULL;

 	Assert(conn != NULL);

@@ -1871,8 +1874,11 @@
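The patch above validates the option with pg_lsn_in_internal(). As a standalone illustration of the textual LSN form being accepted, here is a sketch; parse_lsn is a made-up name, and the real function (in src/backend/utils/adt/pg_lsn.c) differs in detail.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Parse the textual "XXXXXXXX/XXXXXXXX" LSN form (high and low 32-bit
 * halves in hex, as seen in backup_label's START WAL LOCATION) into a
 * 64-bit position.  Anything else sets *have_error.
 */
static XLogRecPtr
parse_lsn(const char *str, bool *have_error)
{
	unsigned int hi,
				lo;
	int			nchars = 0;

	*have_error = false;
	if (sscanf(str, "%X/%X%n", &hi, &lo, &nchars) != 2 ||
		str[nchars] != '\0')
	{
		*have_error = true;
		return InvalidXLogRecPtr;
	}
	return ((XLogRecPtr) hi << 32) | lo;
}
```

So "0/16B3740" parses to the 64-bit value 0x16B3740, while trailing garbage or a missing slash is rejected.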
Re: block-level incremental backup
On Wed, Sep 4, 2019 at 10:08 PM Michael Paquier wrote: > > For generating a > > file, you can always emit the newest and "best" tar format, but for > > reading a file, you probably want to be prepared for older or cruftier > > variants. Maybe not -- I'm not super-familiar with the tar on-disk > > format. But I think there must be a reason why tar libraries exist, > > and I don't want to write a new one. > > We need to be sure as well that the library chosen does not block > access to a feature in all the various platforms we have. Well, again, my preference is to just not make this particular feature work natively with tar files. Then I don't need to choose a library, so the question is moot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Sep 03, 2019 at 08:59:53AM -0400, Robert Haas wrote: > I think pg_basebackup is using homebrew code to generate tar files, > but I'm reluctant to do that for reading tar files. Yes. This code has not actually changed since its introduction. Please note that we also have code which directly reads data from a tarball in pg_basebackup.c, when appending the recovery parameters to postgresql.auto.conf for -R. There could be some consolidation here with what you are doing. > For generating a > file, you can always emit the newest and "best" tar format, but for > reading a file, you probably want to be prepared for older or cruftier > variants. Maybe not -- I'm not super-familiar with the tar on-disk > format. But I think there must be a reason why tar libraries exist, > and I don't want to write a new one. We need to be sure as well that the library chosen does not block access to a feature in all the various platforms we have. -- Michael signature.asc Description: PGP signature
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 12:46 PM Ibrar Ahmed wrote: > I did that and have experience working on the TAR format. I was curious > about what > "best/newest" you are talking. Well, why not go look it up? On my MacBook, tar is documented to understand three different tar formats: gnutar, ustar, and v7, and two sets of extensions to the tar format: numeric extensions required by POSIX, and Solaris extensions. It also understands the pax and restricted-pax formats which are derived from the ustar format. I don't know what your system supports, but it's probably not hugely different; the fact that there are multiple tar formats has been documented in the tar man page on every machine I've checked for the past 20 years. Here, 'man tar' refers the reader to 'man libarchive-formats', which contains the details mentioned above. A quick Google search for 'multiple tar formats' also finds https://en.wikipedia.org/wiki/Tar_(computing)#File_format and https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html each of which explains a good deal of the complexity in this area. I don't really understand why I have to explain to you what I mean when I say there are multiple tar formats when you can look it up on Google and find that there are multiple tar formats. Again, the point is that the current code only generates tar archives and therefore only needs to generate one format, but if we add code that reads a tar archive, it probably needs to read several formats, because there are several formats that are popular enough to be widely-supported. It's possible that somebody else here knows more about this topic and could make better judgements than I can, but my view at present is that if we want to read tar archives, we probably would want to do it by depending on libarchive. And I don't think we should do that for this project because I don't think it would provide much value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
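The header variants Robert lists can in fact be told apart mechanically: each 512-byte tar header carries a magic field at offset 257, which POSIX ustar fills with "ustar\0" plus version "00", old GNU tar fills with "ustar", two spaces, and a NUL, and the ancient v7 format leaves zeroed. A hedged sketch of the branch a multi-format reader would need (names invented here, not any library's API):

```c
#include <string.h>

typedef enum
{
	TAR_FORMAT_V7,				/* no magic: pre-POSIX v7 layout */
	TAR_FORMAT_USTAR,			/* POSIX.1-1988 ustar (also the pax base) */
	TAR_FORMAT_GNU,				/* old GNU/"oldgnu" variant */
	TAR_FORMAT_UNKNOWN
} tar_format;

/*
 * Classify a 512-byte tar header by its magic+version field at offset
 * 257.  A real reader would branch here, since field layouts past the
 * v7 prefix differ between the formats.
 */
static tar_format
detect_tar_format(const unsigned char header[512])
{
	const char *magic = (const char *) header + 257;

	if (memcmp(magic, "ustar\0" "00", 8) == 0)
		return TAR_FORMAT_USTAR;
	if (memcmp(magic, "ustar  \0", 8) == 0)
		return TAR_FORMAT_GNU;
	if (magic[0] == '\0')
		return TAR_FORMAT_V7;
	return TAR_FORMAT_UNKNOWN;
}
```

This only scratches the surface (pax and GNU long-name extensions add special entry types on top), which is Robert's point about preferring libarchive over a homegrown reader.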
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 12:11 PM Dilip Kumar wrote: > > On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke > wrote: > > > 0003: > +/* > + * When to send the whole file, % blocks modified (90%) > + */ > +#define WHOLE_FILE_THRESHOLD 0.9 > > How this threshold is selected. Is it by some test? > > > - magic number, currently 0 (4 bytes) > I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC > 0x494E4352) as a magic number, not 0 > > > + Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ)); > + > + buf = (char *) malloc(statbuf->st_size); > + if (buf == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of memory"))); > + > + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) > + { > + Bitmapset *mod_blocks = NULL; > + int nmodblocks = 0; > + > + if (cnt % BLCKSZ != 0) > + { > > It will be good to add some comments for the if block and also for the > assert. Actully, it's not very clear from the code. > > 0004: > +#include > +#include > +#include > Header file include order (sys/state.h should be before time.h) > > > > + printf(_("%s combines full backup with incremental backup.\n\n"), progname); > /backup/backups > > > + * scan_file > + * > + * Checks whether given file is partial file or not. If partial, then > combines > + * it into a full backup file, else copies as is to the output directory. > + */ > > /If partial, then combines/ If partial, then combine > > > > +static void > +combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir, > + const char *subdirpath, const char *outfn) > + /* > + * Open all files from all incremental backup directories and create a file > + * map. > + */ > + basefilefound = false; > + for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++) > + { > + fm = &filemaps[fmindex]; > + > . > + } > + > + > + /* Process all opened files. 
*/ > + lastblkno = 0; > + modifiedblockfound = false; > + for (i = 0; i < fmindex; i++) > + { > + char*buf; > + int hsize; > + int k; > + int blkstartoffset; > .. > + } > + > + for (i = 0; i <= lastblkno; i++) > + { > + char blkdata[BLCKSZ]; > + FILE*infp; > + int offset; > ... > + } > } > > Can we breakdown this function in 2-3 functions. At least creating a > file map can directly go to a separate function. > > I have read 0003 and 0004 patch and there are few cosmetic comments. > I have not yet completed the review for 0004, but I have few more comments. Tomorrow I will try to complete the review and some testing as well. 1. It seems that the output full backup generated with pg_combinebackup also contains the "INCREMENTAL BACKUP REFERENCE WAL LOCATION". It seems confusing because now this is a full backup, not the incremental backup. 2. + FILE*outfp; + FileOffset outblocks[RELSEG_SIZE]; + int i; + FileMap*filemaps; + int fmindex; + bool basefilefound; + bool modifiedblockfound; + uint32 lastblkno; + FileMap*fm; + struct stat statbuf; + uint32 nblocks; + + memset(outblocks, 0, sizeof(FileOffset) * RELSEG_SIZE); I don't think you need to memset this explicitly as you can initialize the array itself no? FileOffset outblocks[RELSEG_SIZE] = {{0}} -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
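Dilip's point 2 relies on a C guarantee worth spelling out: when an aggregate initializer such as "= {{0}}" explicitly initializes only the first element, every remaining element is zero-initialized as well, so the explicit memset is redundant. A tiny self-contained demonstration; the FileOffset fields and the array size here are stand-ins, not the patch's actual definitions.

```c
#include <stdint.h>

/* Stand-in for the patch's FileOffset; the field names are made up. */
typedef struct
{
	uint32_t	blkno;
	int64_t		offset;
} FileOffset;

#define NSLOTS 1024				/* small stand-in for RELSEG_SIZE */

/*
 * Returns 1 iff every element of a "{{0}}"-initialized array is fully
 * zeroed, which C's initialization rules guarantee: elements without an
 * explicit initializer are implicitly zero-initialized.
 */
static int
all_slots_zero(void)
{
	FileOffset	outblocks[NSLOTS] = {{0}};

	for (int i = 0; i < NSLOTS; i++)
		if (outblocks[i].blkno != 0 || outblocks[i].offset != 0)
			return 0;
	return 1;
}
```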
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 7:39 PM Robert Haas wrote: > On Tue, Sep 3, 2019 at 10:05 AM Ibrar Ahmed wrote: > > +1 using the library to tar. But I think reason not using tar library is > TAR is > > one of the most simple file format. What is the best/newest format of > TAR? > > So, I don't really want to go down this path at all, as I already > said. You can certainly do your own research on this topic if you > wish. > > I did that, and I have experience working on the TAR format. I was curious which "best/newest" format you were talking about. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Ibrar Ahmed
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 8:00 PM Tom Lane wrote: > Ibrar Ahmed writes: > > +1 using the library to tar. > > Uh, *what* library? > I was just replying to Robert, who said "But I think there must be a reason why tar libraries exist, and I don't want to write a new one." I said I am OK with using a library (whatever he is proposing/thinking), but explained that TAR is a very simple format, which is why PG has its own code for it. > pg_dump's pg_backup_tar.c is about 1300 lines, a very large fraction > of which is boilerplate for interfacing to pg_backup_archiver's APIs. > The stuff that actually knows specifically about tar looks to be maybe > a couple hundred lines, plus there's another couple hundred lines of > (rather duplicative?) code in src/port/tar.c. None of it is rocket > science. > > I can't believe that it'd be a good tradeoff to create a new external > dependency to replace that amount of code. In case you haven't noticed, > our luck with depending on external libraries has been abysmal. > > Possibly there's an argument for refactoring things so that there's > more stuff in tar.c and less elsewhere, but let's not go looking > for external code to depend on. > > regards, tom lane > -- Ibrar Ahmed
Re: block-level incremental backup
Ibrar Ahmed writes: > +1 using the library to tar. Uh, *what* library? pg_dump's pg_backup_tar.c is about 1300 lines, a very large fraction of which is boilerplate for interfacing to pg_backup_archiver's APIs. The stuff that actually knows specifically about tar looks to be maybe a couple hundred lines, plus there's another couple hundred lines of (rather duplicative?) code in src/port/tar.c. None of it is rocket science. I can't believe that it'd be a good tradeoff to create a new external dependency to replace that amount of code. In case you haven't noticed, our luck with depending on external libraries has been abysmal. Possibly there's an argument for refactoring things so that there's more stuff in tar.c and less elsewhere, but let's not go looking for external code to depend on. regards, tom lane
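For a sense of how small the tar-specific code Tom mentions really is: beyond filling octal text fields, the trickiest piece a writer like src/port/tar.c needs is the header checksum, defined as the unsigned sum of all 512 header bytes computed as if the 8-byte checksum field (at offset 148) held ASCII spaces. A standalone sketch of that rule (not the actual src/port/tar.c code):

```c
/*
 * Compute the tar header checksum: sum every byte of the 512-byte
 * header, with the 8-byte checksum field itself (offsets 148-155)
 * counted as blanks, per the ustar specification.
 */
static unsigned int
tar_header_checksum(const unsigned char header[512])
{
	unsigned int sum = 0;

	for (int i = 0; i < 512; i++)
	{
		if (i >= 148 && i < 156)
			sum += ' ';			/* checksum field counts as spaces */
		else
			sum += header[i];
	}
	return sum;
}
```

An all-zero header therefore checksums to 8 * 32 = 256, and the result is then rendered into the field as octal text.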
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 10:05 AM Ibrar Ahmed wrote: > +1 using the library to tar. But I think reason not using tar library is TAR > is > one of the most simple file format. What is the best/newest format of TAR? So, I don't really want to go down this path at all, as I already said. You can certainly do your own research on this topic if you wish. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Sep 3, 2019 at 6:00 PM Robert Haas wrote: > On Sat, Aug 31, 2019 at 3:41 PM Ibrar Ahmed wrote: > > Are we using any tar library in pg_basebackup.c? We already have the > capability > > in pg_basebackup to do that. > > I think pg_basebackup is using homebrew code to generate tar files, > but I'm reluctant to do that for reading tar files. For generating a > file, you can always emit the newest and "best" tar format, but for > reading a file, you probably want to be prepared for older or cruftier > variants. Maybe not -- I'm not super-familiar with the tar on-disk > format. But I think there must be a reason why tar libraries exist, > and I don't want to write a new one. > +1 for using a library for tar. But I think the reason for not using a tar library is that TAR is one of the simplest file formats. What is the best/newest format of TAR? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Ibrar Ahmed
Re: block-level incremental backup
On Sat, Aug 31, 2019 at 3:41 PM Ibrar Ahmed wrote: > Are we using any tar library in pg_basebackup.c? We already have the > capability > in pg_basebackup to do that. I think pg_basebackup is using homebrew code to generate tar files, but I'm reluctant to do that for reading tar files. For generating a file, you can always emit the newest and "best" tar format, but for reading a file, you probably want to be prepared for older or cruftier variants. Maybe not -- I'm not super-familiar with the tar on-disk format. But I think there must be a reason why tar libraries exist, and I don't want to write a new one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke wrote: > 0003: +/* + * When to send the whole file, % blocks modified (90%) + */ +#define WHOLE_FILE_THRESHOLD 0.9 How is this threshold selected? Is it based on some test? - magic number, currently 0 (4 bytes) I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC 0x494E4352) as a magic number, not 0 + Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ)); + + buf = (char *) malloc(statbuf->st_size); + if (buf == NULL) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"))); + + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0) + { + Bitmapset *mod_blocks = NULL; + int nmodblocks = 0; + + if (cnt % BLCKSZ != 0) + { It will be good to add some comments for the if block and also for the assert. Actually, it's not very clear from the code. 0004: +#include +#include +#include Header file include order (sys/stat.h should be before time.h) + printf(_("%s combines full backup with incremental backup.\n\n"), progname); /backup/backups + * scan_file + * + * Checks whether given file is partial file or not. If partial, then combines + * it into a full backup file, else copies as is to the output directory. + */ /If partial, then combines/ If partial, then combine +static void +combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir, + const char *subdirpath, const char *outfn) + /* + * Open all files from all incremental backup directories and create a file + * map. + */ + basefilefound = false; + for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++) + { + fm = &filemaps[fmindex]; + . + } + + + /* Process all opened files. */ + lastblkno = 0; + modifiedblockfound = false; + for (i = 0; i < fmindex; i++) + { + char*buf; + int hsize; + int k; + int blkstartoffset; .. + } + + for (i = 0; i <= lastblkno; i++) + { + char blkdata[BLCKSZ]; + FILE*infp; + int offset; ... + } } Can we break down this function into 2-3 functions? At least creating the file map can go directly into a separate function. I have read the 0003 and 0004 patches and there are a few cosmetic comments. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Hi Robert, On Sat, Aug 31, 2019 at 8:29 AM Robert Haas wrote: > On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe > wrote: > > Due to the inherent nature of pg_basebackup, the incremental backup also > > allows taking backup in tar and compressed format. But, pg_combinebackup > > does not understand how to restore this. I think we should either make > > pg_combinebackup support restoration of tar incremental backup or > restrict > > taking the incremental backup in tar format until pg_combinebackup > > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > > > It is arguable that one can take the incremental backup in tar format, > extract > > that manually and then give the resultant directory as input to the > > pg_combinebackup, but I think that kills the purpose of having > > pg_combinebackup utility. > > I don't agree. You're right that you would have to untar (and > uncompress) the backup to run pg_combinebackup, but you would also > have to do that to restore a non-incremental backup, so it doesn't > seem much different. > Thanks. Yes I agree about the similarity between restoring non-incremental and incremental backup in this case. > I don't think it's worth doing that at this point; I definitely don't > think it > needs to be part of the first patch. > Makes sense. Regards, Jeevan Ladhe
Re: block-level incremental backup
On Sat, Aug 31, 2019 at 7:59 AM Robert Haas wrote: > On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe > wrote: > > Due to the inherent nature of pg_basebackup, the incremental backup also > > allows taking backup in tar and compressed format. But, pg_combinebackup > > does not understand how to restore this. I think we should either make > > pg_combinebackup support restoration of tar incremental backup or > restrict > > taking the incremental backup in tar format until pg_combinebackup > > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > > > It is arguable that one can take the incremental backup in tar format, > extract > > that manually and then give the resultant directory as input to the > > pg_combinebackup, but I think that kills the purpose of having > > pg_combinebackup utility. > > I don't agree. You're right that you would have to untar (and > uncompress) the backup to run pg_combinebackup, but you would also > have to do that to restore a non-incremental backup, so it doesn't > seem much different. It's true that for an incremental backup you > might need to untar and uncompress multiple prior backups rather than > just one, but that's just the nature of an incremental backup. And, > on a practical level, if you want compression, which is pretty likely > if you're thinking about incremental backups, the way to get that is > to use tar format with -z or -Z. > > It might be interesting to teach pg_combinebackup to be able to read > tar-format backups, but I think that there are several variants of the > tar format, and I suspect it would need to read them all. If someone > un-tars and re-tars a backup with a different tar tool, we don't want > it to become unreadable. So we'd either have to write our own > de-tarring library or add an external dependency on one. Are we using any tar library in pg_basebackup.c? We already have the capability in pg_basebackup to do that. 
> I don't > think it's worth doing that at this point; I definitely don't think it > needs to be part of the first patch. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- Ibrar Ahmed
Re: block-level incremental backup
On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this. I think we should either make > pg_combinebackup support restoration of tar incremental backup or restrict > taking the incremental backup in tar format until pg_combinebackup > supports the restoration by making option '--lsn' and '-Ft' exclusive. > > It is arguable that one can take the incremental backup in tar format, extract > that manually and then give the resultant directory as input to the > pg_combinebackup, but I think that kills the purpose of having > pg_combinebackup utility. I don't agree. You're right that you would have to untar (and uncompress) the backup to run pg_combinebackup, but you would also have to do that to restore a non-incremental backup, so it doesn't seem much different. It's true that for an incremental backup you might need to untar and uncompress multiple prior backups rather than just one, but that's just the nature of an incremental backup. And, on a practical level, if you want compression, which is pretty likely if you're thinking about incremental backups, the way to get that is to use tar format with -z or -Z. It might be interesting to teach pg_combinebackup to be able to read tar-format backups, but I think that there are several variants of the tar format, and I suspect it would need to read them all. If someone un-tars and re-tars a backup with a different tar tool, we don't want it to become unreadable. So we'd either have to write our own de-tarring library or add an external dependency on one. I don't think it's worth doing that at this point; I definitely don't think it needs to be part of the first patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
Here are some comments: +/* The reference XLOG position for the incremental backup. */ +static XLogRecPtr refptr; As Robert already pointed out, we may want to pass this around as a parameter instead of using a global variable. Also, it can be renamed to something like incr_backup_refptr. I see that in your earlier version of the patch this was named startincrptr, which I think was more meaningful. - /* * If incremental backup, see whether the filename is a relation filename * or not. */ Can be reworded something like: "If incremental backup, check if it is relation file and can be sent partially." - + if (verify_checksum) + { + ereport(WARNING, + (errmsg("cannot verify checksum in file \"%s\", block " + "%d: read buffer size %d and page size %d " + "differ", + readfilename, blkno, (int) cnt, BLCKSZ))); + verify_checksum = false; + } For do_incremental_backup() it does not make sense to show the block number in the warning, as it is always going to be 0 when we throw this warning. Further, I think this can be rephrased as: "cannot verify checksum in file \"%s\", read file size %d is not multiple of page size %d". Or maybe we can just say: "cannot verify checksum in file \"%s\"" if checksum was requested, disable the checksum, and leave it to the following message: + ereport(WARNING, + (errmsg("file size (%d) not in multiple of page size (%d), sending whole file", + (int) cnt, BLCKSZ))); - If you agree with the above comment for blkno, then we can shift the declaration of blkno inside the condition " if (!sendwholefile)" in do_incremental_backup(), or avoid it altogether, and just pass "i" as blkindex, as well as blkno, to verify_page_checksum(). Maybe add a comment on why they are the same in the case of incremental backup. - I think we should give the user a hint, in the --help option as well as the documentation, about where to read the input lsn for an incremental backup from.
Something like - "To take an incremental backup, please provide the value of "--lsn" as the "START WAL LOCATION" of the previously taken full or incremental backup, from its backup_label file." - pg_combinebackup: +static bool made_new_outputdata = false; +static bool found_existing_outputdata = false; Both of these are global; I understand that we need them global so that they are accessible in cleanup_directories_atexit(). But they are passed to verify_dir_is_empty_or_create() as parameters, which I think is not needed. Instead, verify_dir_is_empty_or_create() can directly change the globals. - I see that checksum_failure is never set and always remains false. Maybe it is something that you wanted to set in combine_partial_files() when a corrupted partial file is detected? - I think the logic for verifying the backup chain should be moved out from the main() function to a separate function. - + /* + * Verify the backup chain. INCREMENTAL BACKUP REFERENCE WAL LOCATION of + * the incremental backup must match with the START WAL LOCATION of the + * previous backup, until we reach a full backup in which there is no + * INCREMENTAL BACKUP REFERENCE WAL LOCATION. + */ The current logic assumes the incremental backup directories are to be provided as input in the serial order the backups were taken. This is a bit confusing unless clarified in the pg_combinebackup help menu or documentation. I think we should clarify it in both places. - I think scan_directory() should rather be renamed to do_combinebackup(). Regards, Jeevan Ladhe On Thu, Aug 29, 2019 at 8:11 PM Jeevan Ladhe wrote: > Due to the inherent nature of pg_basebackup, the incremental backup also > allows taking backup in tar and compressed format. But, pg_combinebackup > does not understand how to restore this.
> I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract that manually and then give the resultant directory as input to
> the pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>
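For what it's worth, the chain rule discussed in the review above (each incremental backup's INCREMENTAL BACKUP REFERENCE WAL LOCATION must equal the previous backup's START WAL LOCATION, back to a full backup that has no reference location) factors out into a small function. This is only a sketch with invented names (BackupInfo, verify_backup_chain) and LSNs reduced to plain integers, not the patch's actual code:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-backup metadata, as parsed from each backup_label. */
typedef struct BackupInfo
{
	uint64_t	start_lsn;	/* "START WAL LOCATION" */
	uint64_t	ref_lsn;	/* "INCREMENTAL BACKUP REFERENCE WAL LOCATION";
							 * 0 (invalid) for a full backup */
} BackupInfo;

/*
 * Verify that backups[0..nbackups-1], given oldest first, form a valid
 * chain: the first must be a full backup, and each incremental's
 * reference LSN must equal the previous backup's start LSN.
 */
static bool
verify_backup_chain(const BackupInfo *backups, size_t nbackups)
{
	size_t		i;

	if (nbackups == 0 || backups[0].ref_lsn != 0)
		return false;			/* chain must begin with a full backup */

	for (i = 1; i < nbackups; i++)
	{
		if (backups[i].ref_lsn == 0)
			return false;		/* a second full backup breaks the chain */
		if (backups[i].ref_lsn != backups[i - 1].start_lsn)
			return false;		/* gap, or directories given out of order */
	}
	return true;
}
```

Pulling this out of main(), as suggested, would also make the serial-ordering assumption easy to document in one place.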
Re: block-level incremental backup
Hi,

I am doing some testing on the pg_basebackup and pg_combinebackup patches. I have also tried to create a tap test for pg_combinebackup by taking reference from the pg_basebackup tap cases. Attaching the first draft of the test patch.

I have done some testing with compression options; both -z and -Z level are working with incremental backup.

A minor comment: it is mentioned in the pg_combinebackup help that a maximum of 10 incremental backups can be given with the -i option, but I found that a maximum of 9 incremental backup directories can be given at a time.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Thu, Aug 29, 2019 at 10:06 PM Jeevan Ladhe wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract that manually and then give the resultant directory as input to
> the pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>

diff --git a/src/bin/pg_combinebackup/t/pg_combinebackup.pl b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
new file mode 100644
index 000..e0f834a
--- /dev/null
+++ b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
@@ -0,0 +1,79 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 23;
+
+program_help_ok('pg_combinebackup');
+program_version_ok('pg_combinebackup');
+program_options_handling_ok('pg_combinebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Initialize node
+$node->init();
+my $pgdata = $node->data_dir;
+
+# Change wal related setting for pg_basebackup to run
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->start;
+
+$node->command_fails(['pg_combinebackup'],
+	'pg_combinebackup needs full and incremental directory specified');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+	q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Run full base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup"],
+	'pg_basebackup runs for full backup');
+ok(-f "$tempdir/backup/PG_VERSION", 'full backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Get LSN of last backup to use for incremental backup
+my @extract_lsn = split (" ", scalar TestLib::slurp_file("$tempdir/backup/backup_label"));
+my $LSN = $extract_lsn[3];
+
+# Run incr base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup1", '--lsn', "$LSN"],
+	'pg_basebackup runs for incremental backup');
+ok(-f "$tempdir/backup1/PG_VERSION", 'incremental backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup1/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup1/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Run pg_combinebackup.
+$node->command_ok([ 'pg_combinebackup', '-f', "$tempdir/backup", '-i', "$tempdir/backup1", '-o', "$tempdir/backup2"],
+	'pg_combinebackup runs');
+ok(-f "$tempdir/backup2/PG_VERSION", 'combined backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup2/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup2/$baseUnloggedPath",
+	'unlogged main fork not in backup');
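One fragile spot in the test above is extracting the LSN as whitespace field 3 of backup_label; a keyed lookup would survive format changes better. A hypothetical C sketch of the same idea (parse_start_lsn is an invented name; the TAP test would do the equivalent with a regex on the "START WAL LOCATION:" line):

```c
#include <string.h>
#include <stdbool.h>

/*
 * Copy the LSN that follows "START WAL LOCATION: " in a backup_label
 * buffer into "out".  Returns false if the key line is absent.
 */
static bool
parse_start_lsn(const char *label, char *out, size_t outlen)
{
	const char *key = "START WAL LOCATION: ";
	const char *p = strstr(label, key);
	size_t		i = 0;

	if (p == NULL || outlen == 0)
		return false;
	p += strlen(key);
	/* the LSN token ends at the first space or newline */
	while (*p != '\0' && *p != ' ' && *p != '\n' && i + 1 < outlen)
		out[i++] = *p++;
	out[i] = '\0';
	return i > 0;
}
```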
Re: block-level incremental backup
Due to the inherent nature of pg_basebackup, the incremental backup also allows taking backup in tar and compressed format. But, pg_combinebackup does not understand how to restore this. I think we should either make pg_combinebackup support restoration of tar incremental backup, or restrict taking the incremental backup in tar format until pg_combinebackup supports the restoration, by making the options '--lsn' and '-Ft' mutually exclusive.

It is arguable that one can take the incremental backup in tar format, extract that manually and then give the resultant directory as input to pg_combinebackup, but I think that kills the purpose of having a pg_combinebackup utility.

Thoughts?

Regards,
Jeevan Ladhe
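The interim restriction proposed above could be a one-line check in pg_basebackup's option validation. A simplified, hypothetical sketch (check_incremental_format is an invented helper; in the real patch this would sit next to the existing format/option checks and use pg_log_error rather than fprintf):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Reject --lsn together with -Ft until pg_combinebackup can read
 * tar-format incremental backups.  "format" follows pg_basebackup's
 * convention: 'p' for plain, 't' for tar.
 */
static bool
check_incremental_format(bool lsn_given, char format)
{
	if (lsn_given && format == 't')
	{
		fprintf(stderr,
				"pg_basebackup: incremental backup (--lsn) cannot be used "
				"with tar format output (-Ft)\n");
		return false;
	}
	return true;
}
```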
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 6:23 AM Jeevan Chalke wrote:
> [ patches ]

Reviewing 0002 and 0003:

- Commit message for 0003 claims magic number and checksum are 0, but that (fortunately) doesn't seem to be the case.

- looks_like_rel_name actually checks whether it looks like a *non-temporary* relation name; suggest adjusting the function name.

- The names do_full_backup and do_incremental_backup are quite confusing because you're really talking about what to do with one file. I suggest sendCompleteFile() and sendPartialFile().

- Is there any good reason to have 'refptr' as a global variable, or could we just pass the LSN around via function arguments? I know it's just mimicking startptr, but storing startptr in a global variable doesn't seem like a great idea either, so if it's not too annoying, let's pass it down via function arguments instead. Also, refptr is a crappy name (even worse than startptr); whether we end up with a global variable or a bunch of local variables, let's make the name(s) clear and unambiguous, like incremental_reference_lsn. Yeah, I know that's long, but I still think it's better than being unclear.

- do_incremental_backup looks like it can never report an error from fread(), which is bad. But I see that this is just copied from the existing code which has the same problem, so I started a separate thread about that.

- I think that passing cnt and blkindex to verify_page_checksum() doesn't look very good from an abstraction point of view. Granted, the existing code isn't great either, but I think this makes the problem worse. I suggest passing "int backup_distance" to this function, computed as cnt - BLCKSZ * blkindex. Then, you can fseek(-backup_distance), fread(BLCKSZ), and then fseek(backup_distance - BLCKSZ).

- While I generally support the use of while and for loops rather than goto for flow control, a while (1) loop that ends with a break is functionally a goto anyway. I think there are several ways this could be revised. The most obvious one is probably to use goto, but I vote for inverting the sense of the test:

    if (PageIsNew(page) || PageGetLSN(page) >= startptr)
        break;

This approach also saves a level of indentation for more than half of the function.

- I am not sure that it's a good idea for sendwholefile = true to result in dumping the entire file onto the wire in a single CopyData message. I don't know of a concrete problem in typical configurations, but someone who increases RELSEG_SIZE might be able to overflow CopyData's length word. At 2GB the length word would be negative, which might break, and at 4GB it would wrap around, which would certainly break. See CopyData in https://www.postgresql.org/docs/12/protocol-message-formats.html

To avoid this issue, and maybe some others, I suggest defining a reasonably large chunk size, say 1MB, as a constant in this file someplace, and sending the data as a series of chunks of that size.

- I don't think that the way concurrent truncation is handled is correct for partial files. Right now it just falls through to code which appends blocks of zeroes in either the complete-file or partial-file case. I think that logic should be moved into the function that handles the complete-file case. In the partial-file case, the blocks that we actually send need to match the list of block numbers we promised to send. We can't just send the promised blocks and then tack a bunch of zero-filled blocks onto the end that the file header doesn't know about.

- For reviewer convenience, please use the -v option to git format-patch when posting and reposting a patch series. Using -v2, -v3, etc. on successive versions really helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
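The 1MB-chunking suggestion is easy to reason about with a couple of pure helpers. A minimal sketch, assuming a payload already in memory (chunks_needed and chunk_size are invented names; in basebackup.c each chunk would go out as its own CopyData message):

```c
#include <stddef.h>

/* Illustrative cap per CopyData message, per the suggestion above. */
#define SEND_CHUNK_SIZE ((size_t) (1024 * 1024))

/* Number of CopyData messages needed for a payload of "len" bytes. */
static size_t
chunks_needed(size_t len)
{
	return (len + SEND_CHUNK_SIZE - 1) / SEND_CHUNK_SIZE;
}

/* Size of the i'th (0-based) chunk for a payload of "len" bytes. */
static size_t
chunk_size(size_t len, size_t i)
{
	size_t		off = i * SEND_CHUNK_SIZE;

	if (off >= len)
		return 0;
	return (len - off) < SEND_CHUNK_SIZE ? (len - off) : SEND_CHUNK_SIZE;
}

/*
 * The send loop would then look roughly like:
 *
 *     for (size_t i = 0; i < chunks_needed(len); i++)
 *         pq_putmessage('d', buf + i * SEND_CHUNK_SIZE, chunk_size(len, i));
 *
 * so no single message can approach the 2GB sign boundary of CopyData's
 * length word, whatever RELSEG_SIZE is.
 */
```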
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 8:07 PM Ibrar Ahmed wrote:
>
> What do you mean by bigger file, a file greater than 1GB? In which case
> you get file > 1GB?
>

Few comments:

Comment:
+ buf = (char *) malloc(statbuf->st_size);
+ if (buf == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+
+ if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0)
+ {
+ Bitmapset *mod_blocks = NULL;
+ int nmodblocks = 0;
+
+ if (cnt % BLCKSZ != 0)
+ {
We can use the same size as the full page size. After pg_start_backup, full page writes will be enabled. We can use the same file size to maintain data consistency.

Comment:
/* Validate given LSN and convert it into XLogRecPtr. */
+ opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+ if (XLogRecPtrIsInvalid(opt->lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid value for LSN")));
Validate that the input LSN is less than the current system LSN.

Comment:
/* Validate given LSN and convert it into XLogRecPtr. */
+ opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+ if (XLogRecPtrIsInvalid(opt->lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid value for LSN")));
Should we check whether it is on the same timeline as the system's timeline?

Comment:
+ if (fread(blkdata, 1, BLCKSZ, infp) != BLCKSZ)
+ {
+ pg_log_error("could not read from file \"%s\": %m", outfn);
+ cleanup_filemaps(filemaps, fmindex + 1);
+ exit(1);
+ }
+
+ /* Finally write one block to the output file */
+ if (fwrite(blkdata, 1, BLCKSZ, outfp) != BLCKSZ)
+ {
+ pg_log_error("could not write to file \"%s\": %m", outfn);
+ cleanup_filemaps(filemaps, fmindex + 1);
+ exit(1);
+ }
Should we support the compression formats supported by pg_basebackup? This can be an enhancement after the functionality is completed.

Comment:
We should provide some mechanism to validate the backup, to identify whether a backup is corrupt or some file is missing (deleted) in a backup.

Comment:
+ ofp = fopen(tofn, "wb");
+ if (ofp == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", tofn);
+ exit(1);
+ }
ifp should be closed in the error flow.

Comment:
+ fp = fopen(filename, "r");
+ if (fp == NULL)
+ {
+ pg_log_error("could not read file \"%s\": %m", filename);
+ exit(1);
+ }
+
+ labelfile = pg_malloc(statbuf.st_size + 1);
+ if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size)
+ {
+ pg_log_error("corrupted file \"%s\": %m", filename);
+ pg_free(labelfile);
+ exit(1);
+ }
fclose can be moved above.

Comment:
+ if (!modifiedblockfound)
+ {
+ copy_whole_file(fm->filename, outfn);
+ cleanup_filemaps(filemaps, fmindex + 1);
+ return;
+ }
+
+ /* Write all blocks to the output file */
+
+ if (fstat(fileno(fm->fp), &statbuf) != 0)
+ {
+ pg_log_error("could not stat file \"%s\": %m", fm->filename);
+ pg_free(filemaps);
+ exit(1);
+ }
In some error flows, cleanup_filemaps needs to be called to close the file descriptors that were opened.

Comment:
+/*
+ * When to send the whole file, % blocks modified (90%)
+ */
+#define WHOLE_FILE_THRESHOLD 0.9
+
This can be a user-configurable value. This can be an enhancement after the functionality is completed.

Comment:
We can add a README file with all the details regarding incremental backup and combine backup.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
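The "validate the input LSN against the current system LSN" comment above amounts to one extra comparison in parse_basebackup_options. A minimal standalone sketch, with XLogRecPtr simplified to an integer, current_lsn standing in for whatever the server reports (e.g. its flush position), and incremental_lsn_is_valid an invented name:

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-ins for the backend's types. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Besides syntactic validity, the reference LSN for an incremental
 * backup must lie strictly behind the server's current WAL position;
 * an LSN "from the future" cannot correspond to any prior backup.
 */
static bool
incremental_lsn_is_valid(XLogRecPtr opt_lsn, XLogRecPtr current_lsn)
{
	if (opt_lsn == InvalidXLogRecPtr)
		return false;			/* not a parseable LSN */
	if (opt_lsn >= current_lsn)
		return false;			/* reference LSN ahead of current WAL */
	return true;
}
```

Checking the timeline, as also suggested, would need the history read from the control file and is left out of this sketch.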
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 4:12 PM Ibrar Ahmed wrote: > > > > > On Fri, Aug 16, 2019 at 3:24 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> >> >> On Fri, Aug 2, 2019 at 6:43 PM vignesh C wrote: >> >>> Some comments: >>> 1) There will be some link files created for tablespace, we might >>> require some special handling for it >>> >> >> Yep. I have that in my ToDo. >> Will start working on that soon. >> >> >>> 2) >>> Retry functionality is hanlded only for copying of full files, should >>> we handle retry for copying of partial files >>> 3) >>> we can have some range for maxretries similar to sleeptime >>> >> >> I took help from pg_standby code related to maxentries and sleeptime. >> >> However, as we don't want to use system() call now, I have >> removed all this kludge and just used fread/fwrite as discussed. >> >> >>> 4) >>> Should we check for malloc failure >>> >> >> Used pg_malloc() instead. Same is also suggested by Ibrar. >> >> >>> >>> 5) Should we add display of progress as backup may take some time, >>> this can be added as enhancement. We can get other's opinion on this. >>> >> >> Can be done afterward once we have the functionality in place. >> >> >>> >>> 6) >>> If the backup count increases providing the input may be difficult, >>> Shall user provide all the incremental backups from a parent folder >>> and can we handle the ordering of incremental backup internally >>> >> >> I am not sure of this yet. We need to provide the tablespace mapping too. >> But thanks for putting a point here. Will keep that in mind when I >> revisit this. >> >> >>> >>> 7) >>> Add verbose for copying whole file >>> >> Done >> >> >>> >>> 8) We can also check if approximate space is available in disk before >>> starting combine backup, this can be added as enhancement. We can get >>> other's opinion on this. >>> >> >> Hmm... will leave it for now. User will get an error anyway. 
>> >> >>> >>> 9) >>> Combine backup into directory can be combine backup directory >>> >> Done >> >> >>> >>> 10) >>> MAX_INCR_BK_COUNT can be increased little, some applications use 1 >>> full backup at the beginning of the month and use 30 incremental >>> backups rest of the days in the month >>> >> >> Yeah, agree. But using any number here is debatable. >> Let's see others opinion too. >> > Why not use a list? > > >> >> >>> Regards, >>> Vignesh >>> EnterpriseDB: http://www.enterprisedb.com >>> >> >> >> Attached new sets of patches with refactoring done separately. >> Incremental backup patch became small now and hopefully more >> readable than the first version. >> >> -- >> Jeevan Chalke >> Technical Architect, Product Development >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> > > + buf = (char *) malloc(statbuf->st_size); > > + if (buf == NULL) > > + ereport(ERROR, > > + (errcode(ERRCODE_OUT_OF_MEMORY), > > +errmsg("out of memory"))); > > Why are you using malloc, you can use palloc here. > > > > Hi, I gave another look at the patch and have some quick comments. - > char *extptr = strstr(fn, ".partial"); I think there should be a better and strict way to check the file extension. - > + extptr = strstr(outfn, ".partial"); > + Assert (extptr != NULL); Why are you checking that again, you just appended that in the above statement? - > + if (verbose && statbuf.st_size > (RELSEG_SIZE * BLCKSZ)) > + pg_log_info("found big file \"%s\" (size: %.2lfGB): %m", fromfn, > + (double) statbuf.st_size / (RELSEG_SIZE * BLCKSZ)); This is not just a log, you find a file which is bigger which surely has some problem. - > +* We do read entire 1GB file in memory while taking incremental backup; so > +* I don't see any reason why can't we do that here. Also, copying data in > +* chunks is expensive. However, for bigger files, we still slice at 1GB > +* border. What do you mean by bigger file, a file greater than 1GB? In which case you get file > 1GB? 
-- Ibrar Ahmed
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 3:24 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Fri, Aug 2, 2019 at 6:43 PM vignesh C wrote: > >> Some comments: >> 1) There will be some link files created for tablespace, we might >> require some special handling for it >> > > Yep. I have that in my ToDo. > Will start working on that soon. > > >> 2) >> Retry functionality is hanlded only for copying of full files, should >> we handle retry for copying of partial files >> 3) >> we can have some range for maxretries similar to sleeptime >> > > I took help from pg_standby code related to maxentries and sleeptime. > > However, as we don't want to use system() call now, I have > removed all this kludge and just used fread/fwrite as discussed. > > >> 4) >> Should we check for malloc failure >> > > Used pg_malloc() instead. Same is also suggested by Ibrar. > > >> >> 5) Should we add display of progress as backup may take some time, >> this can be added as enhancement. We can get other's opinion on this. >> > > Can be done afterward once we have the functionality in place. > > >> >> 6) >> If the backup count increases providing the input may be difficult, >> Shall user provide all the incremental backups from a parent folder >> and can we handle the ordering of incremental backup internally >> > > I am not sure of this yet. We need to provide the tablespace mapping too. > But thanks for putting a point here. Will keep that in mind when I revisit > this. > > >> >> 7) >> Add verbose for copying whole file >> > Done > > >> >> 8) We can also check if approximate space is available in disk before >> starting combine backup, this can be added as enhancement. We can get >> other's opinion on this. >> > > Hmm... will leave it for now. User will get an error anyway. 
> > >> >> 9) >> Combine backup into directory can be combine backup directory >> > Done > > >> >> 10) >> MAX_INCR_BK_COUNT can be increased little, some applications use 1 >> full backup at the beginning of the month and use 30 incremental >> backups rest of the days in the month >> > > Yeah, agree. But using any number here is debatable. > Let's see others opinion too. > Why not use a list? > > >> Regards, >> Vignesh >> EnterpriseDB: http://www.enterprisedb.com >> > > > Attached new sets of patches with refactoring done separately. > Incremental backup patch became small now and hopefully more > readable than the first version. > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > + buf = (char *) malloc(statbuf->st_size); + if (buf == NULL) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), +errmsg("out of memory"))); Why are you using malloc, you can use palloc here. -- Ibrar Ahmed
Re: block-level incremental backup
On Fri, Aug 9, 2019 at 6:07 AM Jeevan Ladhe wrote: > Hi Jeevan, > > I have reviewed the backup part at code level and still looking into the > restore(combine) and functional part of it. But, here are my comments so > far: > Thank you Jeevan Ladhe for reviewing the changes. > > The patches need rebase. > Done. > May be we should rename to something like: > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP > START LOCATION" > to make it more intuitive? > As discussed, used "INCREMENTAL BACKUP REFERENCE WAL LOCATION". File header structure is defined in both the files basebackup.c and > pg_combinebackup.c. I think it is better to move this to > replication/basebackup.h. > Yep. Was that in my cleanup list. Done now. > I think we can avoid having flag isrelfile in sendFile(). > Something like this: > Also, having isrelfile as part of following condition: > is confusing, because even the relation files in full backup are going to > be > backed up by this loop only, but still, the condition reads '(!isrelfile > &&...)'. > In the refactored patch I have moved full backup code in a separate function. And now all incremental backup code is also done in its own function. Hopefully, the code is now more readable. > > IMHO, while labels are not advisable in general, it may be better to use a > label > here rather than a while(1) loop, so that we can move to the label in case > we > want to retry once. I think here it opens doors for future bugs if someone > happens to add code here, ending up adding some condition and then the > break becomes conditional. That will leave us in an infinite loop. > I kept it as is as I don't see any correctness issue here. Similar to structure partial_file_header, I think above macro can also be > moved > to basebackup.h instead of defining it twice. > Yes. Done. > I think this is a huge memory request (1GB) and may fail on busy/loaded > server at > times. 
We should check for failures of malloc, maybe throw some error on > getting ENOMEM as errno. > Agree. Done. > Here, should not we expect statbuf->st_size < (RELSEG_SIZE * BLCKSZ), and > it > should be safe to read just statbuf_st_size always I guess? But, I am ok > with > having this extra guard here. > Yes, we can do this way. Added an Assert() before that and used just statbuf->st_size. In sendFile(), I am sorry if I am missing something, but I am not able to > understand why 'cnt' and 'i' should have different values when they are > being > passed to verify_page_checksum(). I think passing only one of them should > be > sufficient. > As discussed offline, you meant to say i and blkno. These two are different. i represent the current block offset from the read buffer whereas blkno is the offset from the start of the page. For incremental backup, they are same as we read the whole file but they are different in case of regular full backup where we read 4 blocks at a time. i value there will be between 0 and 3. > Maybe we should just have a variable no_of_blocks to store a number of > blocks, > rather than calculating this say RELSEG_SIZE(i.e. 131072) times in the > worst > case. > OK. Done. > Sorry if I am missing something, but, should not it be just: > > len = cnt; > Yeah. Done. > As I said earlier in my previous email, we now do not need > +decode_lsn_internal() > as it is already taken care by the introduction of function > pg_lsn_in_internal(). > Yes. Done that and rebased on latest HEAD. > > Regards, > Jeevan Ladhe > Patches attached in the previous reply. -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 2, 2019 at 6:43 PM vignesh C wrote:

> Some comments:
> 1) There will be some link files created for tablespace, we might
> require some special handling for it
>

Yep. I have that in my ToDo.
Will start working on that soon.

> 2)
> Retry functionality is handled only for copying of full files, should
> we handle retry for copying of partial files
> 3)
> we can have some range for maxretries similar to sleeptime
>

I took help from the pg_standby code related to maxretries and sleeptime.

However, as we don't want to use a system() call now, I have removed all this kludge and just used fread/fwrite as discussed.

> 4)
> Should we check for malloc failure
>

Used pg_malloc() instead. The same was also suggested by Ibrar.

> 5) Should we add display of progress as backup may take some time,
> this can be added as enhancement. We can get other's opinion on this.
>

Can be done afterward once we have the functionality in place.

> 6)
> If the backup count increases providing the input may be difficult,
> Shall user provide all the incremental backups from a parent folder
> and can we handle the ordering of incremental backup internally
>

I am not sure of this yet. We need to provide the tablespace mapping too. But thanks for putting a point here. Will keep that in mind when I revisit this.

> 7)
> Add verbose for copying whole file
>

Done

> 8) We can also check if approximate space is available in disk before
> starting combine backup, this can be added as enhancement. We can get
> other's opinion on this.
>

Hmm... will leave it for now. User will get an error anyway.

> 9)
> Combine backup into directory can be combine backup directory
>

Done

> 10)
> MAX_INCR_BK_COUNT can be increased little, some applications use 1
> full backup at the beginning of the month and use 30 incremental
> backups rest of the days in the month
>

Yeah, agree. But using any number here is debatable.
Let's see others' opinion too.
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>

Attached new sets of patches with refactoring done separately. Incremental backup patch became small now and hopefully more readable than the first version.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

From 93041c12a7d07bf17073c9cf4571bd3b5a8acc81 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke
Date: Fri, 16 Aug 2019 14:08:34 +0530
Subject: [PATCH 1/4] Add support for command line option to pass LSN.

This adds [ LSN 'lsn' ] to BASE_BACKUP command and --lsn=LSN to the
pg_basebackup binary.

Also, add small tests.
---
 src/backend/replication/basebackup.c         | 20
 src/backend/replication/repl_gram.y          |  6 ++
 src/backend/replication/repl_scanner.l       |  1 +
 src/bin/pg_basebackup/pg_basebackup.c        | 15 +--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 12 +++-
 5 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66d..74c954b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -38,6 +38,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -52,6 +53,7 @@ typedef struct
 	bool includewal;
 	uint32 maxrate;
 	bool sendtblspcmapfile;
+	XLogRecPtr lsn;
 } basebackup_options;
@@ -638,6 +640,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool o_maxrate = false;
 	bool o_tablespace_map = false;
 	bool o_noverify_checksums = false;
+	bool o_lsn = false;

 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -726,6 +729,23 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "lsn") == 0)
+		{
+			bool have_error = false;
+
+			if (o_lsn)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			o_lsn = true;
+
+			/* Validate given LSN and convert it into XLogRecPtr. */
+			opt->lsn = pg_lsn_in_internal(strVal(defel->arg), &have_error);
+			if (XLogRecPtrIsInvalid(opt->lsn))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid value for LSN")));
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized", defel->defname);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc..c24d319 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
+%token K_LSN

 %type command
 %type base_backup start_replication start_
Re: block-level incremental backup
On Mon, Aug 12, 2019 at 4:57 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Fri, Aug 9, 2019 at 6:36 PM Robert Haas wrote: > >> On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke >> wrote: >> > So, do you mean we should just do fread() and fwrite() for the whole >> file? >> > >> > I thought it is better if it was done by the OS itself instead of >> reading 1GB >> > into the memory and writing the same to the file. >> >> Well, 'cp' is just a C program. If they can write code to copy a >> file, so can we, and then we're not dependent on 'cp' being installed, >> working properly, being in the user's path or at the hard-coded >> pathname we expect, etc. There's an existing copy_file() function in >> src/backed/storage/file/copydir.c which I'd probably look into >> adapting for frontend use. I'm not sure whether it would be important >> to adapt the data-flushing code that's present in that routine or >> whether we could get by with just the loop to read() and write() data. >> > > Agree that we can certainly use open(), read(), write(), and close() here, > but > given that pg_basebackup.c and basbackup.c are using file operations, I > think > using fopen(), fread(), fwrite(), and fclose() will be better here, > at-least > for consistetncy. > +1 for using fopen(), fread(), fwrite(), and fclose() > Let me know if we still want to go with native OS calls. > > -1 for OS call > >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> > > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Ibrar Ahmed
Re: block-level incremental backup
On Mon, Aug 12, 2019 at 7:57 AM Jeevan Chalke wrote:

> Agree that we can certainly use open(), read(), write(), and close() here,
> but given that pg_basebackup.c and basebackup.c are using file operations,
> I think using fopen(), fread(), fwrite(), and fclose() will be better
> here, at least for consistency.

Oh, that's fine. Whatever's more consistent with the pre-existing code. Just, let's not use system().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Mon, Aug 12, 2019 at 5:29 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Fri, Aug 9, 2019 at 11:56 PM Jeevan Ladhe < > jeevan.la...@enterprisedb.com> wrote: > >> Hi Robert, >> >> On Fri, Aug 9, 2019 at 6:40 PM Robert Haas wrote: >> >>> On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe >>> wrote: >>> > + if (!XLogRecPtrIsInvalid(previous_lsn)) >>> > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: >>> %X/%X\n", >>> > +(uint32) (previous_lsn >> 32), (uint32) >>> previous_lsn); >>> > >>> > May be we should rename to something like: >>> > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP >>> START LOCATION" >>> > to make it more intuitive? >>> >>> So, I think that you are right that PREVIOUS WAL LOCATION might not be >>> entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL >>> LOCATION is definitely not clear. This backup is an incremental >>> backup, and it has a start WAL location, so you'd end up with START >>> WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound >>> like they ought to both be the same thing, but they're not. Perhaps >>> something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR >>> INCREMENTAL BACKUP would be clearer. >>> >> >> Agree, how about INCREMENTAL BACKUP REFERENCE WAL LOCATION ? >> > > +1 for INCREMENTAL BACKUP REFERENCE WA. > Sorry for the typo: +1 for the INCREMENTAL BACKUP REFERENCE WAL LOCATION. > >> > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 9, 2019 at 11:56 PM Jeevan Ladhe wrote: > Hi Robert, > > On Fri, Aug 9, 2019 at 6:40 PM Robert Haas wrote: > >> On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe >> wrote: >> > + if (!XLogRecPtrIsInvalid(previous_lsn)) >> > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: >> %X/%X\n", >> > +(uint32) (previous_lsn >> 32), (uint32) >> previous_lsn); >> > >> > May be we should rename to something like: >> > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP >> START LOCATION" >> > to make it more intuitive? >> >> So, I think that you are right that PREVIOUS WAL LOCATION might not be >> entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL >> LOCATION is definitely not clear. This backup is an incremental >> backup, and it has a start WAL location, so you'd end up with START >> WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound >> like they ought to both be the same thing, but they're not. Perhaps >> something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR >> INCREMENTAL BACKUP would be clearer. >> > > Agree, how about INCREMENTAL BACKUP REFERENCE WAL LOCATION ? > +1 for INCREMENTAL BACKUP REFERENCE WA. > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Aug 9, 2019 at 6:36 PM Robert Haas wrote: > On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke > wrote: > > So, do you mean we should just do fread() and fwrite() for the whole > file? > > > > I thought it is better if it was done by the OS itself instead of > reading 1GB > > into the memory and writing the same to the file. > > Well, 'cp' is just a C program. If they can write code to copy a > file, so can we, and then we're not dependent on 'cp' being installed, > working properly, being in the user's path or at the hard-coded > pathname we expect, etc. There's an existing copy_file() function in > src/backend/storage/file/copydir.c which I'd probably look into > adapting for frontend use. I'm not sure whether it would be important > to adapt the data-flushing code that's present in that routine or > whether we could get by with just the loop to read() and write() data. > Agree that we can certainly use open(), read(), write(), and close() here, but given that pg_basebackup.c and basebackup.c are using file operations, I think using fopen(), fread(), fwrite(), and fclose() will be better here, at least for consistency. Let me know if we still want to go with native OS calls. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
Hi Robert, On Fri, Aug 9, 2019 at 6:40 PM Robert Haas wrote: > On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe > wrote: > > + if (!XLogRecPtrIsInvalid(previous_lsn)) > > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n", > > +(uint32) (previous_lsn >> 32), (uint32) > previous_lsn); > > > > May be we should rename to something like: > > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP > START LOCATION" > > to make it more intuitive? > > So, I think that you are right that PREVIOUS WAL LOCATION might not be > entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL > LOCATION is definitely not clear. This backup is an incremental > backup, and it has a start WAL location, so you'd end up with START > WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound > like they ought to both be the same thing, but they're not. Perhaps > something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR > INCREMENTAL BACKUP would be clearer. > Agree, how about INCREMENTAL BACKUP REFERENCE WAL LOCATION ? > > File header structure is defined in both the files basebackup.c and > > pg_combinebackup.c. I think it is better to move this to > replication/basebackup.h. > > Or some other header, but yeah, definitely don't duplicate the struct > definition (or any other kind of definition). > Thanks. > > IMHO, while labels are not advisable in general, it may be better to use > a label > > here rather than a while(1) loop, so that we can move to the label in > case we > > want to retry once. I think here it opens doors for future bugs if > someone > > happens to add code here, ending up adding some condition and then the > > break becomes conditional. That will leave us in an infinite loop. > > I'm not sure which style is better here, but I don't really buy this > argument. No issues. I am ok either way. Regards, Jeevan Ladhe
Re: block-level incremental backup
On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe wrote: > + if (!XLogRecPtrIsInvalid(previous_lsn)) > + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n", > +(uint32) (previous_lsn >> 32), (uint32) > previous_lsn); > > May be we should rename to something like: > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP START > LOCATION" > to make it more intuitive? So, I think that you are right that PREVIOUS WAL LOCATION might not be entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL LOCATION is definitely not clear. This backup is an incremental backup, and it has a start WAL location, so you'd end up with START WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound like they ought to both be the same thing, but they're not. Perhaps something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR INCREMENTAL BACKUP would be clearer. > File header structure is defined in both the files basebackup.c and > pg_combinebackup.c. I think it is better to move this to > replication/basebackup.h. Or some other header, but yeah, definitely don't duplicate the struct definition (or any other kind of definition). > IMHO, while labels are not advisable in general, it may be better to use a > label > here rather than a while(1) loop, so that we can move to the label in case we > want to retry once. I think here it opens doors for future bugs if someone > happens to add code here, ending up adding some condition and then the > break becomes conditional. That will leave us in an infinite loop. I'm not sure which style is better here, but I don't really buy this argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Wed, Aug 7, 2019 at 5:46 AM Jeevan Chalke wrote: > So, do you mean we should just do fread() and fwrite() for the whole file? > > I thought it is better if it was done by the OS itself instead of reading 1GB > into the memory and writing the same to the file. Well, 'cp' is just a C program. If they can write code to copy a file, so can we, and then we're not dependent on 'cp' being installed, working properly, being in the user's path or at the hard-coded pathname we expect, etc. There's an existing copy_file() function in src/backend/storage/file/copydir.c which I'd probably look into adapting for frontend use. I'm not sure whether it would be important to adapt the data-flushing code that's present in that routine or whether we could get by with just the loop to read() and write() data. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
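[Editor's note: for readers following the fread()/fwrite() discussion, a minimal sketch of the kind of frontend chunked-copy loop being described might look like this. This is illustrative only — not the actual copy_file() adaptation; the 64 kB buffer size and error handling are assumptions.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define COPY_BUF_SIZE (64 * 1024)   /* illustrative chunk size, not 1GB */

/* Copy src to dst in fixed-size chunks; returns 0 on success, -1 on error.
 * Only one buffer's worth of data is ever held in memory. */
static int
copy_file_chunked(const char *src, const char *dst)
{
    FILE   *in;
    FILE   *out;
    char    buf[COPY_BUF_SIZE];
    size_t  nread;
    int     rc = 0;

    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(dst, "wb")) == NULL)
    {
        fclose(in);
        return -1;
    }

    /* Read and write one chunk at a time, never the whole file. */
    while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, nread, out) != nread)
        {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;

    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

Unlike shelling out to 'cp' via system(), this has no dependency on the user's PATH, and unlike a single 1GB read it keeps memory use bounded.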
Re: block-level incremental backup
Hi Jeevan, I have reviewed the backup part at the code level and am still looking into the restore (combine) and functional part of it. But, here are my comments so far: The patches need rebase. + if (!XLogRecPtrIsInvalid(previous_lsn)) + appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n", +(uint32) (previous_lsn >> 32), (uint32) previous_lsn); Maybe we should rename to something like: "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP START LOCATION" to make it more intuitive? +typedef struct +{ + uint32 magic; + pg_crc32c checksum; + uint32 nblocks; + uint32 blocknumbers[FLEXIBLE_ARRAY_MEMBER]; +} partial_file_header; File header structure is defined in both the files basebackup.c and pg_combinebackup.c. I think it is better to move this to replication/basebackup.h. + bool isrelfile = false; I think we can avoid having the flag isrelfile in sendFile(). Something like this: if (startincrptr && OidIsValid(dboid) && looks_like_rel_name(filename)) { //include the code here that is under "if (isrelfile)" block. } else { _tarWriteHeader(tarfilename, NULL, statbuf, false); while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0) { ... } } Also, having isrelfile as part of the following condition: {code} + while (!isrelfile && + (cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0) {code} is confusing, because even the relation files in a full backup are going to be backed up by this loop only, but still, the condition reads '(!isrelfile &&...)'. verify_page_checksum() { while(1) { break; } } IMHO, while labels are not advisable in general, it may be better to use a label here rather than a while(1) loop, so that we can move to the label in case we want to retry once. I think here it opens doors for future bugs if someone happens to add code here, ending up adding some condition and then the break becomes conditional. That will leave us in an infinite loop. 
+/* magic number in incremental backup's .partial file */ +#define INCREMENTAL_BACKUP_MAGIC 0x494E4352 Similar to structure partial_file_header, I think the above macro can also be moved to basebackup.h instead of defining it twice. In sendFile(): + buf = (char *) malloc(RELSEG_SIZE * BLCKSZ); I think this is a huge memory request (1GB) and may fail on a busy/loaded server at times. We should check for failures of malloc, maybe throw some error on getting ENOMEM as errno. + /* Perform incremental backup stuff here. */ + if ((cnt = fread(buf, 1, Min(RELSEG_SIZE * BLCKSZ, statbuf->st_size), fp)) > 0) + { Here, shouldn't we expect statbuf->st_size < (RELSEG_SIZE * BLCKSZ), and it should be safe to read just statbuf->st_size always I guess? But, I am ok with having this extra guard here. In sendFile(), I am sorry if I am missing something, but I am not able to understand why 'cnt' and 'i' should have different values when they are being passed to verify_page_checksum(). I think passing only one of them should be sufficient. + XLogRecPtr pglsn; + + for (i = 0; i < cnt / BLCKSZ; i++) + { Maybe we should just have a variable no_of_blocks to store the number of blocks, rather than calculating this say RELSEG_SIZE (i.e. 131072) times in the worst case. + len += cnt; + throttle(cnt); + } Sorry if I am missing something, but, shouldn't it be just: len = cnt; As I said earlier in my previous email, we now do not need +decode_lsn_internal() as it is already taken care of by the introduction of function pg_lsn_in_internal(). Regards, Jeevan Ladhe
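[Editor's note: to make the .partial file layout under review concrete, here is a toy sketch of serializing and validating the header. The struct mirrors partial_file_header from the quoted patch, but the pg_crc32c checksum field is omitted, the flexible array member is replaced by a fixed-size array, and the buffer-based I/O is invented for illustration — treat it as a mock, not the patch's code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INCREMENTAL_BACKUP_MAGIC 0x494E4352     /* "INCR", per the patch */

/* Simplified stand-in for the patch's partial_file_header:
 * checksum elided, fixed-size block list instead of a flexible array. */
typedef struct
{
    uint32_t    magic;
    uint32_t    nblocks;
    uint32_t    blocknumbers[8];
} partial_file_header_mock;

/* Serialize a header into buf; returns the number of bytes written. */
static size_t
write_partial_header(char *buf, const uint32_t *blocks, uint32_t nblocks)
{
    partial_file_header_mock hdr;

    memset(&hdr, 0, sizeof(hdr));
    hdr.magic = INCREMENTAL_BACKUP_MAGIC;
    hdr.nblocks = nblocks;
    memcpy(hdr.blocknumbers, blocks, nblocks * sizeof(uint32_t));
    memcpy(buf, &hdr, sizeof(hdr));
    return sizeof(hdr);
}

/* Validate the magic and pull out the block list; returns 0 on success. */
static int
read_partial_header(const char *buf, uint32_t *blocks, uint32_t *nblocks)
{
    partial_file_header_mock hdr;

    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.magic != INCREMENTAL_BACKUP_MAGIC)
        return -1;              /* not a .partial file, or corrupted */
    *nblocks = hdr.nblocks;
    memcpy(blocks, hdr.blocknumbers, hdr.nblocks * sizeof(uint32_t));
    return 0;
}
```

Defining the struct and magic number in a single shared header (as the review suggests) would keep the writer in basebackup.c and the reader in pg_combinebackup.c from drifting apart.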
Re: block-level incremental backup
On Wed, Aug 7, 2019 at 2:47 PM Jeevan Chalke wrote: > > > On Mon, Aug 5, 2019 at 7:13 PM Robert Haas wrote: > >> On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: >> > + rc = system(copycmd); >> >> I don't think this patch should be calling system() in the first place. >> > > So, do you mean we should just do fread() and fwrite() for the whole file? > > I thought it is better if it was done by the OS itself instead of reading > 1GB > > into the memory and writing the same to the file. > > It is not necessary to read the whole 1GB into RAM. > >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> > > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Ibrar Ahmed
Re: block-level incremental backup
On Mon, Aug 5, 2019 at 7:13 PM Robert Haas wrote: > On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: > > + rc = system(copycmd); > > I don't think this patch should be calling system() in the first place. > So, do you mean we should just do fread() and fwrite() for the whole file? I thought it is better if it was done by the OS itself instead of reading 1GB into the memory and writing the same to the file. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Tue, Aug 6, 2019 at 11:31 PM Ibrar Ahmed wrote: > > I have not looked at the patch in detail, but just some nits from my side. > > On Fri, Aug 2, 2019 at 6:13 PM vignesh C wrote: > >> On Thu, Aug 1, 2019 at 5:06 PM Jeevan Chalke >> wrote: >> > >> > On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke < >> jeevan.cha...@enterprisedb.com> wrote: >> >> >> >> I am almost done writing the patch for pg_combinebackup and will post >> soon. >> > >> > >> > Attached patch which implements the pg_combinebackup utility used to >> combine >> > full basebackup with one or more incremental backups. >> > >> > I have tested it manually and it works for all best cases. >> > >> > Let me know if you have any inputs/suggestions/review comments? >> > >> Some comments: >> 1) There will be some link files created for tablespace, we might >> require some special handling for it >> >> 2) >> + while (numretries <= maxretries) >> + { >> + rc = system(copycmd); >> + if (rc == 0) >> + return; >> >> Use API to copy the file instead of "system", better to use the secure > copy. > Ah, it is a local copy, simple copy API is enough. > > >> + pg_log_info("could not copy, retrying after %d seconds", >> + sleeptime); >> + pg_usleep(numretries++ * sleeptime * 100L); >> + } >> Retry functionality is handled only for copying of full files, should >> we handle retry for copying of partial files >> >> The log and the sleep time does not match, you are multiplying sleeptime > with numretries++ and logging only "sleeptime" > > Why are we retrying here? Capture the proper copy error and act accordingly. > Blindly retrying does not make sense. 
> > 3) >> + maxretries = atoi(optarg); >> + if (maxretries < 0) >> + { >> + pg_log_error("invalid value for maxretries"); >> + fprintf(stderr, _("%s: -r maxretries must be >= 0\n"), progname); >> + exit(1); >> + } >> + break; >> + case 's': >> + sleeptime = atoi(optarg); >> + if (sleeptime <= 0 || sleeptime > 60) >> + { >> + pg_log_error("invalid value for sleeptime"); >> + fprintf(stderr, _("%s: -s sleeptime must be between 1 and 60\n"), >> progname); >> + exit(1); >> + } >> + break; >> we can have some range for maxretries similar to sleeptime >> >> 4) >> + fp = fopen(filename, "r"); >> + if (fp == NULL) >> + { >> + pg_log_error("could not read file \"%s\": %m", filename); >> + exit(1); >> + } >> + >> + labelfile = malloc(statbuf.st_size + 1); >> + if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size) >> + { >> + pg_log_error("corrupted file \"%s\": %m", filename); >> + free(labelfile); >> + exit(1); >> + } >> Should we check for malloc failure >> >> Use pg_malloc instead of malloc > > >> 5) Should we add display of progress as backup may take some time, >> this can be added as enhancement. We can get other's opinion on this. >> >> Yes, we should, but this is not the right time to do that. > > >> 6) >> + if (nIncrDir == MAX_INCR_BK_COUNT) >> + { >> + pg_log_error("too many incremental backups to combine"); >> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), >> progname); >> + exit(1); >> + } >> + >> + IncrDirs[nIncrDir] = optarg; >> + nIncrDir++; >> + break; >> >> If the backup count increases providing the input may be difficult, >> Shall user provide all the incremental backups from a parent folder >> and can we handle the ordering of incremental backup internally >> >> Why we have that limit at first place? 
> > >> 7) >> + if (isPartialFile) >> + { >> + if (verbose) >> + pg_log_info("combining partial file \"%s.partial\"", fn); >> + >> + combine_partial_files(fn, IncrDirs, nIncrDir, subdirpath, outfn); >> + } >> + else >> + copy_whole_file(infn, outfn); >> >> Add verbose for copying whole file >> >> 8) We can also check if approximate space is available in disk before >> starting combine backup, this can be added as enhancement. We can get >> other's opinion on this. >> >> 9) >> + printf(_(" -i, --incr-backup=DIRECTORY incremental backup directory >> (maximum %d)\n"), MAX_INCR_BK_COUNT); >> + printf(_(" -o, --output-dir=DIRECTORY combine backup into >> directory\n")); >> + printf(_("\nGeneral options:\n")); >> + printf(_(" -n, --no-clean do not clean up after >> errors\n")); >> >> Combine backup into directory can be combine backup directory >> >> 10) >> +/* Max number of incremental backups to be combined. */ >> +#define MAX_INCR_BK_COUNT 10 >> + >> +/* magic number in incremental backup's .partial file */ >> >> MAX_INCR_BK_COUNT can be increased little, some applications use 1 >> full backup at the beginning of the month and use 30 incremental >> backups rest of the days in the month >> >> Regards, >> Vignesh >> EnterpriseDB: http://www.enterprisedb.com >> >> >> > > -- > Ibrar Ahmed > -- Ibrar Ahmed
Re: block-level incremental backup
I have not looked at the patch in detail, but just some nits from my side. On Fri, Aug 2, 2019 at 6:13 PM vignesh C wrote: > On Thu, Aug 1, 2019 at 5:06 PM Jeevan Chalke > wrote: > > > > On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> > >> I am almost done writing the patch for pg_combinebackup and will post > soon. > > > > > > Attached patch which implements the pg_combinebackup utility used to > combine > > full basebackup with one or more incremental backups. > > > > I have tested it manually and it works for all best cases. > > > > Let me know if you have any inputs/suggestions/review comments? > > > Some comments: > 1) There will be some link files created for tablespace, we might > require some special handling for it > > 2) > + while (numretries <= maxretries) > + { > + rc = system(copycmd); > + if (rc == 0) > + return; > > Use API to copy the file instead of "system", better to use the secure copy. > + pg_log_info("could not copy, retrying after %d seconds", > + sleeptime); > + pg_usleep(numretries++ * sleeptime * 100L); > + } > Retry functionality is handled only for copying of full files, should > we handle retry for copying of partial files > > The log and the sleep time does not match, you are multiplying sleeptime with numretries++ and logging only "sleeptime" Why are we retrying here? Capture the proper copy error and act accordingly. Blindly retrying does not make sense. 
3) > + maxretries = atoi(optarg); > + if (maxretries < 0) > + { > + pg_log_error("invalid value for maxretries"); > + fprintf(stderr, _("%s: -r maxretries must be >= 0\n"), progname); > + exit(1); > + } > + break; > + case 's': > + sleeptime = atoi(optarg); > + if (sleeptime <= 0 || sleeptime > 60) > + { > + pg_log_error("invalid value for sleeptime"); > + fprintf(stderr, _("%s: -s sleeptime must be between 1 and 60\n"), > progname); > + exit(1); > + } > + break; > we can have some range for maxretries similar to sleeptime > > 4) > + fp = fopen(filename, "r"); > + if (fp == NULL) > + { > + pg_log_error("could not read file \"%s\": %m", filename); > + exit(1); > + } > + > + labelfile = malloc(statbuf.st_size + 1); > + if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size) > + { > + pg_log_error("corrupted file \"%s\": %m", filename); > + free(labelfile); > + exit(1); > + } > Should we check for malloc failure > > Use pg_malloc instead of malloc > 5) Should we add display of progress as backup may take some time, > this can be added as enhancement. We can get other's opinion on this. > > Yes, we should, but this is not the right time to do that. > 6) > + if (nIncrDir == MAX_INCR_BK_COUNT) > + { > + pg_log_error("too many incremental backups to combine"); > + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), > progname); > + exit(1); > + } > + > + IncrDirs[nIncrDir] = optarg; > + nIncrDir++; > + break; > > If the backup count increases providing the input may be difficult, > Shall user provide all the incremental backups from a parent folder > and can we handle the ordering of incremental backup internally > > Why we have that limit at first place? 
> 7) > + if (isPartialFile) > + { > + if (verbose) > + pg_log_info("combining partial file \"%s.partial\"", fn); > + > + combine_partial_files(fn, IncrDirs, nIncrDir, subdirpath, outfn); > + } > + else > + copy_whole_file(infn, outfn); > > Add verbose for copying whole file > > 8) We can also check if approximate space is available in disk before > starting combine backup, this can be added as enhancement. We can get > other's opinion on this. > > 9) > + printf(_(" -i, --incr-backup=DIRECTORY incremental backup directory > (maximum %d)\n"), MAX_INCR_BK_COUNT); > + printf(_(" -o, --output-dir=DIRECTORY combine backup into > directory\n")); > + printf(_("\nGeneral options:\n")); > + printf(_(" -n, --no-clean do not clean up after > errors\n")); > > Combine backup into directory can be combine backup directory > > 10) > +/* Max number of incremental backups to be combined. */ > +#define MAX_INCR_BK_COUNT 10 > + > +/* magic number in incremental backup's .partial file */ > > MAX_INCR_BK_COUNT can be increased little, some applications use 1 > full backup at the beginning of the month and use 30 incremental > backups rest of the days in the month > > Regards, > Vignesh > EnterpriseDB: http://www.enterprisedb.com > > > -- Ibrar Ahmed
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: > > + rc = system(copycmd); > > I don't think this patch should be calling system() in the first place. +1. Thanks, Stephen
Re: block-level incremental backup
On Fri, Aug 2, 2019 at 9:13 AM vignesh C wrote: > + rc = system(copycmd); I don't think this patch should be calling system() in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Thu, Aug 1, 2019 at 5:06 PM Jeevan Chalke wrote: > > On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke > wrote: >> >> I am almost done writing the patch for pg_combinebackup and will post soon. > > > Attached patch which implements the pg_combinebackup utility used to combine > full basebackup with one or more incremental backups. > > I have tested it manually and it works for all best cases. > > Let me know if you have any inputs/suggestions/review comments? > Some comments: 1) There will be some link files created for tablespace, we might require some special handling for it 2) + while (numretries <= maxretries) + { + rc = system(copycmd); + if (rc == 0) + return; + + pg_log_info("could not copy, retrying after %d seconds", + sleeptime); + pg_usleep(numretries++ * sleeptime * 100L); + } Retry functionality is handled only for copying of full files, should we handle retry for copying of partial files 3) + maxretries = atoi(optarg); + if (maxretries < 0) + { + pg_log_error("invalid value for maxretries"); + fprintf(stderr, _("%s: -r maxretries must be >= 0\n"), progname); + exit(1); + } + break; + case 's': + sleeptime = atoi(optarg); + if (sleeptime <= 0 || sleeptime > 60) + { + pg_log_error("invalid value for sleeptime"); + fprintf(stderr, _("%s: -s sleeptime must be between 1 and 60\n"), progname); + exit(1); + } + break; we can have some range for maxretries similar to sleeptime 4) + fp = fopen(filename, "r"); + if (fp == NULL) + { + pg_log_error("could not read file \"%s\": %m", filename); + exit(1); + } + + labelfile = malloc(statbuf.st_size + 1); + if (fread(labelfile, 1, statbuf.st_size, fp) != statbuf.st_size) + { + pg_log_error("corrupted file \"%s\": %m", filename); + free(labelfile); + exit(1); + } Should we check for malloc failure 5) Should we add display of progress as backup may take some time, this can be added as enhancement. We can get other's opinion on this. 
6) + if (nIncrDir == MAX_INCR_BK_COUNT) + { + pg_log_error("too many incremental backups to combine"); + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); + exit(1); + } + + IncrDirs[nIncrDir] = optarg; + nIncrDir++; + break; If the backup count increases providing the input may be difficult, Shall user provide all the incremental backups from a parent folder and can we handle the ordering of incremental backup internally 7) + if (isPartialFile) + { + if (verbose) + pg_log_info("combining partial file \"%s.partial\"", fn); + + combine_partial_files(fn, IncrDirs, nIncrDir, subdirpath, outfn); + } + else + copy_whole_file(infn, outfn); Add verbose for copying whole file 8) We can also check if approximate space is available in disk before starting combine backup, this can be added as enhancement. We can get other's opinion on this. 9) + printf(_(" -i, --incr-backup=DIRECTORY incremental backup directory (maximum %d)\n"), MAX_INCR_BK_COUNT); + printf(_(" -o, --output-dir=DIRECTORY combine backup into directory\n")); + printf(_("\nGeneral options:\n")); + printf(_(" -n, --no-clean do not clean up after errors\n")); Combine backup into directory can be combine backup directory 10) +/* Max number of incremental backups to be combined. */ +#define MAX_INCR_BK_COUNT 10 + +/* magic number in incremental backup's .partial file */ MAX_INCR_BK_COUNT can be increased little, some applications use 1 full backup at the beginning of the month and use 30 incremental backups rest of the days in the month Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
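[Editor's note: regarding the log/sleep mismatch flagged above — the quoted loop logs "sleeptime" seconds but sleeps numretries * sleeptime, and if I'm reading it right pg_usleep() takes microseconds, so the * 100L factor also looks off. A sketch of a retry driver where the reported wait and the actual wait are the same value follows. The function names are invented for illustration; the sleep itself is replaced by bookkeeping so the logic can be tested.]

```c
#include <assert.h>

/* Wait before retry attempt 'attempt' (1-based), in seconds.
 * Progressive backoff: attempt * base. This single value is both
 * logged and slept, so the message and the wait cannot disagree. */
static int
retry_wait_seconds(int attempt, int base_seconds)
{
    return attempt * base_seconds;
}

/* Hypothetical retry driver; do_copy() stands in for the real copy call.
 * Returns 0 on success, -1 after maxretries failed retries.
 * total_waited accumulates the backoff, in place of actually sleeping
 * (the real code would call pg_usleep(wait * 1000000L)). */
static int
copy_with_retries(int (*do_copy)(void), int maxretries, int base_seconds,
                  int *total_waited)
{
    int attempt;

    *total_waited = 0;
    for (attempt = 0; attempt <= maxretries; attempt++)
    {
        if (do_copy() == 0)
            return 0;
        if (attempt < maxretries)
        {
            int wait = retry_wait_seconds(attempt + 1, base_seconds);

            /* log "retrying after %d seconds" with 'wait' here */
            *total_waited += wait;
        }
    }
    return -1;
}

/* Test hook: a copy that fails 'fail_count' times, then succeeds. */
static int fail_count;
static int
flaky_copy(void)
{
    return (fail_count-- > 0) ? -1 : 0;
}
```

Capturing and reporting the actual copy error before deciding to retry, as Ibrar suggests, would slot in where the failed do_copy() result is examined.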
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 9:39 AM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > > I am almost done writing the patch for pg_combinebackup and will post soon. > Attached patch which implements the pg_combinebackup utility used to combine full basebackup with one or more incremental backups. I have tested it manually and it works for all best cases. Let me know if you have any inputs/suggestions/review comments? Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index 8d91f35..f3e90b6 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -200,6 +200,7 @@ Complete list of usable sgml source files in this directory. + diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 00782e0..92d9d13 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -415,7 +415,7 @@ PostgreSQL documentation which are modified after this given LSN will be backed up. The file which has these partial blocks has .partial as an extension. Backup taken in this manner has to be combined with the full backup with the -pg_combinebackup utility. + utility. diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml new file mode 100644 index 000..ed87931 --- /dev/null +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -0,0 +1,202 @@ + + + + + pg_combinebackup + + + + pg_combinebackup + 1 + Application + + + + pg_combinebackup + create a synthetic backup from a full backup and one or more incremental backups + + + + + pg_combinebackup + option + + + + + Description + + pg_combinebackup combines one or more incremental + backups with the full base-backup to generate a synthetic backup. 
+ + + + + Options + + +The following command-line options are available: + + + + -f directory + --full-backup=directory + + +Specifies the directory where the full backup is stored. + + + + + + -i directory + --incr-backup=directory + + +Specifies the directory where the incremental backup is stored. + + + + + + -o directory + --output-backup=directory + + +Specifies the output directory where the combined full synthetic backup +to be stored. + + + + + + -n + --no-clean + + +By default, when pg_combinebackup aborts with an +error, it removes the output data directories it might have created +before discovering that it cannot finish the job. This option inhibits +tidying-up and is thus useful for debugging. + + + + + + -r max-retries + + +Max number of retries on copy command, with progressive wait. +Default is 3. + + + + + + -s interval + + +Sleep interval before retry (in seconds). +Should be in between 1 and 60, default is 5. + + + + + + -v + --verbose + + +Enable verbose output. Lists all partial files processed and its +checksum status. + + + + + + -V + --version + + +Print the pg_combinebackup version and exit. + + + + + + -? + --help + + + Show help about pg_combinebackup command line + arguments, and exit. + + + + + + + + + Environment + + +PG_COLOR + + + Specifies whether to use color in diagnostics messages. Possible values + are always, auto, + never. + + + + + + + + Notes + + Output directory, full backup directory, and at-least one incremental backup + directory must be specified. + + + + PREVIOUS WAL LOCATION of the incremental backup must + match with the START WAL LOCATION of the previous full + or incremental backup in a given sequence. 
+ + + + + Examples + + + To combine a full backup with two incremental backups and store it in the + output directory: + +$ pg_combinebackup -f /data/full/data -i /data/incr/data1 -i /data/incr/data2 -o /data/full/fulldata + + + + + To combine a full backup with an incremental backup and store it in the + output directory along with various options like verbose, no-clean, and + maximum 3 retries: + +$ pg_combinebackup -v --no-clean -r 3 -f /data/full/data -i /data/incr/data1 -o /data/full/fulldata + + + + + + See Also + + + + + + + diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index cef09dd..3513ab4 100644 --- a/do
Re: block-level incremental backup
On Wed, Jul 31, 2019 at 1:59 PM vignesh C wrote: > I feel Robert's suggestion is good. > We can probably keep one meta file for each backup with some basic information > of all the files being backed up, this metadata file will be useful in the > below case: > Table dropped before incremental backup > Table truncated and Insert/Update/Delete operations before incremental backup There's really no need for this with the design I proposed. The files that should exist when you restore an incremental backup are exactly the set of files that exist in the final incremental backup, except that any .partial files need to be replaced with a correct reconstruction of the underlying file. You don't need to know what got dropped or truncated; you only need to know what's supposed to be there at the end. You may be thinking, as I once did, that restoring an incremental backup would consist of restoring the full backup first and then layering the incrementals over it, but if you read what I proposed, it actually works the other way around: you restore the files that are present in the incremental, and as needed, pull pieces of them from earlier incremental and/or full backups. I think this is a *much* better design than doing it the other way; it avoids any risk of getting the wrong answer due to truncations or drops, and it also is faster, because you only read older backups to the extent that you actually need their contents. I think it's a good idea to try to keep all the information about a single file being backed up in one place. It's just less confusing. If, for example, you have a metadata file that tells you which files are dropped - that is, which files you DON'T have - then what happens if one of those files is present in the data directory after all? Well, then you have inconsistent information and are confused, and maybe your code won't even notice the inconsistency. 
Similarly, if the metadata file is separate from the block data, then what happens if one file is missing, or isn't from the same backup as the other file? That shouldn't happen, of course, but if it does, you'll get confused. There's no perfect solution to these kinds of problems: if we suppose that the backup can be corrupted by having missing or extra files, why not also corruption within a single file? Still, on balance I tend to think that keeping related stuff together minimizes the surface area for bugs. I realize that's arguable, though. One consideration that goes the other way: if you have a manifest file that says what files are supposed to be present in the backup, then you can detect a disappearing file, which is impossible with the design I've proposed (and with the current full backup machinery). That might be worth fixing, but it's a separate feature that has little to do with incremental backup. > Probably it can also help us to decide which work the worker needs to do > if we are planning to backup in parallel. I don't think we need a manifest file for parallel backup. One process or thread can scan the directory tree, make a list of which files are present, and then hand individual files off to other processes or threads. In short, the directory listing serves as the manifest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
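[Editor's note: the restore direction Robert describes — start from the newest backup and reach back only for blocks it lacks — can be sketched in miniature as below. This is an in-memory mock with invented toy types: real backups are files on disk, the block list comes from the .partial header, and the chain is ordered newest-first.]

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ   4               /* toy block size */

/* Toy backup: a full backup has every block; an incremental has only
 * the blocks that changed since its reference LSN. */
typedef struct
{
    int     has_block[NBLOCKS];
    char    blocks[NBLOCKS][BLKSZ];
} toy_backup;

/* Reconstruct a file from a chain ordered newest-first: each block is
 * taken from the first (most recent) backup that has it, so older
 * backups are read only for blocks the newer ones lack. Returns 0 on
 * success, -1 if no backup in the chain supplies some block. */
static int
reconstruct(const toy_backup *chain, int nbackups, char out[NBLOCKS][BLKSZ])
{
    int blk, i;

    for (blk = 0; blk < NBLOCKS; blk++)
    {
        for (i = 0; i < nbackups; i++)
        {
            if (chain[i].has_block[blk])
            {
                memcpy(out[blk], chain[i].blocks[blk], BLKSZ);
                break;
            }
        }
        if (i == nbackups)
            return -1;          /* chain is broken: block found nowhere */
    }
    return 0;
}
```

Because the set of files (and here, blocks) to produce is defined by the newest backup, dropped or truncated relations simply never appear in the output — no drop/truncate metadata is needed, which is the point being argued above.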
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 1:58 AM Robert Haas wrote: > > On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova > wrote: > > In attachments, you can find a prototype of incremental pg_basebackup, > > which consists of 2 features: > > > > 1) To perform incremental backup one should call pg_basebackup with a > > new argument: > > > > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' > > > > where lsn is a start_lsn of parent backup (can be found in > > "backup_label" file) > > > > It calls BASE_BACKUP replication command with a new argument > > PREV_BACKUP_START_LSN 'lsn'. > > > > For datafiles, only pages with LSN > prev_backup_start_lsn will be > > included in the backup. >> One thought, if the file is not modified no need to check the lsn. >> > > They are saved into 'filename.partial' file, 'filename.blockmap' file > > contains an array of BlockNumbers. > > For example, if we backuped blocks 1,3,5, filename.partial will contain > > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}. > > I think it's better to keep both the information about changed blocks > and the contents of the changed blocks in a single file. The list of > changed blocks is probably quite short, and I don't really want to > double the number of files in the backup if there's no real need. I > suspect it's just overall a bit simpler to keep everything together. > I don't think this is a make-or-break thing, and welcome contrary > arguments, but that's my preference. > I feel Robert's suggestion is good. 
We can probably keep one meta file for each backup with some basic information of all the files being backed up; this metadata file will be useful in the below cases: Table dropped before incremental backup Table truncated and Insert/Update/Delete operations before incremental backup I feel if we have the metadata, we can add some optimization for the above scenarios, using the metadata information to identify file deletions and to avoid the write-and-delete work in pg_combinebackup which Jeevan has described in his previous mail. Probably it can also help us to decide which work each worker needs to do if we are planning to back up in parallel. Regards, vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 1:28 AM Robert Haas wrote: > On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova > wrote: > > In attachments, you can find a prototype of incremental pg_basebackup, > > which consists of 2 features: > > > > 1) To perform incremental backup one should call pg_basebackup with a > > new argument: > > > > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' > > > > where lsn is a start_lsn of parent backup (can be found in > > "backup_label" file) > > > > It calls BASE_BACKUP replication command with a new argument > > PREV_BACKUP_START_LSN 'lsn'. > > > > For datafiles, only pages with LSN > prev_backup_start_lsn will be > > included in the backup. > > They are saved into 'filename.partial' file, 'filename.blockmap' file > > contains an array of BlockNumbers. > > For example, if we backuped blocks 1,3,5, filename.partial will contain > > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}. > > I think it's better to keep both the information about changed blocks > and the contents of the changed blocks in a single file. The list of > changed blocks is probably quite short, and I don't really want to > double the number of files in the backup if there's no real need. I > suspect it's just overall a bit simpler to keep everything together. > I don't think this is a make-or-break thing, and welcome contrary > arguments, but that's my preference. > I have experience working on a similar product, and I agree with Robert: keeping the changed-block info and the changed blocks in a single file makes more sense. +1 > > > 2) To merge incremental backup into a full backup call > > > > pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' > > --merge-backups > > > > It will move all files from 'incremental_basedir' to 'basedir' handling > > '.partial' files correctly. > > This, to me, looks like it's much worse than the design that I > proposed originally. It means that: > > 1.
You can't take an incremental backup without having the full backup > available at the time you want to take the incremental backup. > > 2. You're always storing a full backup, which means that you need more > disk space, and potentially much more I/O while taking the backup. > You save on transfer bandwidth, but you add a lot of disk reads and > writes, costs which have to be paid even if the backup is never > restored. > > > 1) Whether we collect block maps using simple "read everything page by > > page" approach > > or WAL scanning or any other page tracking algorithm, we must choose a > > map format. > > I implemented the simplest one, while there are more ideas: > > I think we should start simple. > > I haven't had a chance to look at Jeevan's patch at all, or yours in > any detail, as yet, so these are just some very preliminary comments. > It will be good, however, if we can agree on who is going to do what > part of this as we try to drive this forward together. I'm sorry that > I didn't communicate EDB's plans to work on this more clearly; > duplicated effort serves nobody well. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- Ibrar Ahmed
Re: block-level incremental backup
On Tue, Jul 30, 2019 at 1:58 AM Robert Haas wrote: > > I haven't had a chance to look at Jeevan's patch at all, or yours in > any detail, as yet, so these are just some very preliminary comments. > It will be good, however, if we can agree on who is going to do what > part of this as we try to drive this forward together. I'm sorry that > I didn't communicate EDB's plans to work on this more clearly; > duplicated effort serves nobody well. > I had a look over Anastasia's PoC patch to understand the approach she has taken, and here are my observations. 1. The patch first creates a .blockmap file for each relation file containing an array of all modified block numbers. This is done by reading all blocks from the file (in chunks of 4 blocks, 32kB in total, in a loop) and comparing each page's LSN with the given LSN. Later, to create the .partial file, the relation file is opened again and all blocks are read in chunks of 4 in a loop. If a block is found modified, it is copied into another buffer, and after scanning all 4 blocks, all copied blocks are sent to the .partial file. In this approach, each file is opened and read twice, which looks more expensive to me. Whereas in my patch, I do that just once. However, I read the entire file into memory to check which blocks are modified, while in Anastasia's design at most TAR_SEND_SIZE (32kB) is read at a time, in a loop. I need to do that because we want to know how heavily the file got modified, so that we can send the entire file if it was modified beyond the threshold (currently 90%). 2. Also, while sending modified blocks, they are copied into another buffer; instead, they could be sent directly from the already-read file contents (in BLCKSZ block size). Here, the .blockmap created earlier was not used. In my implementation, we send just a .partial file with a header containing all required details, like the number of blocks changed along with the block numbers and a CRC, followed by the blocks themselves. 3.
I tried compiling Anastasia's patch, but I am getting an error, so I could not see or test how it works. Also, like the normal backup option, the incremental backup option needs to verify the checksums if requested. 4. While combining full and incremental backups, files from the incremental backup are just copied into the full backup directory. In the design I posted earlier, we work the other way round, to avoid overwriting and the other issues I explained earlier. I am almost done writing the patch for pg_combinebackup and will post it soon. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
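The single-pass decision Jeevan describes above (scan once, collect the changed blocks, and fall back to sending the whole file once more than 90% of it has changed) can be sketched roughly as follows. Illustrative Python only: the function name and the in-memory list-of-pages representation are invented for this sketch, and real pages would be read in TAR_SEND_SIZE chunks rather than held as tuples.

```python
THRESHOLD = 0.90  # fraction of changed blocks above which the whole file is resent

def plan_backup(blocks, prev_lsn):
    """blocks: list of (page_lsn, page_data) for one relation file.
    Returns ("full", all_pages) when the file changed beyond the threshold,
    else ("partial", changed_blocknos, changed_pages)."""
    changed = [(i, data) for i, (lsn, data) in enumerate(blocks) if lsn > prev_lsn]
    if blocks and len(changed) / len(blocks) > THRESHOLD:
        # Heavily modified: cheaper to ship the file whole, as in a full backup.
        return ("full", [data for _, data in blocks])
    # Lightly modified: ship only the changed block numbers and their contents.
    return ("partial", [i for i, _ in changed], [d for _, d in changed])
```

The single pass both decides full-vs-partial and gathers the data to send, which is the efficiency point being argued over the two-pass .blockmap approach.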
Re: block-level incremental backup
Hi Jeevan, I reviewed the first two patches - 0001-Add-support-for-command-line-option-to-pass-LSN.patch and 0002-Add-TAP-test-to-test-LSN-option.patch from the set of incremental backup patches, and the changes look good to me. I had some concerns about the way we work around the fact that pg_lsn_in() accepts 0 as a valid lsn, which I think is itself contradictory to the definition of InvalidXLogRecPtr; I have started a separate new thread[1] for that. Also, I observe that commit 21f428eb has already moved the lsn decoding logic to a separate function pg_lsn_in_internal(), so the function decode_lsn_internal() from patch 0001 will go away and the dependent code needs to be modified. I shall review the rest of the patches and post comments. Regards, Jeevan Ladhe [1] https://www.postgresql.org/message-id/CAOgcT0NOM9oR0Hag_3VpyW0uF3iCU=bdufspfk9jrwxrcwq...@mail.gmail.com On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > Hi Anastasia, > > On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < > a.lubennik...@postgrespro.ru> wrote: > >> 23.04.2019 14:08, Anastasia Lubennikova wrote: >> > I'm volunteering to write a draft patch or, more likely, set of >> > patches, which >> > will allow us to discuss the subject in more detail. >> > And to do that I wish we agree on the API and data format (at least >> > broadly). >> > Looking forward to hearing your thoughts. >> >> Though the previous discussion stalled, >> I still hope that we could agree on basic points such as a map file >> format and protocol extension, >> which is necessary to start implementing the feature. >> > > It's great that you too come up with the PoC patch. I didn't look at your > changes in much details but we at EnterpriseDB too working on this feature > and started implementing it. > > Attached series of patches I had so far...
(which needed further > optimization and adjustments though) > > Here is the overall design (as proposed by Robert) we are trying to > implement: > > 1. Extend the BASE_BACKUP command that can be used with replication > connections. Add a new [ LSN 'lsn' ] option. > > 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send > the option added to the server in #1. > > Here are the implementation details when we have a valid LSN > > sendFile() in basebackup.c is the function which mostly does the thing for > us. If the filename looks like a relation file, then we'll need to consider > sending only a partial file. The way to do that is probably: > > A. Read the whole file into memory. > > B. Check the LSN of each block. Build a bitmap indicating which blocks > have an LSN greater than or equal to the threshold LSN. > > C. If more than 90% of the bits in the bitmap are set, send the whole file > just as if this were a full backup. This 90% is a constant now; we might > make it a GUC later. > > D. Otherwise, send a file with .partial added to the name. The .partial > file contains an indication of which blocks were changed at the beginning, > followed by the data blocks. It also includes a checksum/CRC. > Currently, a .partial file format looks like: > - start with a 4-byte magic number > - then store a 4-byte CRC covering the header > - then a 4-byte count of the number of blocks included in the file > - then the block numbers, each as a 4-byte quantity > - then the data blocks > > > We are also working on combining these incremental back-ups with the full > backup and for that, we are planning to add a new utility called > pg_combinebackup. Will post the details on that later once we have on the > same page for taking backup. > > Thanks > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > >
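The .partial layout quoted above (a 4-byte magic number, a 4-byte CRC covering the header, a 4-byte block count, the block numbers, then the data blocks) can be prototyped in a few lines. This is a sketch only: the actual magic value, byte order, and CRC algorithm are patch-defined; the placeholder magic, little-endian packing, and crc32 used here are assumptions.

```python
import struct
import zlib

MAGIC = 0x50474242   # placeholder magic; the real value is defined by the patch
BLCKSZ = 8192        # PostgreSQL default block size

def write_partial(blocknos, blocks):
    """Serialize a .partial file: magic, header CRC, block count,
    block numbers (4 bytes each), then the data blocks."""
    header = struct.pack("<I", len(blocknos)) + \
        b"".join(struct.pack("<I", b) for b in blocknos)
    crc = zlib.crc32(header)  # CRC covering the header, per the described layout
    return struct.pack("<II", MAGIC, crc) + header + b"".join(blocks)

def read_partial(buf):
    """Parse a .partial file back into (blocknos, blocks), verifying the CRC."""
    magic, crc = struct.unpack_from("<II", buf, 0)
    assert magic == MAGIC, "bad magic"
    (count,) = struct.unpack_from("<I", buf, 8)
    header_len = 4 + 4 * count
    assert zlib.crc32(buf[8:8 + header_len]) == crc, "header CRC mismatch"
    blocknos = list(struct.unpack_from("<%dI" % count, buf, 12))
    data = buf[8 + header_len:]
    return blocknos, [data[i * BLCKSZ:(i + 1) * BLCKSZ] for i in range(count)]
```

Keeping the block list in the header, immediately before the blocks, is what lets a later combine step learn everything it needs from the header alone.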
Re: block-level incremental backup
On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova wrote: > In attachments, you can find a prototype of incremental pg_basebackup, > which consists of 2 features: > > 1) To perform incremental backup one should call pg_basebackup with a > new argument: > > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' > > where lsn is a start_lsn of parent backup (can be found in > "backup_label" file) > > It calls BASE_BACKUP replication command with a new argument > PREV_BACKUP_START_LSN 'lsn'. > > For datafiles, only pages with LSN > prev_backup_start_lsn will be > included in the backup. > They are saved into 'filename.partial' file, 'filename.blockmap' file > contains an array of BlockNumbers. > For example, if we backuped blocks 1,3,5, filename.partial will contain > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}. I think it's better to keep both the information about changed blocks and the contents of the changed blocks in a single file. The list of changed blocks is probably quite short, and I don't really want to double the number of files in the backup if there's no real need. I suspect it's just overall a bit simpler to keep everything together. I don't think this is a make-or-break thing, and welcome contrary arguments, but that's my preference. > 2) To merge incremental backup into a full backup call > > pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' > --merge-backups > > It will move all files from 'incremental_basedir' to 'basedir' handling > '.partial' files correctly. This, to me, looks like it's much worse than the design that I proposed originally. It means that: 1. You can't take an incremental backup without having the full backup available at the time you want to take the incremental backup. 2. You're always storing a full backup, which means that you need more disk space, and potentially much more I/O while taking the backup. 
You save on transfer bandwidth, but you add a lot of disk reads and writes, costs which have to be paid even if the backup is never restored. > 1) Whether we collect block maps using simple "read everything page by > page" approach > or WAL scanning or any other page tracking algorithm, we must choose a > map format. > I implemented the simplest one, while there are more ideas: I think we should start simple. I haven't had a chance to look at Jeevan's patch at all, or yours in any detail, as yet, so these are just some very preliminary comments. It will be good, however, if we can agree on who is going to do what part of this as we try to drive this forward together. I'm sorry that I didn't communicate EDB's plans to work on this more clearly; duplicated effort serves nobody well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
On Fri, Jul 26, 2019 at 11:21 AM Jeevan Ladhe wrote: > Hi Vignesh, > > Please find my comments inline below: > > 1) If relation file has changed due to truncate or vacuum. >> During incremental backup the new files will be copied. >> There are chances that both the old file and new file >> will be present. I'm not sure if cleaning up of the >> old file is handled. >> > > When an incremental backup is taken it either copies the file in its > entirety if > a file is changed more than 90%, or writes .partial with changed blocks > bitmap > and actual data. For the files that are unchanged, it writes 0 bytes and > still > creates a .partial file for unchanged files too. This means there is a > .partitial > file for all the files that are to be looked up in full backup. > While composing a synthetic backup from incremental backup the > pg_combinebackup > tool will only look for those relation files in full(parent) backup which > are > having .partial files in the incremental backup. So, if vacuum/truncate > happened > between full and incremental backup, then the incremental backup image > will not > have a 0-length .partial file for that relation, and so the synthetic > backup > that is restored using pg_combinebackup will not have that file as well. > > Thanks Jeevan for the update, I feel this logic is good. It will handle the case of deleting the old relation files. > > >> 2) Just a small thought on building the bitmap, >> can the bitmap be built and maintained as >> and when the changes are happening in the system. >> If we are building the bitmap while doing the incremental backup, >> Scanning through each file might take more time. >> This can be a configurable parameter, the system can run >> without capturing this information by default, but if there are some >> of them who will be taking incremental backup frequently this >> configuration can be enabled which should track the modified blocks. > > > IIUC, this will need changes in the backend. 
Honestly, I think backup is a > maintenance task and hampering the backend for this does not look like a > good > idea. But, having said that even if we have to provide this as a switch > for some > of the users, it will need a different infrastructure than what we are > building > here for constructing bitmap, where we scan all the files one by one. > Maybe for > the initial version, we can go with the current proposal that Robert has > suggested, > and add this switch at a later point as an enhancement. > > That sounds fair to me. Regards, vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
Hi Vignesh, Please find my comments inline below: 1) If relation file has changed due to truncate or vacuum. > During incremental backup the new files will be copied. > There are chances that both the old file and new file > will be present. I'm not sure if cleaning up of the > old file is handled. > When an incremental backup is taken, it either copies the file in its entirety if the file is changed more than 90%, or writes a .partial file with the changed-blocks bitmap and the actual data. For files that are unchanged, it still creates a .partial file, but writes 0 data bytes. This means there is a .partial file for every file that is to be looked up in the full backup. While composing a synthetic backup from an incremental backup, the pg_combinebackup tool will only look for those relation files in the full (parent) backup which have .partial files in the incremental backup. So, if a vacuum/truncate happened between the full and incremental backup, then the incremental backup image will not have a .partial file (not even a 0-length one) for that relation, and so the synthetic backup that is restored using pg_combinebackup will not have that file either. > 2) Just a small thought on building the bitmap, > can the bitmap be built and maintained as > and when the changes are happening in the system. > If we are building the bitmap while doing the incremental backup, > Scanning through each file might take more time. > This can be a configurable parameter, the system can run > without capturing this information by default, but if there are some > of them who will be taking incremental backup frequently this > configuration can be enabled which should track the modified blocks. IIUC, this will need changes in the backend. Honestly, I think backup is a maintenance task and hampering the backend for this does not look like a good idea.
But, having said that, even if we have to provide this as a switch for some users, it will need different infrastructure than what we are building here for constructing the bitmap, where we scan all the files one by one. Maybe for the initial version we can go with the current proposal that Robert has suggested, and add this switch at a later point as an enhancement. - My thoughts. Regards, Jeevan Ladhe
Re: block-level incremental backup
Thanks Jeevan. 1) If relation file has changed due to truncate or vacuum. During incremental backup the new files will be copied. There are chances that both the old file and new file will be present. I'm not sure if cleaning up of the old file is handled. 2) Just a small thought on building the bitmap, can the bitmap be built and maintained as and when the changes are happening in the system. If we are building the bitmap while doing the incremental backup, Scanning through each file might take more time. This can be a configurable parameter, the system can run without capturing this information by default, but if there are some of them who will be taking incremental backup frequently this configuration can be enabled which should track the modified blocks. What is your thought on this? -- Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com On Tue, Jul 23, 2019 at 11:19 PM Jeevan Ladhe wrote: > > Hi Vignesh, > > This backup technology is extending the pg_basebackup itself, which means we > can > still take online backups. This is internally done using pg_start_backup and > pg_stop_backup. pg_start_backup performs a checkpoint, and this checkpoint is > used in the recovery process while starting the cluster from a backup image. > What > incremental backup will just modify (as compared to traditional pg_basebackup) > is - After doing the checkpoint, instead of copying the entire relation files, > it takes an input LSN and scan all the blocks in all relation files, and store > the blocks having LSN >= InputLSN. This means it considers all the changes > that are already written into relation files including insert/update/delete > etc > up to the checkpoint performed by pg_start_backup internally, and as Jeevan > Chalke > mentioned upthread the incremental backup will also contain copy of WAL files. 
> Once this incremental backup is combined with the parent backup by means of > new > combine process (that will be introduced as part of this feature itself) > should > ideally look like a full pg_basebackup. Note that any changes done by these > insert/delete/update operations while the incremental backup was being taken > will be still available via WAL files and as normal restore process, will be > replayed from the checkpoint onwards up to a consistent point. > > My two cents! > > Regards, > Jeevan Ladhe > > On Sat, Jul 20, 2019 at 11:22 PM vignesh C wrote: >> >> Hi Jeevan, >> >> The idea is very nice. >> When Insert/update/delete and truncate/drop happens at various >> combinations, How the incremental backup handles the copying of the >> blocks? >> >> >> On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke >> wrote: >> > >> > >> > >> > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed wrote: >> >> >> >> >> >> >> >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke >> >> wrote: >> >>> >> >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed >> >>> wrote: >> >> >> At what stage you will apply the WAL generated in between the >> START/STOP backup. >> >>> >> >>> >> >>> In this design, we are not touching any WAL related code. The WAL files >> >>> will >> >>> get copied with each backup either full or incremental. And thus, the >> >>> last >> >>> incremental backup will have the final WAL files which will be copied >> >>> as-is >> >>> in the combined full-backup and they will get apply automatically if that >> >>> the data directory is used to start the server. >> >> >> >> >> >> Ok, so you keep all the WAL files since the first backup, right? >> > >> > >> > The WAL files will anyway be copied while taking a backup (full or >> > incremental), >> > but only last incremental backup's WAL files are copied to the combined >> > synthetic full backup. 
>> > >> >>> >> >> -- >> Ibrar Ahmed >> >>> >> >>> >> >>> -- >> >>> Jeevan Chalke >> >>> Technical Architect, Product Development >> >>> EnterpriseDB Corporation >> >>> >> >> >> >> >> >> -- >> >> Ibrar Ahmed >> > >> > >> > >> > -- >> > Jeevan Chalke >> > Technical Architect, Product Development >> > EnterpriseDB Corporation >> > >> >> >> -- >> Regards, >> vignesh >> >> >>
Re: block-level incremental backup
Hi Vignesh, This backup technology extends pg_basebackup itself, which means we can still take online backups. This is internally done using pg_start_backup and pg_stop_backup. pg_start_backup performs a checkpoint, and this checkpoint is used in the recovery process while starting the cluster from a backup image. What incremental backup modifies (as compared to traditional pg_basebackup) is this: after doing the checkpoint, instead of copying the entire relation files, it takes an input LSN, scans all the blocks in all relation files, and stores the blocks having LSN >= InputLSN. This means it considers all the changes that are already written into relation files, including insert/update/delete etc., up to the checkpoint performed by pg_start_backup internally, and as Jeevan Chalke mentioned upthread, the incremental backup will also contain a copy of the WAL files. Once this incremental backup is combined with the parent backup by means of the new combine process (which will be introduced as part of this feature itself), the result should ideally look like a full pg_basebackup. Note that any changes done by insert/delete/update operations while the incremental backup was being taken will still be available via the WAL files and, as in a normal restore process, will be replayed from the checkpoint onwards up to a consistent point. My two cents! Regards, Jeevan Ladhe On Sat, Jul 20, 2019 at 11:22 PM vignesh C wrote: > Hi Jeevan, > > The idea is very nice. > When Insert/update/delete and truncate/drop happens at various > combinations, How the incremental backup handles the copying of the > blocks?
> > > On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke > wrote: > > > > > > > > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed > wrote: > >> > >> > >> > >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >>> > >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed > wrote: > > > At what stage you will apply the WAL generated in between the > START/STOP backup. > >>> > >>> > >>> In this design, we are not touching any WAL related code. The WAL > files will > >>> get copied with each backup either full or incremental. And thus, the > last > >>> incremental backup will have the final WAL files which will be copied > as-is > >>> in the combined full-backup and they will get apply automatically if > that > >>> the data directory is used to start the server. > >> > >> > >> Ok, so you keep all the WAL files since the first backup, right? > > > > > > The WAL files will anyway be copied while taking a backup (full or > incremental), > > but only last incremental backup's WAL files are copied to the combined > > synthetic full backup. > > > >>> > > -- > Ibrar Ahmed > >>> > >>> > >>> -- > >>> Jeevan Chalke > >>> Technical Architect, Product Development > >>> EnterpriseDB Corporation > >>> > >> > >> > >> -- > >> Ibrar Ahmed > > > > > > > > -- > > Jeevan Chalke > > Technical Architect, Product Development > > EnterpriseDB Corporation > > > > > -- > Regards, > vignesh > EnterpriseDB: http://www.enterprisedb.com > > >
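The block-selection rule discussed above (keep pages whose LSN >= the input LSN) depends only on the page header: every PostgreSQL page begins with pd_lsn, stored as two 32-bit fields (xlogid, xrecoff). A minimal sketch, assuming little-endian storage and the default 8kB page size; the function names are invented here:

```python
import struct

BLCKSZ = 8192  # default PostgreSQL page size

def page_lsn(page):
    """Extract pd_lsn from the start of a PostgreSQL page. PageHeaderData
    begins with pd_lsn as two 32-bit fields (xlogid, xrecoff); the server's
    native byte order is assumed little-endian in this sketch."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def changed_blocks(data, threshold_lsn):
    """Yield block numbers whose page LSN is >= the given threshold,
    scanning the raw bytes of a relation file."""
    for blkno in range(len(data) // BLCKSZ):
        if page_lsn(data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]) >= threshold_lsn:
            yield blkno
```

Because the LSN sits at a fixed offset, deciding whether a page changed costs only a header read per page, which is what makes the scan-everything approach viable at all.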
Re: block-level incremental backup
Hi Jeevan, The idea is very nice. When Insert/update/delete and truncate/drop happens at various combinations, How the incremental backup handles the copying of the blocks? On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke wrote: > > > > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed wrote: >> >> >> >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke >> wrote: >>> >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed wrote: At what stage you will apply the WAL generated in between the START/STOP backup. >>> >>> >>> In this design, we are not touching any WAL related code. The WAL files will >>> get copied with each backup either full or incremental. And thus, the last >>> incremental backup will have the final WAL files which will be copied as-is >>> in the combined full-backup and they will get apply automatically if that >>> the data directory is used to start the server. >> >> >> Ok, so you keep all the WAL files since the first backup, right? > > > The WAL files will anyway be copied while taking a backup (full or > incremental), > but only last incremental backup's WAL files are copied to the combined > synthetic full backup. > >>> -- Ibrar Ahmed >>> >>> >>> -- >>> Jeevan Chalke >>> Technical Architect, Product Development >>> EnterpriseDB Corporation >>> >> >> >> -- >> Ibrar Ahmed > > > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > -- Regards, vignesh EnterpriseDB: http://www.enterprisedb.com
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed wrote: > > > On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed >> wrote: >> >>> >>> At what stage you will apply the WAL generated in between the START/STOP >>> backup. >>> >> >> In this design, we are not touching any WAL related code. The WAL files >> will >> get copied with each backup either full or incremental. And thus, the last >> incremental backup will have the final WAL files which will be copied >> as-is >> in the combined full-backup and they will get apply automatically if that >> the data directory is used to start the server. >> > > Ok, so you keep all the WAL files since the first backup, right? > The WAL files will anyway be copied while taking a backup (full or incremental), but only last incremental backup's WAL files are copied to the combined synthetic full backup. >> >>> -- >>> Ibrar Ahmed >>> >> >> -- >> Jeevan Chalke >> Technical Architect, Product Development >> EnterpriseDB Corporation >> >> > > -- > Ibrar Ahmed > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed wrote: > >> >> At what stage you will apply the WAL generated in between the START/STOP >> backup. >> > > In this design, we are not touching any WAL related code. The WAL files > will > get copied with each backup either full or incremental. And thus, the last > incremental backup will have the final WAL files which will be copied as-is > in the combined full-backup and they will get apply automatically if that > the data directory is used to start the server. > Ok, so you keep all the WAL files since the first backup, right? > > >> -- >> Ibrar Ahmed >> > > -- > Jeevan Chalke > Technical Architect, Product Development > EnterpriseDB Corporation > > -- Ibrar Ahmed
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed wrote: > > At what stage you will apply the WAL generated in between the START/STOP > backup. > In this design, we are not touching any WAL-related code. The WAL files will get copied with each backup, either full or incremental. Thus, the last incremental backup will have the final WAL files, which will be copied as-is into the combined full backup, and they will get applied automatically when that data directory is used to start the server. > -- > Ibrar Ahmed > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation
Re: block-level incremental backup
On Wed, Jul 17, 2019 at 10:22 AM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke < > jeevan.cha...@enterprisedb.com> wrote: > >> Hi Anastasia, >> >> On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < >> a.lubennik...@postgrespro.ru> wrote: >> >>> 23.04.2019 14:08, Anastasia Lubennikova wrote: >>> > I'm volunteering to write a draft patch or, more likely, set of >>> > patches, which >>> > will allow us to discuss the subject in more detail. >>> > And to do that I wish we agree on the API and data format (at least >>> > broadly). >>> > Looking forward to hearing your thoughts. >>> >>> Though the previous discussion stalled, >>> I still hope that we could agree on basic points such as a map file >>> format and protocol extension, >>> which is necessary to start implementing the feature. >>> >> >> It's great that you too come up with the PoC patch. I didn't look at your >> changes in much details but we at EnterpriseDB too working on this feature >> and started implementing it. >> >> Attached series of patches I had so far... (which needed further >> optimization and adjustments though) >> >> Here is the overall design (as proposed by Robert) we are trying to >> implement: >> >> 1. Extend the BASE_BACKUP command that can be used with replication >> connections. Add a new [ LSN 'lsn' ] option. >> >> 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to >> send the option added to the server in #1. >> >> Here are the implementation details when we have a valid LSN >> >> sendFile() in basebackup.c is the function which mostly does the thing >> for us. If the filename looks like a relation file, then we'll need to >> consider sending only a partial file. The way to do that is probably: >> >> A. Read the whole file into memory. >> >> B. Check the LSN of each block. Build a bitmap indicating which blocks >> have an LSN greater than or equal to the threshold LSN. >> >> C. 
If more than 90% of the bits in the bitmap are set, send the whole >> file just as if this were a full backup. This 90% is a constant now; we >> might make it a GUC later. >> >> D. Otherwise, send a file with .partial added to the name. The .partial >> file contains an indication of which blocks were changed at the beginning, >> followed by the data blocks. It also includes a checksum/CRC. >> Currently, a .partial file format looks like: >> - start with a 4-byte magic number >> - then store a 4-byte CRC covering the header >> - then a 4-byte count of the number of blocks included in the file >> - then the block numbers, each as a 4-byte quantity >> - then the data blocks >> >> >> We are also working on combining these incremental back-ups with the full >> backup and for that, we are planning to add a new utility called >> pg_combinebackup. Will post the details on that later once we have on the >> same page for taking backup. >> > > For combining a full backup with one or more incremental backup, we are > adding > a new utility called pg_combinebackup in src/bin. > > Here is the overall design as proposed by Robert. > > pg_combinebackup starts from the LAST backup specified and work backward. > It > must NOT start with the full backup and work forward. This is important > both > for reasons of efficiency and of correctness. For example, if you start by > copying over the full backup and then later apply the incremental backups > on > top of it then you'll copy data and later end up overwriting it or removing > it. Any files that are leftover at the end that aren't in the final > incremental backup even as .partial files need to be removed, or the > result is > wrong. We should aim for a system where every block in the output > directory is > written exactly once and nothing ever has to be created and then removed. > > To make that work, we should start by examining the final incremental > backup. > We should proceed with one file at a time. For each file: > > 1. 
If the complete file is present in the incremental backup, then just > copy it > to the output directory - and move on to the next file. > > 2. Otherwise, we have a .partial file. Work backward through the backup > chain > until we find a complete version of the file. That might happen when we get > back to the full backup at the start of the chain, but it might also > happen > sooner - at which point we do not need to and should not look at earlier > backups for that file. During this phase, we should read only the HEADER of > each .partial file, building a map of which blocks we're ultimately going > to > need to read from each backup. We can also compute the offset within each > file > where that block is stored at this stage, again using the header > information. > > 3. Now, we can write the output file - reading each block in turn from the > correct backup and writing it to the output file, using the map we > constructed in the previous step. We should probably keep all of the input > files open over steps 2 and 3 and then close them at the end, because > repeatedly closing and opening them is going to be expensive.
Re: block-level incremental backup
On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > Hi Anastasia, > > On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < > a.lubennik...@postgrespro.ru> wrote: > >> 23.04.2019 14:08, Anastasia Lubennikova wrote: >> > I'm volunteering to write a draft patch or, more likely, set of >> > patches, which >> > will allow us to discuss the subject in more detail. >> > And to do that I wish we agree on the API and data format (at least >> > broadly). >> > Looking forward to hearing your thoughts. >> >> Though the previous discussion stalled, >> I still hope that we could agree on basic points such as a map file >> format and protocol extension, >> which is necessary to start implementing the feature. >> > > It's great that you too come up with the PoC patch. I didn't look at your > changes in much details but we at EnterpriseDB too working on this feature > and started implementing it. > > Attached series of patches I had so far... (which needed further > optimization and adjustments though) > > Here is the overall design (as proposed by Robert) we are trying to > implement: > > 1. Extend the BASE_BACKUP command that can be used with replication > connections. Add a new [ LSN 'lsn' ] option. > > 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send > the option added to the server in #1. > > Here are the implementation details when we have a valid LSN > > sendFile() in basebackup.c is the function which mostly does the thing for > us. If the filename looks like a relation file, then we'll need to consider > sending only a partial file. The way to do that is probably: > > A. Read the whole file into memory. > > B. Check the LSN of each block. Build a bitmap indicating which blocks > have an LSN greater than or equal to the threshold LSN. > > C. If more than 90% of the bits in the bitmap are set, send the whole file > just as if this were a full backup. This 90% is a constant now; we might > make it a GUC later. 
> > D. Otherwise, send a file with .partial added to the name. The .partial > file contains an indication of which blocks were changed at the beginning, > followed by the data blocks. It also includes a checksum/CRC. > Currently, a .partial file format looks like: > - start with a 4-byte magic number > - then store a 4-byte CRC covering the header > - then a 4-byte count of the number of blocks included in the file > - then the block numbers, each as a 4-byte quantity > - then the data blocks > > > We are also working on combining these incremental back-ups with the full > backup and for that, we are planning to add a new utility called > pg_combinebackup. Will post the details on that later once we have on the > same page for taking backup. > For combining a full backup with one or more incremental backups, we are adding a new utility called pg_combinebackup in src/bin. Here is the overall design as proposed by Robert. pg_combinebackup starts from the LAST backup specified and works backward. It must NOT start with the full backup and work forward. This is important both for reasons of efficiency and of correctness. For example, if you start by copying over the full backup and then later apply the incremental backups on top of it, then you'll copy data and later end up overwriting or removing it. Any files that are left over at the end that aren't in the final incremental backup even as .partial files need to be removed, or the result is wrong. We should aim for a system where every block in the output directory is written exactly once and nothing ever has to be created and then removed. To make that work, we should start by examining the final incremental backup. We should proceed with one file at a time. For each file: 1. If the complete file is present in the incremental backup, then just copy it to the output directory - and move on to the next file. 2. Otherwise, we have a .partial file.
Work backward through the backup chain until we find a complete version of the file. That might happen when we get back to the full backup at the start of the chain, but it might also happen sooner - at which point we do not need to and should not look at earlier backups for that file. During this phase, we should read only the HEADER of each .partial file, building a map of which blocks we're ultimately going to need to read from each backup. We can also compute the offset within each file where that block is stored at this stage, again using the header information. 3. Now, we can write the output file - reading each block in turn from the correct backup and writing it to the output file, using the map we constructed in the previous step. We should probably keep all of the input files open over steps 2 and 3 and then close them at the end, because repeatedly closing and opening them is going to be expensive. When that's done, go on to the next file and start over at step 1. We have already started working on this design. -- Jeevan Chalke
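The reconstruction order described above (start from the newest backup, stop at the first complete copy of each file, write every output block exactly once) can be sketched for a single file. The chain representation below is an illustrative assumption for the sketch, not pg_combinebackup's actual interface or file format:

```python
def combine_file(chain, block_size=8192):
    """Rebuild one file from a backup chain, newest backup first.

    Each chain entry is ("full", file_bytes) or ("partial", {blkno: block_bytes}).
    Every output block is resolved exactly once, from the newest backup
    that contains it, per the design described above.
    """
    kind, payload = chain[0]
    if kind == "full":
        # Step 1: the final incremental contains the complete file; just copy it.
        return payload

    blocks = {}   # blkno -> data; the first (newest) writer wins
    nblocks = 0
    for kind, payload in chain:
        if kind == "partial":
            # Step 2: take only blocks not already supplied by a newer backup.
            for blkno, data in payload.items():
                blocks.setdefault(blkno, data)
                nblocks = max(nblocks, blkno + 1)
        else:
            # First complete version found: fill the remaining blocks and stop.
            # Earlier backups must not be consulted for this file.
            for i in range(len(payload) // block_size):
                blocks.setdefault(i, payload[i * block_size:(i + 1) * block_size])
            nblocks = max(nblocks, len(payload) // block_size)
            break

    # Step 3: emit the output in block order using the resolved map.
    return b"".join(blocks[i] for i in range(nblocks))
```

Working backward this way is what makes the "written exactly once" property fall out naturally: a block from a newer incremental simply shadows the older copies before they are ever read.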
Re: block-level incremental backup
Hi Anastasia, On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova < a.lubennik...@postgrespro.ru> wrote: > 23.04.2019 14:08, Anastasia Lubennikova wrote: > > I'm volunteering to write a draft patch or, more likely, set of > > patches, which > > will allow us to discuss the subject in more detail. > > And to do that I wish we agree on the API and data format (at least > > broadly). > > Looking forward to hearing your thoughts. > > Though the previous discussion stalled, > I still hope that we could agree on basic points such as a map file > format and protocol extension, > which is necessary to start implementing the feature. > It's great that you too came up with a PoC patch. I didn't look at your changes in much detail, but we at EnterpriseDB are also working on this feature and have started implementing it. Attached is the series of patches I have so far (which need further optimization and adjustment). Here is the overall design (as proposed by Robert) we are trying to implement: 1. Extend the BASE_BACKUP command that can be used with replication connections. Add a new [ LSN 'lsn' ] option. 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send the option added to the server in #1. Here are the implementation details when we have a valid LSN. sendFile() in basebackup.c is the function which does most of the work for us. If the filename looks like a relation file, then we'll need to consider sending only a partial file. The way to do that is probably: A. Read the whole file into memory. B. Check the LSN of each block. Build a bitmap indicating which blocks have an LSN greater than or equal to the threshold LSN. C. If more than 90% of the bits in the bitmap are set, send the whole file just as if this were a full backup. This 90% is a constant now; we might make it a GUC later. D. Otherwise, send a file with .partial added to the name.
The .partial file contains an indication of which blocks were changed at the beginning, followed by the data blocks. It also includes a checksum/CRC. Currently, a .partial file format looks like:
- start with a 4-byte magic number
- then store a 4-byte CRC covering the header
- then a 4-byte count of the number of blocks included in the file
- then the block numbers, each as a 4-byte quantity
- then the data blocks
We are also working on combining these incremental backups with the full backup and, for that, we are planning to add a new utility called pg_combinebackup. Will post the details on that later once we are on the same page about taking the backup. Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation IncrBackup.tar.gz Description: application/gzip
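Steps A through D and the .partial layout above can be sketched as follows. The magic value, the little-endian encoding, and the pd_lsn unpacking are illustrative assumptions for the sketch, not the patch's actual on-disk format (on a real page, pd_lsn is stored in the server's native byte order):

```python
import struct
import zlib

BLCKSZ = 8192
SEND_WHOLE_THRESHOLD = 0.90   # the 90% constant from step C
PARTIAL_MAGIC = 0x50424B50    # hypothetical magic number for the sketch


def page_lsn(page):
    # pd_lsn occupies the first 8 bytes of the page header, as two
    # 32-bit halves (xlogid, xrecoff); little-endian assumed here.
    hi, lo = struct.unpack_from("<II", page, 0)
    return (hi << 32) | lo


def changed_blocks(file_bytes, threshold_lsn):
    # Step B: bitmap of blocks whose LSN >= the threshold LSN.
    n = len(file_bytes) // BLCKSZ
    return [page_lsn(file_bytes[i * BLCKSZ:(i + 1) * BLCKSZ]) >= threshold_lsn
            for i in range(n)]


def make_partial(file_bytes, threshold_lsn):
    """Return None if the whole file should be sent (step C), else a
    .partial payload: magic, header CRC, block count, block numbers, blocks."""
    bitmap = changed_blocks(file_bytes, threshold_lsn)
    if bitmap and sum(bitmap) > SEND_WHOLE_THRESHOLD * len(bitmap):
        return None   # step C: send the file as in a full backup
    blknos = [i for i, changed in enumerate(bitmap) if changed]
    header = struct.pack("<I", len(blknos)) + struct.pack("<%dI" % len(blknos), *blknos)
    out = struct.pack("<II", PARTIAL_MAGIC, zlib.crc32(header)) + header
    for i in blknos:
        out += file_bytes[i * BLCKSZ:(i + 1) * BLCKSZ]
    return out
```

The fixed-size header (count plus block-number array) is what lets pg_combinebackup read only the header of each .partial file when planning, as step 2 of the combine design requires.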
Re: block-level incremental backup
23.04.2019 14:08, Anastasia Lubennikova wrote: I'm volunteering to write a draft patch or, more likely, a set of patches, which will allow us to discuss the subject in more detail. And to do that I wish we agree on the API and data format (at least broadly). Looking forward to hearing your thoughts. Though the previous discussion stalled, I still hope that we could agree on basic points such as a map file format and protocol extension, which are necessary to start implementing the feature. - Proof Of Concept patch - In the attachments, you can find a prototype of incremental pg_basebackup, which consists of 2 features: 1) To perform an incremental backup, one should call pg_basebackup with a new argument: pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn' where lsn is the start_lsn of the parent backup (it can be found in the "backup_label" file). It calls the BASE_BACKUP replication command with a new argument, PREV_BACKUP_START_LSN 'lsn'. For datafiles, only pages with LSN > prev_backup_start_lsn will be included in the backup. They are saved into a 'filename.partial' file, and a 'filename.blockmap' file contains an array of BlockNumbers. For example, if we backed up blocks 1, 3, 5, then filename.partial will contain those 3 blocks, and 'filename.blockmap' will contain the array {1,3,5}. Non-datafiles use the same format as before. 2) To merge an incremental backup into a full backup, call pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' --merge-backups It will move all files from 'incremental_basedir' to 'basedir', handling '.partial' files correctly. - Questions to discuss - Please note that it is just a proof-of-concept patch and it can be optimized in many ways. Let's concentrate on issues that affect the protocol or data format. 1) Whether we collect block maps using the simple "read everything page by page" approach, WAL scanning, or any other page tracking algorithm, we must choose a map format.
I implemented the simplest one, while there are more ideas: - We can have a map not per file, but per relation or maybe per tablespace, which will make the implementation more complex, but probably more efficient. The only problem I see with the existing implementation is that even if only a few blocks changed, we still must pad the map to 512 bytes per tar format requirements. - We can save LSNs into the block map:

typedef struct BlockMapItem
{
    BlockNumber blkno;
    XLogRecPtr  lsn;
} BlockMapItem;

In my implementation, an invalid prev_backup_start_lsn means falling back to a regular basebackup without any block maps. Alternatively, we can define another meaning for this value and send a block map for all files. Backup utilities can use these maps to speed up backup merge or restore. 2) We can implement a BASE_BACKUP SEND_FILELIST replication command, which will return a list of filenames with file sizes, plus block maps if an lsn was provided. To avoid changing the format, we can simply send tar headers for each file: - tarHeader("filename.blockmap") followed by the blockmap, for relation files, if prev_backup_start_lsn is provided; - tarHeader("filename") without the actual file content, for non-relation files or for all files in a "FULL" backup. The caller can parse these messages and use them for any purpose, for example, to perform a parallel backup. Thoughts? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13e0d23..e757bba 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10459,7 +10459,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 			ti->oid = pstrdup(de->d_name);
 			ti->path = pstrdup(buflinkpath.data);
 			ti->rpath = relpath ? pstrdup(relpath) : NULL;
-			ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+			ti->size = infotbssize ? sendTablespace(fullpath, true, InvalidXLogRecPtr) : -1;
 
 			if (tablespaces)
 				*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2978a9..3560da1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
 
 typedef struct
@@ -52,13 +53,22 @@ typedef struct
 	bool		includewal;
 	uint32		maxrate;
 	bool		sendtblspcmapfile;
+	XLogRecPtr	prev_backup_start_lsn;
 } basebackup_options;
 
 static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
-					 List *tablespaces, bool sendtblspclinks);
+					 List *tablespaces, bool sendtblspclinks, XLogRecPtr prev_backup_start_lsn);
 static bool sendFile(const char *readfilename, const char *tarfilename,
-					 struct stat *statbuf, bool missing_ok, Oid dboid
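The two map shapes discussed above — the PoC's plain array of BlockNumbers in 'filename.blockmap' and the alternative BlockMapItem array carrying an LSN per block — could be encoded along these lines. The fixed little-endian encoding is an assumption for illustration, not the PoC's actual byte layout:

```python
import struct

BLOCKNO = struct.Struct("<I")         # plain BlockNumber, 4 bytes
BLOCKMAP_ITEM = struct.Struct("<IQ")  # BlockMapItem: blkno + 8-byte LSN


def encode_blockmap(blknos):
    # 'filename.blockmap' for backed-up blocks 1, 3, 5 is just the
    # packed array {1,3,5}, parallel to the blocks in filename.partial.
    return b"".join(BLOCKNO.pack(b) for b in blknos)


def decode_blockmap(data):
    return [BLOCKNO.unpack_from(data, off)[0]
            for off in range(0, len(data), BLOCKNO.size)]


def encode_lsn_map(items):
    # items: iterable of (blkno, lsn) pairs, per the BlockMapItem variant;
    # carrying the LSN lets backup utilities skip unchanged blocks at
    # merge or restore time.
    return b"".join(BLOCKMAP_ITEM.pack(blkno, lsn) for blkno, lsn in items)


def decode_lsn_map(data):
    return [BLOCKMAP_ITEM.unpack_from(data, off)
            for off in range(0, len(data), BLOCKMAP_ITEM.size)]
```

The per-block LSN variant costs 12 bytes per entry instead of 4, which matters less than the 512-byte tar padding floor mentioned above for small maps.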
Re: block-level incremental backup
On Wed, Apr 24, 2019 at 12:57 PM Stephen Frost wrote: > So, I had a thought about that when I was composing the last email and > while I'm still unsure about it, maybe it'd be useful to mention it > here- do we really need a list of every *file*, or could we reduce that > down to a list of relations + forks for the main data directory, and > then always include whatever other directories/files are appropriate? I'm not quite sure what the difference is here. I agree that we could try to compact the list of file names by saying 16384 (24 segments) instead of 16384, 16384.1, ..., 16384.23, but I doubt that saves anything meaningful. I don't see how we can leave anything out altogether. If there's a filename called boaty.mcboatface in the server directory, I think we've got to back it up, and that won't happen unless the client knows that it is there, and it won't know unless we include it in a list. > When it comes to operating in chunks, well, if we're getting a list of > relations instead of files, we do have this thing called cursors.. Sure... but they don't work for replication commands and I am definitely not volunteering to change that. > I would think the client would be able to just ask for the list of > modified files, when it comes to building up the list of files to ask > for, which could potentially be done based on mtime instead of by WAL > scanning or by scanning the files themselves. Don't get me wrong, I'd > prefer that we work based on the WAL, since I have more confidence in > that, but certainly quite a few of the tools do work off mtime these > days and while it's not perfect, the risk/reward there is pretty > palatable to a lot of people. That approach, as with a few others that have been suggested, requires that the client have access to the previous backup, which makes me uninterested in implementing it. I want a version of incremental backup where the client needs to know the LSN of the previous backup and nothing else. 
That way, if you store your actual backups on a tape drive in an airless vault at the bottom of the Pacific Ocean, you can still take incremental backups against them, as long as you remember to note the LSNs before you ship the backups to the vault. Woohoo! It also allows for the wire protocol to be very simple and the client to be very simple; neither of those things is essential, but both are nice. Also, I think using mtimes is just asking to get burned. Yeah, almost nobody will, but an LSN-based approach is more granular (block level) and more reliable (can't be fooled by resetting a clock backward, or by a filesystem being careless with file metadata), so I think it makes sense to focus on getting that to work. It's worth keeping in mind that there may be somewhat different expectations for an external tool vs. a core feature. Stupid as it may sound, I think people using an external tool are more likely to do things like read the directions, and those directions can say things like "use a reasonable filesystem and don't set your clock backward." When stuff goes into core, people assume that they should be able to run it on any filesystem on any hardware where they can get it to work and it should just work. And you also get a lot more users, so even if the percentage of people not reading the directions were to stay constant, the actual number of such people will go up a lot. So picking what we seem to both agree to be the most robust way of detecting changes seems like the way to go from here. > I suspect some of that's driven by how they get solved and if we decide > we have to solve all of them. With things like MAX_RATE + incremental > backups, I wonder how that's going to end up working, when you have the > option to apply the limit to the network, or to the disk I/O.
You might > have addressed that elsewhere, I've not looked, and I'm not too > particular about it personally either, but a definition could be "max > rate at which we'll read the file you asked for on this connection" and > that would be pretty straight-forward, I'd think. I mean, it's just so people can tell pg_basebackup what rate they want via a command-line option and have it happen like that. They don't care about the rates for individual files. > > Issue #1: If you manually add files to your backup, remove files from > > your backup, or change files in your backup, bad things will happen. > > There is fundamentally nothing we can do to prevent this completely, > > but it may be possible to make the system more resilient against > > ham-handed modifications, at least to the extent of detecting them. > > That's maybe a topic for another thread, but it's an interesting one: > > Andres and I were brainstorming about it at some point. > > I'd certainly be interested in hearing about ways we can improve on > that. I'm alright with it being on another thread as it's a broader > concern than just what we're talking about here. Might be a good topic to chat about at PGCon. > >
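The LSN to "note" before shipping a backup off to the vault is its start LSN, which, as mentioned elsewhere in the thread, can be read from the backup's backup_label file (its first line looks like `START WAL LOCATION: 0/2000028 (file 000000010000000000000002)`). A sketch of extracting it, with the parsing simplified for illustration:

```python
import re


def start_lsn_from_backup_label(label_text):
    """Parse the START WAL LOCATION line of a backup_label file and
    return the LSN as a 64-bit integer (high/low hex halves joined)."""
    m = re.search(r"^START WAL LOCATION:\s*([0-9A-Fa-f]+)/([0-9A-Fa-f]+)",
                  label_text, re.MULTILINE)
    if m is None:
        raise ValueError("backup_label has no START WAL LOCATION line")
    hi, lo = int(m.group(1), 16), int(m.group(2), 16)
    return (hi << 32) | lo
```

That integer is what a later incremental request would pass as its threshold (e.g. via the proposed --lsn or --prev-backup-start-lsn option), which is why recording it is the only state the client needs to keep.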
Re: block-level incremental backup
e thing once you > add incrementals to the picture you might easily remove a full backup > upon which a newer incremental depends. I see the need for good tools > to manage this kind of complexity, but have no plan as part of this > project to provide them. I think that just requires too many > assumptions about where those backups are being stored and how they > are being catalogued and managed; I don't believe I currently am > knowledgeable enough to design something that would be good enough to > meet core standards for inclusion, and I don't want to waste energy > trying. If someone else wants to try, that's OK with me, but I think > it's probably better to let this be a thing that people experiment > with outside of core for a while until we see what ends up being a > winner. I realize that this is a debatable position, but as I'm sure > you realize by now, I have a strong desire to limit the scope of this > project in such a way that I can get it done, 'cuz a bird in the hand > is worth two in the bush. Even if what we're talking about here is really only "differentials", or backups where the incremental contains all the changes from a prior full backup, if the only check is "full LSN is greater than or equal to the incremental backup LSN", then you have a potential problem that's larger than just the incrementals no longer being valid because you removed the full backup on which they were taken- you might think that an *earlier* full backup is the one for a given incremental and perform a restore with the wrong full/incremental matchup and end up with a corrupted database. These are exactly the kind of issues that make me really wonder if this is the right natural progression for pg_basebackup or any backup tool to go in. Maybe there's some additional things we can do to make it harder for someone to end up with a corrupted database when they restore, but it's really hard to get things like expiration correct. 
We see users already ending up with problems because they don't manage expiration of their WAL correctly, and now we're adding another level of serious complication to the expiration requirements that, as we've seen even on this thread, some users are just not going to ever feel comfortable with doing on their own. Perhaps it's not relevant and I get that you want to build this cool incremental backup capability into pg_basebackup and I'm not going to stop you from doing it, but if I was going to build a backup tool, adding support for block-level incremental backup wouldn't be where I'd start, and, in fact, I might not even get to it even after investing over 5 years in the project and even after building in proper backup management. The idea of implementing block-level incrementals while pushing the backup management, expiration, and dependency between incrementals and fulls on to the user to figure out just strikes me as entirely backwards and, frankly, to be gratuitously 'itch scratching' at the expense of what users really want and need here. One of the great things about pg_basebackup is its simplicity and ability to be a one-time "give me a snapshot of the database" and this is building in a complicated feature to it that *requires* users to build their own basic capabilities externally in order to be able to use it. I've tried to avoid getting into that here and I won't go on about it, since it's your time to do with as you feel appropriate, but I do worry that it makes us, as a project, look a bit more cavalier about what users are asking for vs. what cool new thing we want to play with than I, at least, would like us to be (so, I'll caveat that with "in this area anyway", since I suspect saying this will probably come back to bite me in some other discussion later ;). Thanks, Stephen signature.asc Description: PGP signature
Re: block-level incremental backup
On Wed, Apr 24, 2019 at 9:28 AM Stephen Frost wrote: > Looking at it from what I'm sitting, I brought up two ways that we > could extend the protocol to "request from the server only those blocks > where LSN >= threshold_value" with one being the modification to > BASE_BACKUP and the other being a new set of commands that could be > parallelized. If I had assumed that you'd be thinking the same way I am > about extending the backup protocol, I wouldn't have said anything now > and then would have complained after you wrote a patch that just > extended the BASE_BACKUP command, at which point I likely would have > been told that it's now been done and that I should have mentioned it > earlier. Fair enough. > At least in part then it seems like we're viewing the level of effort > around what I'm talking about quite differently, and I feel like that's > largely because every time I mention parallel anything there's this > assumption that I'm asking you to parallelize pg_basebackup or write a > whole bunch more code to provide a fully optimized server-side parallel > implementation for backups. That really wasn't what I was going for. I > was thinking it would be a modest amount of additional work add > incremental backup via a few new commands, instead of through the > BASE_BACKUP protocol command, that would make parallelization possible. I'm not sure about that. It doesn't seem crazy difficult, but there are a few wrinkles. One is that if the client is requesting files one at a time, it's got to have a list of all the files that it needs to request, and that means that it has to ask the server to make a preparatory pass over the whole PGDATA directory to get a list of all the files that exist. That overhead is not otherwise needed. 
Another is that the list of files might be really large, and that means that the client would either use a lot of memory to hold that great big list, or need to deal with spilling the list to a spool file someplace, or else have a server protocol that lets the list be fetched incrementally in chunks. A third is that, as you mention further on, it means that the client has to care a lot more about exactly how the server is figuring out which blocks have been modified. If it just says BASE_BACKUP ..., the server can be internally reading each block and checking the LSN, or using WAL-scanning or ptrack or whatever, and the client doesn't need to know or care. But if the client is asking for a list of modified files or blocks, then that presumes the information is available, and not too expensively, without actually reading the files. Fourth, MAX_RATE probably won't actually limit to the correct rate overall if the limit is applied separately to each file. I'd be afraid that a patch that tried to handle all that as part of this project would get rejected on the grounds that it was trying to solve too many unrelated problems. Also, though not everybody has to agree on what constitutes a "modest amount of additional work," I would not describe solving all of those problems as a modest effort, but rather a pretty substantial one. > There's a tangent on all of this that's pretty key though, which is the > question around just how the blocks are identified. If the WAL scanning > is done to figure out the blocks, then that's quite a bit different from > the other idea of "open this relation and scan it, but only give me the > blocks after this LSN". It's the latter case that I've been mostly > thinking about in this thread, which is part of why I was thinking it'd > be a modest amount of work to have protocol commands that accepted a > file (or perhaps a relation..)
to scan and return blocks from instead of > baking this into BASE_BACKUP which by definition just serially scans the > data directory and returns things as it finds them. For the case where > we have WAL scanning happening and modfiles which are being read and > used to figure out the blocks to send, it seems like it might be more > complicated and therefore potentially quite a bit more work to have a > parallel version of that. Yeah. I don't entirely agree that the first one is simple, as per the above, but I definitely agree that the second one is more complicated than the first one. > > Well, one thing you might want to do is have a tool that connects to > > the server, enters backup mode, requests information on what blocks > > have changed, copies those blocks via direct filesystem access, and > > then exits backup mode. Such a tool would really benefit from a > > START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP > > command language, because it would just skip ever issuing the > > SEND_FILE_CONTENTS command in favor of doing that part of the work via > > other means. On the other hand, a START_PARALLEL_BACKUP LSN '1/234' > > command is useless to such a tool. > > That's true, but I hardly ever hear people talking about how wonderful > it is that pgBackRest uses SSH to grab the data
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 22, 2019 at 2:26 PM Stephen Frost wrote: > > There was basically zero discussion about what things would look like at > > a protocol level (I went back and skimmed over the thread before sending > > my last email to specifically see if I was going to get this response > > back..). I get the idea behind the diff file, the contents of which I > > wasn't getting into above. > > Well, I wrote: > > "There should be a way to tell pg_basebackup to request from the > server only those blocks where LSN >= threshold_value." > > I guess I assumed that people would interested in the details take > that to mean "and therefore the protocol would grow an option for this > type of request in whatever way is the most straightforward possible > extension of the current functionality is," which is indeed how you > eventually interpreted it when you said we could "extend BASE_BACKUP > is by adding LSN as an optional parameter." Looking at it from where I'm sitting, I brought up two ways that we could extend the protocol to "request from the server only those blocks where LSN >= threshold_value" with one being the modification to BASE_BACKUP and the other being a new set of commands that could be parallelized. If I had assumed that you'd be thinking the same way I am about extending the backup protocol, I wouldn't have said anything now and then would have complained after you wrote a patch that just extended the BASE_BACKUP command, at which point I likely would have been told that it's now been done and that I should have mentioned it earlier. > > external tools to leverage that. It sounds like what you're suggesting > > now is that you're happy to implement the backend code, expose it in a > > way that works just for pg_basebackup, and that if someone else wants to > > add things to the protocol to make it easier for external tools to > > leverage, great.
> > Yep, that's more or less it, although I am potentially willing to do > some modest amount of that other work along the way. I just don't > want to prioritize it higher than getting the actual thing I want to > build built, which I think is a pretty fair position for me to take. At least in part then it seems like we're viewing the level of effort around what I'm talking about quite differently, and I feel like that's largely because every time I mention parallel anything there's this assumption that I'm asking you to parallelize pg_basebackup or write a whole bunch more code to provide a fully optimized server-side parallel implementation for backups. That really wasn't what I was going for. I was thinking it would be a modest amount of additional work to add incremental backup via a few new commands, instead of through the BASE_BACKUP protocol command, that would make parallelization possible. Now, through this discussion, you've brought up some really good points about how the initial thoughts I had around how we could add some relatively simple commands, as part of this work, to make it easier for someone to later add parallel support to pg_basebackup (either full or incremental), or for external tools to leverage, might not be the best solution when it comes to having parallel backup in core, and therefore wouldn't actually end up being useful towards that end. That's certainly a fair point and possibly enough to justify not spending even the modest time I was thinking it'd need, but I'm not convinced. Now, that said, if you are convinced that's the case, and you're doing the work, then it's certainly your prerogative to go in the direction you're convinced of. I don't mean any of this discussion to imply that I'd object to a commit that extended BASE_BACKUP in the way outlined above, but I understood the question to be "what do people think of this idea?"
and to that I'm still of the opinion that spending a modest amount of time to provide a way to parallelize an incremental backup is worth it, even if it isn't optimal and isn't the direct goal of this effort. There's a tangent on all of this that's pretty key though, which is the question around just how the blocks are identified. If the WAL scanning is done to figure out the blocks, then that's quite a bit different from the other idea of "open this relation and scan it, but only give me the blocks after this LSN". It's the latter case that I've been mostly thinking about in this thread, which is part of why I was thinking it'd be a modest amount of work to have protocol commands that accepted a file (or perhaps a relation..) to scan and return blocks from instead of baking this into BASE_BACKUP which by definition just serially scans the data directory and returns things as it finds them. For the case where we have WAL scanning happening and modfiles which are being read and used to figure out the blocks to send, it seems like it might be more complicated and therefore potentially quite a bit more work to have a parallel ve
Re: block-level incremental backup
I hope it's alright to throw in my $0.02 as a user. I've been following this (and the other thread on reading WAL to find modified blocks, prefaulting, whatever else) since the start with great excitement and would love to see the built-in backup capabilities in Postgres greatly improved. I know this is not completely on-topic for just incremental backups, so I apologize in advance. It just seemed like the most apt place to chime in. Just to preface where I am coming from, I have been using pgBackRest for the past couple years and used wal-e prior to that. I am not a big *nix user other than on my servers; I do all my development on Windows and use primarily Java. The command line is not where I feel most comfortable despite my best efforts over the last 5-6 years. Prior to Postgres, I used SQL Server for quite a few years at previous companies but had more of a junior / intermediate skill set back then. I just wanted to put that out there so you can see where my biases are. With all that said, I would not be comfortable using pg_basebackup as my main backup tool simply because I’d have to cobble together numerous tools to get backups stored in a safe (not on the same server) location, I’d have to manage expiring backups and the WAL which is no longer needed, along with the rest of the stuff that makes these backup management tools useful. The command line scares me, and even if I was able to get all that working, I would not feel warm and fuzzy I didn’t mess something up horribly and I may hit an edge case which destroys backups, silently corrupts data, etc. I love that there are tools that manage all of it; backups, wal archiving, remote storage, integrate with cloud storage (S3 and the like), manages the retention of these backups with all their dependencies for me, and has all the restore options necessary built in as well. Block level incremental backup would be amazing for my use case. 
I have small updates / deletes that happen to data all over some of my largest tables. With pgBackRest, since the diff/incremental backups are at the file level, I can have a single update / delete which touches a random spot in a table and now requires that whole 1GB file to be backed up again. That said, even if pg_basebackup were the only tool that did incremental block-level backup tomorrow, I still wouldn’t start using it directly. I went into the issues I’d have to deal with if I used pg_basebackup above, and incremental backups without a management tool make me think using it correctly would be much harder. I know this thread is just about incremental backup, and that pretty much everything in core is built up from small features into larger more complex ones. I understand that and am not trying to dump on any efforts, I am super excited to see work being done in this area! I just wanted to share my perspective on how crucial good backup management is to me (and I’m sure a few others may share my sentiment considering how popular all the external tools are). I would never put a system in production unless I have some backup management in place. If core builds a backup management tool which uses pg_basebackup as building blocks for its solution…awesome! That may be something I’d use. If pg_basebackup can be improved so it can be used as the basis that most external backup management tools can build on top of, that’s also great. All the external tools which practically every Postgres company has built show that it’s obviously a need for a lot of users. Core will never solve every single problem for all users, I know that. It would just be great to see some of the fundamental features of backup management baked into core in an extensible way. 
With that, there could be a recommended way to set up backups (full/incremental, parallel, compressed), point in time recovery, backup retention, and perform restores (to a point in time, on a replica server, etc) with just the tooling within core with a nice and simple user interface, and great performance. If those features core supports in the internal tooling are built in an extensible way (as has been discussed), there could be much less duplication of work implementing the same base features over and over for each external tool. Those companies can focus on more value-added features to their own products that core would never support, or on improving the tooling/performance/features core provides. Well, this is way longer and a lot less coherent than I was hoping, so I apologize for that. Hopefully my stream of thoughts made a little bit of sense to someone. -Adam
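Adam's point about file-level granularity is easy to quantify. The sketch below is purely illustrative (the segment and page sizes match PostgreSQL's defaults, but the file names and block numbers are made up): one dirtied page per 1GB segment forces a file-level diff to re-copy the whole segment, while a block-level diff copies only the changed pages.

```python
# Illustrative only: PostgreSQL default sizes, hypothetical file names.
SEGMENT_SIZE = 1024 * 1024 * 1024   # relations are split into 1GB segment files
BLOCK_SIZE = 8192                   # default page size

def file_level_cost(touched_segments):
    """A file-level diff re-copies every segment file containing any change."""
    return len(touched_segments) * SEGMENT_SIZE

def block_level_cost(touched_blocks):
    """A block-level diff copies only the changed pages."""
    return len(touched_blocks) * BLOCK_SIZE

# One small update dirtying a single page in each of three 1GB segments:
segments = {"16384", "16384.1", "16384.2"}                     # hypothetical segment files
blocks = {("16384", 17), ("16384.1", 9000), ("16384.2", 42)}   # one page per segment

print(file_level_cost(segments))    # 3221225472 bytes (3 GiB re-copied)
print(block_level_cost(blocks))     # 24576 bytes (24 kB copied)
```

Three changed pages out of three gigabytes is roughly a 130,000x difference in backup volume, which is the gap block-level incremental backup is meant to close.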
Re: block-level incremental backup
22.04.2019 2:02, Robert Haas wrote: I think we're getting closer to a meeting of the minds here, but I don't think it's intrinsically necessary to rewrite the whole method of operation of pg_basebackup to implement incremental backup in a sensible way. One could instead just do a straightforward extension to the existing BASE_BACKUP command to enable incremental backup. Then, to enable parallel full backup and all sorts of out-of-core hacking, one could expand the command language to allow tools to access individual steps: START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes for an appealing project, but I do not think there is a technical reason why it has to be done first. Or for that matter why it has to be done second. As I keep saying, incremental backup and full backup are separate projects and I believe it's completely reasonable for whoever is doing the work to decide on the order in which they would like to do the work. Having said that, I'm curious what people other than Stephen (and other pgbackrest hackers) think about the relative value of parallel backup vs. incremental backup. Stephen appears quite convinced that parallel backup is full of win and incremental backup is a bit of a yawn by comparison, and while I certainly would not want to discount the value of his experience in this area, it sometimes happens on this mailing list that [ drum roll please ] not everybody agrees about everything. So, what do other people think? Personally, I believe that incremental backups are more useful to implement first since they benefit both backup speed and the space taken by a backup. Frankly speaking, I'm a bit surprised that the discussion of parallel backups took so much of this thread. Of course, we must keep it in mind, while designing the API to avoid introducing any architectural obstacles, but any further discussion of parallelism is a subject of another topic. 
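To make the command-language idea quoted above a little more concrete, here is a toy model of how a monolithic BASE_BACKUP could decompose into the granular steps Robert names. Only the command names (START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP) come from the thread; the class, LSN values, and file paths are invented for illustration.

```python
# Toy model of the granular backup command language; semantics are invented.
class ToyBackupServer:
    def __init__(self, files):
        self.files = files           # {path: contents}
        self.in_backup = False

    def start_backup(self):
        self.in_backup = True
        return "0/1000000"           # pretend backup-start LSN

    def send_file_list(self):
        assert self.in_backup
        return sorted(self.files)

    def send_file_contents(self, path):
        assert self.in_backup
        return self.files[path]

    def stop_backup(self):
        self.in_backup = False
        return "0/1000100"           # pretend backup-stop LSN

srv = ToyBackupServer({"base/1/16384": b"page data"})

# A pg_basebackup-style client walks through every step:
srv.start_backup()
copied = {p: srv.send_file_contents(p) for p in srv.send_file_list()}
srv.stop_backup()
print(copied)

# A filesystem-snapshot tool would issue only start_backup()/stop_backup();
# a tool with its own change tracking could skip send_file_list().
```

The point of the decomposition is exactly that different clients can use different subsets of the steps, which a single monolithic command cannot offer.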
I understand Stephen's concerns about the difficulties of incremental backup management. Even with an assumption that the user is ready to manage backup chains, retention, and other stuff, we must consider the format of backup metadata that will allow us to perform some primitive commands: 1) Tell whether this backup is full or incremental. 2) Tell what backup is a parent of this incremental backup. Probably, we can limit it to just returning "start_lsn", which later can be compared to the "stop_lsn" of the parent backup. 3) Take an incremental backup based on this backup. Here we must help a backup manager to retrieve the LSN to pass it to pg_basebackup. 4) Restore an incremental backup into a directory (on top of an already restored full backup). One may use it to perform "merge" or "restore" of the incremental backup, depending on the destination directory. I wonder if it is possible to integrate it into any existing tool, or whether we end up with something like pg_basebackup/pg_baserestore as in the case of pg_dump/pg_restore. Have you designed these? I may only recall "pg_combinebackup" from the very first message in this thread, which looks more like a sketch to explain the idea, rather than a thought-out feature design. I also found a page https://wiki.postgresql.org/wiki/Incremental_backup that raises the same questions. I'm volunteering to write a draft patch or, more likely, a set of patches, which will allow us to discuss the subject in more detail. And to do that I wish we could agree on the API and data format (at least broadly). Looking forward to hearing your thoughts. As I see it, ideally the backup management tools should concentrate more on managing multiple backups, while all the logic of taking a single backup (of any kind) should be integrated into the core. It means that any out-of-core client won't have to walk the PGDATA directory and care about all the postgres specific knowledge of data files consisting of blocks with headers and LSNs and so on. 
It simply requests data and gets it. Understandably, it won't be implemented in one take and, quite probably, is not fully reachable at all. Still, it will be great to do our best to provide such tools (both existing and future) with conveniently formatted data and an API to get it. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
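For what it's worth, the four metadata primitives listed above can be sketched against a hypothetical manifest. Nothing here is a proposed on-disk design, just an illustration of how little metadata (a parent pointer plus start/stop LSNs per backup) the backup manager actually needs to answer all four questions.

```python
# Hypothetical backup manifest -- names and LSN values are made up.
backups = {
    "full-1": {"parent": None,     "start_lsn": "0/1000000", "stop_lsn": "0/2000000"},
    "incr-1": {"parent": "full-1", "start_lsn": "0/2000000", "stop_lsn": "0/3000000"},
    "incr-2": {"parent": "incr-1", "start_lsn": "0/3000000", "stop_lsn": "0/4000000"},
}

def is_incremental(name):            # primitive 1: full or incremental?
    return backups[name]["parent"] is not None

def parent_of(name):                 # primitive 2: parent backup
    return backups[name]["parent"]

def lsn_for_next_incremental(name):  # primitive 3: threshold LSN for the next backup
    return backups[name]["stop_lsn"]

def restore_order(name):             # primitive 4: full backup first, then its children
    chain = []
    while name is not None:
        chain.append(name)
        name = backups[name]["parent"]
    return chain[::-1]

print(restore_order("incr-2"))       # ['full-1', 'incr-1', 'incr-2']
```

Note that primitive 2 falls out of comparing a backup's start_lsn with its parent's stop_lsn, as suggested above: in a valid chain each child's start_lsn equals the parent's stop_lsn.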
Re: block-level incremental backup
On Mon, Apr 22, 2019 at 2:26 PM Stephen Frost wrote: > There was basically zero discussion about what things would look like at > a protocol level (I went back and skimmed over the thread before sending > my last email to specifically see if I was going to get this response > back..). I get the idea behind the diff file, the contents of which I > wasn't getting into above. Well, I wrote: "There should be a way to tell pg_basebackup to request from the server only those blocks where LSN >= threshold_value." I guess I assumed that people interested in the details would take that to mean "and therefore the protocol would grow an option for this type of request in whatever way is the most straightforward possible extension of the current functionality," which is indeed how you eventually interpreted it when you said we could "extend BASE_BACKUP is by adding LSN as an optional parameter." I could have been more explicit, but sometimes people tell me that my emails are too long. > external tools to leverage that. It sounds like what you're suggesting > now is that you're happy to implement the backend code, expose it in a > way that works just for pg_basebackup, and that if someone else wants to > add things to the protocol to make it easier for external tools to > leverage, great. Yep, that's more or less it, although I am potentially willing to do some modest amount of that other work along the way. I just don't want to prioritize it higher than getting the actual thing I want to build built, which I think is a pretty fair position for me to take. > All I can say is that that's basically how we ended up > in the situation we're in today where pg_basebackup doesn't support > parallel backup but a bunch of external tools do and they don't go > through the backend to get there, even though they'd probably prefer to. 
I certainly agree that core should try to do things in a way that is useful to external tools when that can be done without undue effort, but only if it can actually be done without undue effort. Let's see whether that's the case here: - Anastasia wants a command added that dumps out whatever the server knows about what files have changed, which I already agreed was a reasonable extension of my initial proposal. - You said that for this to be useful to pgbackrest, it'd have to use a whole different mechanism that includes commands to request individual files and blocks within those files, which would be a significant rewrite of pg_basebackup that you agreed is more closely related to parallel backup than to the project under discussion on this thread. And that even then pgbackrest probably wouldn't use it because it also does server-side compression and encryption which are not included in this proposal. It seems to me that the first one falls into the category of reasonable additional effort and the second one falls into the category of lots of extra and unrelated work that wouldn't even get used. > Thanks for sharing your thoughts on that, certainly having the backend > able to be more intelligent about streaming files to avoid latency is > good and possibly the best approach. Another alternative to reducing > the latency would be to have a way for the client to request a set of > files, but I don't know that it'd be better. I don't know either. This is an area that needs more thought, I think, although as discussed, it's more related to parallel backup than $SUBJECT. > I'm not really sure why the above is extremely inconvenient for > third-party tools, beyond just that they've already been written to work > with an assumption that the server-side of things isn't as intelligent > as PG is. 
Well, one thing you might want to do is have a tool that connects to the server, enters backup mode, requests information on what blocks have changed, copies those blocks via direct filesystem access, and then exits backup mode. Such a tool would really benefit from a START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP command language, because it would just skip ever issuing the SEND_FILE_CONTENTS command in favor of doing that part of the work via other means. On the other hand, a START_PARALLEL_BACKUP LSN '1/234' command is useless to such a tool. Contrariwise, a tool that has its own magic - perhaps based on WAL-scanning or something like ptrack - to know which files currently exist and which blocks are modified could use SEND_FILE_CONTENTS but not SEND_FILE_LIST. And a filesystem-snapshot based technique might use START_BACKUP and STOP_BACKUP but nothing else. In short, providing granular commands like this lets the client be really intelligent even if the server isn't, and lets the client have fine-grained control of the process. This is very good if you're an out-of-core tool maintainer and your tool is trying to be smarter than - or even just differently-designed than - core. But if what you really want is just a maximally-efficient parallel backup, you don't nee
Re: block-level incremental backup
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2019-04-22 14:26:40 -0400, Stephen Frost wrote: > > I'm disappointed that the concerns about the trouble that end users are > > likely to have with this didn't garner more discussion. > > My impression is that end users are having a lot more trouble due to > important backup/restore features not being in core/pg_basebackup, than > due to external tools having a harder time to implement certain > features. I had been referring specifically to the concern I raised about incremental block-level backups being added to pg_basebackup and how that'll make using pg_basebackup more complicated and therefore more difficult for end users to get right, particularly if the end user is having to handle management of the association between the full backup and the incremental backups. I wasn't referring to anything regarding external tools. > Focusing on external tools being able to provide all those > features, because core hasn't yet, is imo entirely the wrong thing to > concentrate upon. And it's not like things largely haven't been > implemented in pg_basebackup for fundamental architectural reasons. > It's because we've built like 5 different external tools with randomly > differing featuresets and licenses. There are a few challenges when it comes to adding backup features to core. One of the reasons is that core naturally moves slower when it comes to development than external projects do, as was discussed earlier on this thread. Another is that, when it comes to backup, specifically, people want to back up their *existing* systems, which means that they need a backup tool that's going to work with whatever version of PG they've currently got deployed and that's often a few years old already. Certainly when I've thought about features that we'd like to see and considered if there's something that could be implemented in core vs. 
implemented outside of core, the answer often ends up being "well, if we do it ourselves then we can make it work for PG 9.2 and above, and have it working for existing users, but if we work it in as part of core, it won't be available until next year and only for version 12 and above, and users can only use it once they've upgraded.." Thanks, Stephen
Re: block-level incremental backup
Hi, On 2019-04-22 14:26:40 -0400, Stephen Frost wrote: > I'm disappointed that the concerns about the trouble that end users are > likely to have with this didn't garner more discussion. My impression is that end users are having a lot more trouble due to important backup/restore features not being in core/pg_basebackup, than due to external tools having a harder time to implement certain features. Focusing on external tools being able to provide all those features, because core hasn't yet, is imo entirely the wrong thing to concentrate upon. And it's not like things largely haven't been implemented in pg_basebackup for fundamental architectural reasons. It's because we've built like 5 different external tools with randomly differing featuresets and licenses. Greetings, Andres Freund
Re: block-level incremental backup
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 22, 2019 at 1:08 PM Stephen Frost wrote: > > > One could instead just do a straightforward extension > > > to the existing BASE_BACKUP command to enable incremental backup. > > > > Ok, how do you envision that? As I mentioned up-thread, I am concerned > > that we're talking too high-level here and it's making the discussion > > more difficult than it would be if we were to put together specific > > ideas and then discuss them. > > > > One way I can imagine to extend BASE_BACKUP is by adding LSN as an > > optional parameter and then having the database server scan the entire > > cluster and send a tarball which contains essentially a 'diff' file of > > some kind for each file where we can construct a diff based on the LSN, > > and then the complete contents of the file for everything else that > > needs to be in the backup. > > /me scratches head. Isn't that pretty much what I described in my > original post? I even described what that "'diff' file of some kind" > would look like in some detail in the paragraph of that emailed > numbered "2.", and I described the reasons for that choice at length > in > http://postgr.es/m/ca+tgmozrqdv-tb8ny9p+1pqlqkxp5f1afghuohh5qt6ewdk...@mail.gmail.com > > I can't figure out how I'm managing to be so unclear about things > about which I thought I'd been rather explicit. There was basically zero discussion about what things would look like at a protocol level (I went back and skimmed over the thread before sending my last email to specifically see if I was going to get this response back..). I get the idea behind the diff file, the contents of which I wasn't getting into above. > > So, sure, that would work, but it wouldn't be able to be parallelized > > and I don't think it'd end up being very exciting for the external tools > > because of that, but it would be fine for pg_basebackup. > > Stop being such a pessimist. 
Yes, if we only add the option to the > BASE_BACKUP command, it won't directly be very exciting for external > tools, but a lot of the work that is needed to do things that ARE > exciting for external tools will have been done. For instance, if the > work to figure out which blocks have been modified via WAL-scanning > gets done, and initially that's only exposed via BASE_BACKUP, it won't > be much work for somebody to write code that exposes > that information directly through some new replication command. > There's a difference between something that's going in the wrong > direction and something that's going in the right direction but not as > far or as fast as you'd like. And I'm 99% sure that everything I'm > proposing here falls in the latter category rather than the former. I didn't mean to imply that you're going in the wrong direction here and I thought I said somewhere in my last email more-or-less exactly the same, that a great deal of the work needed for block-level incremental backup would be done, but specifically that this proposal wouldn't allow external tools to leverage that. It sounds like what you're suggesting now is that you're happy to implement the backend code, expose it in a way that works just for pg_basebackup, and that if someone else wants to add things to the protocol to make it easier for external tools to leverage, great. All I can say is that that's basically how we ended up in the situation we're in today where pg_basebackup doesn't support parallel backup but a bunch of external tools do and they don't go through the backend to get there, even though they'd probably prefer to. 
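As an aside, the "only those blocks where LSN >= threshold_value" behavior that the proposed BASE_BACKUP extension would implement is straightforward to sketch. The model below fakes pages as (lsn, payload) pairs rather than parsing real 8kB pages with their header LSNs, so it illustrates only the filtering logic, not any actual server-side implementation:

```python
# Pages are modeled as (page_lsn, payload) pairs; a real page is an 8kB
# block with the LSN of its last modification in the page header, but the
# selection logic is the same either way.
def incremental_blocks(pages, threshold_lsn):
    """Yield (block_no, payload) for every page changed at or after the threshold."""
    for block_no, (page_lsn, payload) in enumerate(pages):
        if page_lsn >= threshold_lsn:
            yield block_no, payload

# A four-page relation; pages 1 and 3 were touched after LSN 200.
relation = [(100, b"a"), (250, b"b"), (90, b"c"), (300, b"d")]
diff = list(incremental_blocks(relation, 200))
print(diff)    # [(1, b'b'), (3, b'd')]
```

The "diff file" discussed in the thread would essentially be a serialization of these (block number, page contents) pairs per relation file, which is what makes reconstructing a full backup from a prior full plus an incremental possible.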
> > On the other hand, if you added new commands for 'list of files changed > > since this LSN' and 'give me this file' and 'give me this file with the > > changes in it since this LSN', then pg_basebackup could work with that > > pretty easily in a single-threaded model (maybe with two connections to > > the backend, but still in a single process, or maybe just by slurping up > > the file list and then asking for each one) and the external tools could > > leverage those new capabilities too for their backups, both full backups > > and incremental ones. This also wouldn't have to change how > > pg_basebackup does full backups today one bit, so what we're really > > talking about here is the direction to take the new code that's being > > written, not about rewriting existing code. I agree that it'd be a bit > > more work... but hopefully not *that* much more, and i
Re: block-level incremental backup
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2019-04-19 20:04:41 -0400, Stephen Frost wrote: > > I agree that we don't want another implementation and that there's a lot > > that we want to do to improve replay performance. We've already got > > frontend tools which work with multiple execution threads, so I'm not > > sure I get the "not easily feasible" bit, and the argument about the > > checkpointer seems largely related to that (as in- if we didn't have > > multiple threads/processes then things would perform quite badly... but > > we can and do have multiple threads/processes in frontend tools today, > > even in pg_basebackup). > > You need not just multiple execution threads, but basically a new > implementation of shared buffers, locking, process monitoring, with most > of the related infrastructure. You're literally talking about > reimplementing a very substantial portion of the backend. I'm not sure > I can transport in written words - via a public medium - how bad an idea > it would be to go there. Yes, there'd be some need for locking and process monitoring, though if we aren't supporting ongoing read queries at the same time, there's a whole bunch of things that we don't need from the existing backend. > > > Which I think is entirely reasonable. With the 'consistent' and LSN > > > recovery targets one already can get most of what's needed from such a > > > tool, anyway. I'd argue the biggest issue there is that there's no > > > equivalent to starting postgres with a private socket directory on > > > windows, and perhaps an option or two making it easier to start postgres > > > in a "private" mode for things like this. > > > > This would mean building in a way to do parallel WAL replay into the > > server binary though, as discussed above, and it seems like making that > > work in a way that allows us to still be available as a read-only > > standby would be quite a bit more difficult. 
We could possibly support > > parallel WAL replay only when we aren't a replica but from the same > > binary. > > I'm doubtful that we should try to implement parallel WAL apply that > can't support HS - a substantial portion of the logic to avoid > issues around relfilenode reuse, consistency etc is going to be > necessary for non-HS aware apply anyway. But if somebody had a concrete > proposal for something that's fundamentally only doable without HS, I > could be convinced. I'd certainly prefer that we support parallel WAL replay *with* HS; that just seems like a much larger problem, but I'd be quite happy to be told that it wouldn't be that much harder. > > A lot of this part of the discussion feels like a tangent though, unless > > I'm missing something. > > I'm replying to: > > On 2019-04-17 18:43:10 -0400, Stephen Frost wrote: > > Wow. I have to admit that I feel completely opposite of that- I'd > > *love* to have an independent tool (which ideally uses the same code > > through the common library, or similar) that can be run to apply WAL. > > And I'm basically saying that anything that starts from this premise is > fatally flawed (in the ex falso quodlibet kind of sense ;)). I'd just say that it'd be... difficult. :) > > The "WAL compression" tool contemplated > > previously would be much simpler and not the full-blown WAL replay > > capability, which would be left to the server, unless you're suggesting > > that even that should be exclusively the purview of the backend? Though > > that ship's already sailed, given that external projects have > > implemented it. > > I'm extremely doubtful of such tools (but it's not what I was responding > to, see above). I'd be extremely surprised if even one of them came > close to being correct. The old FPI removal tool had data corrupting > bugs left and right. 
I have concerns about it myself, which is why I'd actually really like to see something in core that does it, and does it the right way, that other projects could then leverage (ideally by just linking into the library without having to rewrite what's in core, though that might not be an option for things like WAL-G that are in Go and possibly don't want to link in some C library). > > Having a library to provide that which external > > projects could leverage would be nicer than having everyone write their > > own version. > > No, I don't think that's necessarily true. Something complicated that's > hard to get right doesn't have to be provided by core. Even if other > projects decide that their risk/reward assessment is different than core > postgres'. We don't have to take on all kinds of work and complexity for > external tools. No, it doesn't have to be provided by core, but I sure would like it to be and I'd be much more comfortable if it was because then we'd also take care to not break whatever assumptions are made (or to do so in a way that can be detected and/or handled) as new code is written. As discussed above, as long as it isn't provided by core
Re: block-level incremental backup
BACKUP/SEND_FILE_LIST/SEND_FILE_CONTENTS/STOP_BACKUP type interface for use by external tools. On the other hand, maybe the additional overhead caused by managing the list of files to be fetched on the client side is negligible. It'd be interesting to see, though, how busy the server is when running an incremental backup managed by an external tool like BART or pgbackrest on a cluster with a gazillion little-tiny relations. I wonder if we'd find that it spends most of its time waiting for the client. > What I'm afraid will be lackluster is adding block-level incremental > backup support to pg_basebackup without any support for managing > backups or anything else. I'm also concerned that it's going to mean > that people who want to use incremental backup with pg_basebackup are > going to have to write a lot of their own management code (probably in > shell scripts and such...) around that and if they get anything wrong > there then people are going to end up with bad backups that they can't > restore from, or they'll have corrupted clusters if they do manage to > get them restored. I think that this is another complaint that basically falls into the category of saying that this proposal might not fix everything for everybody, but that complaint could be levied against any reasonable development proposal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: block-level incremental backup
Hi, On 2019-04-19 20:04:41 -0400, Stephen Frost wrote: > I agree that we don't want another implementation and that there's a lot > that we want to do to improve replay performance. We've already got > frontend tools which work with multiple execution threads, so I'm not > sure I get the "not easily feasible" bit, and the argument about the > checkpointer seems largely related to that (as in- if we didn't have > multiple threads/processes then things would perform quite badly... but > we can and do have multiple threads/processes in frontend tools today, > even in pg_basebackup). You need not just multiple execution threads, but basically a new implementation of shared buffers, locking, process monitoring, with most of the related infrastructure. You're literally talking about reimplementing a very substantial portion of the backend. I'm not sure I can transport in written words - via a public medium - how bad an idea it would be to go there. > You certainly bring up some good concerns though and they make me think > of other bits that would seem like they'd possibly be larger issues for > a frontend tool- like having a large pool of memory for caching (aka > shared buffers) the changes. If what we're talking about here is *just* > replay though, without having the system available for reads, I wonder > if we might want a different solution there. No. > > Which I think is entirely reasonable. With the 'consistent' and LSN > > recovery targets one already can get most of what's needed from such a > > tool, anyway. I'd argue the biggest issue there is that there's no > > equivalent to starting postgres with a private socket directory on > > windows, and perhaps an option or two making it easier to start postgres > > in a "private" mode for things like this. 
> > This would mean building in a way to do parallel WAL replay into the > server binary though, as discussed above, and it seems like making that > work in a way that allows us to still be available as a read-only > standby would be quite a bit more difficult. We could possibly support > parallel WAL replay only when we aren't a replica but from the same > binary. I'm doubtful that we should try to implement parallel WAL apply that can't support HS - a substantial portion of the logic to avoid issues around relfilenode reuse, consistency etc is going to be necessary for non-HS aware apply anyway. But if somebody had a concrete proposal for something that's fundamentally only doable without HS, I could be convinced. > The concerns mentioned about making it easier to start PG in a > private mode don't seem too bad but I am not entirely sure that the > tools which want to leverage that kind of capability would want to have > to exec out to the PG binary to use it. Tough luck. But even leaving infeasibility aside, it seems like a quite bad idea to do this in-process inside a tool that manages backup & recovery. Creating threads / sub-processes with complicated needs (like any pared-down version of pg to do just recovery would have) from within a library has substantial complications. So you'd not want to do this in-process anyway. > A lot of this part of the discussion feels like a tangent though, unless > I'm missing something. I'm replying to: On 2019-04-17 18:43:10 -0400, Stephen Frost wrote: > Wow. I have to admit that I feel completely opposite of that- I'd > *love* to have an independent tool (which ideally uses the same code > through the common library, or similar) that can be run to apply WAL. And I'm basically saying that anything that starts from this premise is fatally flawed (in the ex falso quodlibet kind of sense ;)). 
> The "WAL compression" tool contemplated > previously would be much simpler and not the full-blown WAL replay > capability, which would be left to the server, unless you're suggesting > that even that should be exclusively the purview of the backend? Though > that ship's already sailed, given that external projects have > implemented it. I'm extremely doubtful of such tools (but it's not what I was responding to, see above). I'd be extremely surprised if even one of them came close to being correct. The old FPI removal tool had data-corrupting bugs left and right. > Having a library to provide that which external > projects could leverage would be nicer than having everyone write their > own version. No, I don't think that's necessarily true. Something complicated that's hard to get right doesn't have to be provided by core, even if other projects decide that their risk/reward assessment is different from core postgres'. We don't have to take on all kinds of work and complexity for external tools. Greetings, Andres Freund
Re: block-level incremental backup
es my thoughts here. I'd also like to hear what others think, particularly those who have been working in this area. > think about the relative value of parallel > backup vs. incremental backup. Stephen appears quite convinced that > parallel backup is full of win and incremental backup is a bit of a > yawn by comparison, and while I certainly would not want to discount > the value of his experience in this area, it sometimes happens on this > mailing list that [ drum roll please ] not everybody agrees about > everything. So, what do other people think? I'm afraid this is painting my position here with an extremely broad brush and so I'd like to clarify a bit: I'm *all* for incremental backups. Incremental and differential backups were supported by pgBackRest very early on and are used extensively. Today's pgBackRest does that at a file level, but I would very much like to get to a block level shortly after we finish rewriting it into C and porting it to Windows (and probably the other platforms PG runs on today), which isn't very far off now. I'd like to make sure that whatever core ends up with as an incremental backup solution also matches very closely what we do with pgBackRest too, but everything that's been discussed here seems pretty reasonable when it comes to the bits around how the blocks are detected and the files get stitched back together, so I don't expect there to be too much of an issue there. What I'm afraid will be lackluster is adding block-level incremental backup support to pg_basebackup without any support for managing backups or anything else. I'm also concerned that it's going to mean that people who want to use incremental backup with pg_basebackup are going to have to write a lot of their own management code (probably in shell scripts and such...) 
around that and if they get anything wrong there then people are going to end up with bad backups that they can't restore from, or they'll have corrupted clusters if they do manage to get them restored. It'd also be nice to have as much exposed through the common library as possible when it comes to, well, everything being discussed, so that the external tools could leverage that code and avoid having to write their own. This would probably apply more to the WAL-scanning discussion, but figured I'd mention it here too. If the protocol was implemented in a way that we could leverage it from external tools in a parallel fashion then I'd be more excited about the overall body of work. Although, thinking about it a bit more, I have to admit that I'm not sure that pgBackRest would end up using it in any case, no matter how it's implemented, since it wouldn't support compression or encryption, both of which we support doing in-stream before the data leaves the server. The external tools which don't support those options likely would find the parallel option more appealing. Thanks, Stephen
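The failure mode Stephen describes for hand-rolled management scripts is concrete: if a full backup is expired while incrementals still depend on it, every one of those incrementals silently becomes unrestorable. A minimal sketch of the chain validation such tooling would need; the manifest fields here are invented for illustration, not any proposed pg_basebackup format:

```python
# Hypothetical sketch: before restoring (or expiring) anything, walk an
# incremental backup's parent links back to a full backup and fail fast
# if the chain is broken. The manifest layout is made up for this example.

def restore_chain(backups, target_id):
    """Return the list of backups to apply, oldest first, or raise."""
    by_id = {b["id"]: b for b in backups}
    chain = []
    current = by_id.get(target_id)
    if current is None:
        raise ValueError("unknown backup: %s" % target_id)
    while current is not None:
        chain.append(current)
        if current["type"] == "full":
            return list(reversed(chain))  # apply the full first, then deltas
        current = by_id.get(current.get("parent"))
    raise ValueError("chain for %s does not reach a full backup" % target_id)
```

Walking the parent links up front catches a broken chain at expiration time instead of at restore time, which is exactly the kind of logic shell-script wrappers tend to get wrong.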
Re: block-level incremental backup
On 22.04.2019 2:02, Robert Haas wrote: On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost wrote: Having been around for a while working on backup-related things, if I was to implement the protocol for pg_basebackup today, I'd definitely implement "give me a list" and "give me this file" rather than the tar-based approach, because I've learned that people want to be able to do parallel backups and that's a decent way to do that. I wouldn't set out and implement something new that there's just no hope of making parallel. Maybe the first write of pg_basebackup would still be simple and serial since it's certainly more work to make a frontend tool like that work in parallel, but at least the protocol would be ready to support a parallel option being added later without being rewritten. And that's really what I was trying to get at here- if we've got the choice now to decide what this is going to look like from a protocol level, it'd be great if we could make it able to support being used in a parallel fashion, even if pg_basebackup is still single-threaded. I think we're getting closer to a meeting of the minds here, but I don't think it's intrinsically necessary to rewrite the whole method of operation of pg_basebackup to implement incremental backup in a sensible way. One could instead just do a straightforward extension to the existing BASE_BACKUP command to enable incremental backup. Then, to enable parallel full backup and all sorts of out-of-core hacking, one could expand the command language to allow tools to access individual steps: START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes for an appealing project, but I do not think there is a technical reason why it has to be done first. Or for that matter why it has to be done second.
As I keep saying, incremental backup and full backup are separate projects and I believe it's completely reasonable for whoever is doing the work to decide on the order in which they would like to do the work. Having said that, I'm curious what people other than Stephen (and other pgbackrest hackers) think about the relative value of parallel backup vs. incremental backup. Stephen appears quite convinced that parallel backup is full of win and incremental backup is a bit of a yawn by comparison, and while I certainly would not want to discount the value of his experience in this area, it sometimes happens on this mailing list that [ drum roll please ] not everybody agrees about everything. So, what do other people think? Based on the experience of pg_probackup users I can say that there is no 100% winner; depending on the use case, either parallel or incremental backups are preferable. - If the database is not too large and the update rate is high enough, then parallel backup within one data center is definitely the more efficient solution. - If the database is very large and rarely updated, or mostly append-only, then incremental backup is preferable. - Some customers need to collect, at a central server, backups of databases installed at many nodes with slow and unreliable connections (imagine a DBMS installed on locomotives). Parallelism definitely cannot help here, unlike incremental backup. - Parallel backup consumes system resources more aggressively, interfering with the normal work of the application, so it may cause significant degradation of application performance. pg_probackup supports both parallel and incremental backups, and it is up to the user to decide how to use them most efficiently for a particular configuration. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: block-level incremental backup
On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost wrote: > Having been around for a while working on backup-related things, if I > was to implement the protocol for pg_basebackup today, I'd definitely > implement "give me a list" and "give me this file" rather than the > tar-based approach, because I've learned that people want to be > able to do parallel backups and that's a decent way to do that. I > wouldn't set out and implement something new that there's just no hope > of making parallel. Maybe the first write of pg_basebackup would still > be simple and serial since it's certainly more work to make a frontend > tool like that work in parallel, but at least the protocol would be > ready to support a parallel option being added later without being > rewritten. > > And that's really what I was trying to get at here- if we've got the > choice now to decide what this is going to look like from a protocol > level, it'd be great if we could make it able to support being used in a > parallel fashion, even if pg_basebackup is still single-threaded. I think we're getting closer to a meeting of the minds here, but I don't think it's intrinsically necessary to rewrite the whole method of operation of pg_basebackup to implement incremental backup in a sensible way. One could instead just do a straightforward extension to the existing BASE_BACKUP command to enable incremental backup. Then, to enable parallel full backup and all sorts of out-of-core hacking, one could expand the command language to allow tools to access individual steps: START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes for an appealing project, but I do not think there is a technical reason why it has to be done first. Or for that matter why it has to be done second.
As I keep saying, incremental backup and full backup are separate projects and I believe it's completely reasonable for whoever is doing the work to decide on the order in which they would like to do the work. Having said that, I'm curious what people other than Stephen (and other pgbackrest hackers) think about the relative value of parallel backup vs. incremental backup. Stephen appears quite convinced that parallel backup is full of win and incremental backup is a bit of a yawn by comparison, and while I certainly would not want to discount the value of his experience in this area, it sometimes happens on this mailing list that [ drum roll please ] not everybody agrees about everything. So, what do other people think? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
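If the command language were split up as Robert sketches, a parallel client becomes a thin loop over it: one control connection brackets the backup, and workers fetch files over their own connections. A rough illustration follows; the command names are the ones proposed above, but FakeServer and everything else here is invented, and a real client would open separate replication connections per worker:

```python
# Toy model of a parallel client driving the proposed command set
# (START_BACKUP, SEND_FILE_LIST, SEND_FILE_CONTENTS, STOP_BACKUP).
# None of these commands exist yet; FakeServer stands in for a connection.

from concurrent.futures import ThreadPoolExecutor

class FakeServer:
    def __init__(self, files):
        self.files = files            # path -> bytes
    def start_backup(self):
        return "backup-label"         # placeholder for a real backup label
    def send_file_list(self):
        return sorted(self.files)
    def send_file_contents(self, path):
        return self.files[path]
    def stop_backup(self):
        return "stop-lsn"             # placeholder for the end-of-backup LSN

def parallel_backup(server_factory, workers=4):
    ctl = server_factory()            # one control connection for the bracket
    ctl.start_backup()
    paths = ctl.send_file_list()
    def fetch(path):
        conn = server_factory()       # each worker uses its own connection
        return path, conn.send_file_contents(path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result = dict(pool.map(fetch, paths))
    ctl.stop_backup()
    return result
```

The point of the sketch is that pg_basebackup could stay single-threaded over the same commands, while out-of-core tools widen the middle loop.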
Re: block-level incremental backup
> On 21 Apr 2019, at 1:13, Robert Haas wrote: > > On Sat, Apr 20, 2019 at 12:44 PM Andrey Borodin wrote: >> Incremental backup of 1Tb DB made with distance of few minutes (small change >> set) is few Gbs. All of this size is made of FSM (no LSN) and VM (hard to >> use LSN). >> Sure, this overhead size is fine if we make daily backup. But at some >> frequency of backups it will be too much. > > It seems like if the backups are only a few minutes apart, PITR might > be a better choice than super-frequent incremental backups. What do > you think about that? PITR is painfully slow on heavily loaded clusters. I have observed restores where replaying 5 seconds' worth of WAL took 4 seconds. The standby was only a few hours behind the primary node but could catch up only at night. And during this process only one of 56 CPU cores was used, and the SSD RAID throughput was not 100% utilized. Block-level delta backups can be restored very efficiently: if we apply steps from newest to oldest, we write no more than the cluster size as of the last backup. >> I think that problem of incrementing FSM and VM is too distant now. >> But if I had to implement it right now I'd choose following way: do not >> backup FSM and VM, recreate it during restore. Looks like it is possible, >> but too much AM-specific. > > Interesting idea - that's worth some more thought. Core routines to recreate VM and FSM would be cool :) But this needs to be done without extra IO, which is not an easy trick. >> There are 53 mentions of "parallel backup" here. I want to note that there may be >> parallel read from disk and parallel network transmission. Things between >> these two are negligible and can be single-threaded. From my POV, it's not >> about threads, it's about saturated IO controllers. >> Also I think parallel restore matters more than parallel backup. Backups >> themselves can be slow; on many clusters we even throttle disk IO. But users >> may want parallel backup to catch up a standby.
> > I'm not sure I entirely understand your point here -- are you saying > that parallel backup is important, or that it's not important, or > something in between? Do you think it's more or less important than > incremental backup? I think that there is no such thing as "parallel backup" per se. Backup creation is a composite process of many subprocesses. In my experience, parallel network transmission is cool and very important; it makes upload 3 times faster. But my experience is limited to cloud storage. Would this hold if the storage backend is a local FS? I have no idea. Parallel reading from disk has the same effect. Compression and encryption can be single-threaded; I don't think they will be the bottleneck (unless one uses lzma's neighborhood of the Pareto frontier). For me, the most important thing is incremental backups (with parallel merge of steps), and then parallel backup. But there is a huge fraction of users who can benefit from parallel backup and do not need incremental backup at all. Best regards, Andrey Borodin.
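Andrey's efficiency claim for block-level delta restore is easy to make concrete: applying backups from newest to oldest, each block is written only the first time it is seen, so total writes are bounded by the cluster size as of the last backup. A toy sketch, with backups modeled as plain dicts of block number to contents rather than any real backup format:

```python
# Sketch of newest-first incremental restore: every block is written at
# most once, because a block present in a newer delta shadows all older
# copies. "writes" counts physical block writes to the restore target.

def restore_newest_first(full, increments):
    """full: base backup; increments: deltas, listed oldest first."""
    restored = {}
    writes = 0
    # Walk from the newest increment back to the full backup.
    for delta in reversed([full] + increments):
        for block, data in delta.items():
            if block not in restored:   # skip if a newer version was written
                restored[block] = data
                writes += 1
    return restored, writes
```

Restoring oldest-first would instead write every version of every changed block, which is the inefficiency the newest-first ordering avoids.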