Re: [HACKERS] Implementing incremental backup

2013-06-22 Thread Cédric Villemain
On Saturday 22 June 2013 01:09:20, Jehan-Guillaume (ioguix) de Rorthais wrote:
 On 20/06/2013 03:25, Tatsuo Ishii wrote:
  On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org 
wrote:
  On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net 
wrote:
  * Claudio Freire (klaussfre...@gmail.com) wrote:
 [...]
 
  The only bottleneck here, is WAL archiving. This assumes you can
  afford WAL archiving at least to a local filesystem, and that the WAL
  compressor is able to cope with WAL bandwidth. But I have no reason to
  think you'd be able to cope with dirty-map updates anyway if you were
  saturating the WAL compressor, as the compressor is more efficient on
  amortized cost per transaction than the dirty-map approach.
  
  Thank you for detailed explanation. I will think more about this.
 
 Just for the record, I was mulling over this idea since a bunch of
 month. I even talked about that with Dimitri Fontaine some weeks ago
 with some beers :)
 
 My idea came from a customer during a training explaining me the
 difference between differential and incremental backup in Oracle.
 
 My approach would have been to create a standalone tool (say
 pg_walaggregate) which takes a bunch of WAL from archives and merge them
 in a single big file, keeping only the very last version of each page
 after aggregating all their changes. The resulting file, aggregating all
 the changes from given WAL files would be the differential backup.
 
 A differential backup resulting from a bunch of WAL between W1 and Wn
 would help to recover much faster to the time of Wn than replaying all
 the WALs between W1 and Wn and saves a lot of space.
 
 I was hoping to find some time to dig around this idea, but as the
 subject rose here, then here are my 2¢!

Something like that, maybe:

./pg_xlogdump -b \
  ../data/pg_xlog/00010001 \
  ../data/pg_xlog/00010005 | \
  grep 'backup bkp' | awk '{print ($5,$9)}'

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training




Re: [HACKERS] Implementing incremental backup

2013-06-22 Thread Andres Freund
On 2013-06-22 15:58:35 +0200, Cédric Villemain wrote:
  A differential backup resulting from a bunch of WAL between W1 and Wn
  would help to recover much faster to the time of Wn than replaying all
  the WALs between W1 and Wn and saves a lot of space.
  
  I was hoping to find some time to dig around this idea, but as the
  subject rose here, then here are my 2¢!
 
 something like that maybe :
 ./pg_xlogdump -b \
 ../data/pg_xlog/00010001 \   
 ../data/pg_xlog/00010005| \
 grep 'backup bkp' | awk '{print ($5,$9)}'

Note that it's a bit more complex than that for a number of reasons:
* we don't log full page images for e.g. new heap pages, we just set the
  XLOG_HEAP_INIT_PAGE flag on the record
* there also are XLOG_FPI records
* How do you get a base backup as the basis to apply those to? You need
  it to be recovered exactly to a certain point...

But yes, I think something can be done in the end. I think Heikki's
pg_rewind already has quite a bit of the required logic.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Implementing incremental backup

2013-06-21 Thread Jehan-Guillaume (ioguix) de Rorthais
On 20/06/2013 03:25, Tatsuo Ishii wrote:
 On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote:
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
[...]

 The only bottleneck here, is WAL archiving. This assumes you can
 afford WAL archiving at least to a local filesystem, and that the WAL
 compressor is able to cope with WAL bandwidth. But I have no reason to
 think you'd be able to cope with dirty-map updates anyway if you were
 saturating the WAL compressor, as the compressor is more efficient on
 amortized cost per transaction than the dirty-map approach.
 
 Thank you for detailed explanation. I will think more about this.

Just for the record, I have been mulling over this idea for a good few
months. I even talked about it with Dimitri Fontaine some weeks ago
over a few beers :)

My idea came from a customer who, during a training session, explained to
me the difference between differential and incremental backups in Oracle.

My approach would have been to create a standalone tool (say
pg_walaggregate) which takes a bunch of WAL files from the archives and
merges them into a single big file, keeping only the very last version of
each page after aggregating all their changes. The resulting file,
aggregating all the changes from the given WAL files, would be the
differential backup.

A differential backup built from a bunch of WAL between W1 and Wn would
make recovering to the time of Wn much faster than replaying all the WALs
between W1 and Wn, and would save a lot of space.

I was hoping to find some time to dig into this idea, but since the
subject came up here, here are my 2¢!

Cheers,
-- 
Jehan-Guillaume (ioguix) de Rorthais




Re: [HACKERS] Implementing incremental backup

2013-06-20 Thread Magnus Hagander
On Thu, Jun 20, 2013 at 12:18 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Claudio Freire escribió:
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
  * Claudio Freire (klaussfre...@gmail.com) wrote:
  I don't see how this is better than snapshotting at the filesystem
  level. I have no experience with TB scale databases (I've been limited
  to only hundreds of GB), but from my limited mid-size db experience,
  filesystem snapshotting is pretty much the same thing you propose
  there (xfs_freeze), and it works pretty well. There's even automated
  tools to do that, like bacula, and they can handle incremental
  snapshots.
 
  Large databases tend to have multiple filesystems and getting a single,
  consistent, snapshot across all of them while under load is..
  'challenging'.  It's fine if you use pg_start/stop_backup() and you're
  saving the XLOGs off, but if you can't do that..

 Good point there.

 I still don't like the idea of having to mark each modified page. The
 WAL compressor idea sounds a lot more workable. As in scalable.

 There was a project that removed useless WAL records from the stream,
 to make it smaller and useful for long-term archiving.  It only removed
 FPIs as far as I recall.  It's dead now, and didn't compile on recent
 (9.1?) Postgres because of changes in the WAL structs, IIRC.

 This doesn't help if you have a large lot of UPDATEs that touch the same
 set of rows over and over, though.  Tatsuo-san's proposal would allow
 this use-case to work nicely because you only keep one copy of such
 data, not one for each modification.

 If you have the two technologies, you could teach them to work in
 conjunction: you set up WAL replication, and tell the WAL compressor to
 prune updates for high-update tables (avoid useless traffic), then use
 incremental backup to back these up.  This seems like it would have a
 lot of moving parts and be rather bug-prone, though.

Just as a datapoint, I think this is basically what at least some
other database engines (SQL Server, for one) call incremental vs
differential backup.

A differential backup keeps track of which blocks have changed (one
way or another - maybe as simply as by LSN, but it doesn't really
matter how) and backs up just those blocks (diffed against the base
backup).
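
As an illustration of that LSN-based approach (not an existing tool, just a
hypothetical sketch): scan a relation file in 8 kB pages and keep the pages
whose header LSN is newer than the LSN recorded when the base backup was
taken. Reading pd_lsn from the first 8 bytes of the page header follows
PostgreSQL's on-disk page layout, but the little-endian assumption, error
handling and relation discovery are all simplifications.

    import struct

    BLCKSZ = 8192  # default PostgreSQL block size

    def page_lsn(page: bytes) -> int:
        # pd_lsn is the first field of the page header: xlogid, xrecoff
        # (assumes a little-endian server; a real tool must match the host).
        xlogid, xrecoff = struct.unpack_from("<II", page, 0)
        return (xlogid << 32) | xrecoff

    def changed_blocks(relfile_path: str, base_backup_lsn: int):
        # Yield (block number, page image) for pages changed after the base backup.
        with open(relfile_path, "rb") as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # All-zero (never-initialized) pages have LSN 0 and are skipped.
                if page_lsn(page) > base_backup_lsn:
                    yield blkno, page
                blkno += 1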

An incremental backup uses the transaction log, which is basically what
we do with log archiving, except it's not done in real time - it's all
saved on the master until the backup command runs.

Of course, it's been quite a few years since I last set one of those up
in anger, so disclaimer for that info being out of date :)

Didn't pg_rman try to do something based on the page LSN to achieve
something similar to this?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




[HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
Hi,

I'm thinking of implementing an incremental backup tool for
PostgreSQL. The use case for the tool would be taking a backup of a huge
database. For a database of that size, pg_dump is too slow, and even WAL
archiving is too slow/ineffective. However, even in a TB-scale database,
the actually modified blocks are sometimes not that big, maybe only
several GB. So if we could back up only those modified blocks, that
would be an effective incremental backup method.

For now, my idea is pretty vague.

- Record info about modified blocks. We don't need to remember the
  whole history of a block if it was modified multiple times; we just
  remember that the block was modified since the last incremental
  backup was taken. (A rough sketch of such a dirty map follows after
  this list.)

- The info could be obtained by trapping calls to mdwrite() etc. We need
  to be careful to skip blocks used for xlogs and temporary tables so as
  not to waste resources.

- If many blocks were modified in a file, we may be able to condense
  the info to "the whole file was modified" in order to reduce the
  amount of info.

- How to take a consistent incremental backup is an issue. I can't
  think of a clean way other than locking the whole cluster, which is
  obviously unacceptable. Maybe we should give up on hot backup?
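
To make the dirty-map idea above a bit more concrete, here is a minimal,
hypothetical sketch of the bookkeeping only (nothing here actually hooks into
mdwrite(), which would have to happen in core, and names such as DirtyMap and
WHOLE_FILE_THRESHOLD are invented for illustration):

    from collections import defaultdict

    WHOLE_FILE_THRESHOLD = 1024  # hypothetical cutoff before condensing
    WHOLE_FILE = None            # sentinel: "every block in this file changed"

    class DirtyMap:
        # Tracks blocks modified since the last incremental backup.
        def __init__(self):
            # relation file path -> set of modified block numbers, or WHOLE_FILE
            self.files = defaultdict(set)

        def note_write(self, relfile: str, blkno: int):
            # Conceptually called from the mdwrite() trap, for permanent relations only.
            blocks = self.files[relfile]
            if blocks is WHOLE_FILE:
                return  # already condensed, nothing more to remember
            blocks.add(blkno)
            if len(blocks) > WHOLE_FILE_THRESHOLD:
                self.files[relfile] = WHOLE_FILE  # condense to "whole file modified"

        def reset(self):
            # Forget everything once an incremental backup has been taken.
            self.files.clear()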

Comments, thoughts are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Stephen Frost
Tatsuo,

* Tatsuo Ishii (is...@postgresql.org) wrote:
 I'm thinking of implementing an incremental backup tool for
 PostgreSQL. The use case for the tool would be taking a backup of huge
 database. For that size of database, pg_dump is too slow, even WAL
 archive is too slow/ineffective as well. However even in a TB
 database, sometimes actual modified blocks are not that big, may be
 even several GB. So if we can backup those modified blocks only,
 that would be an effective incremental backup method.

I'm trying to figure out how that's actually different from WAL..?  It
sounds like you'd get what you're suggesting by simply increasing the
checkpoint timeout until the WAL stream is something which you can keep
up with.  Of course, the downside there is that you'd have to replay
more WAL when recovering.

What about a tool which receives WALs but then compresses them across
a longer period of time than the normal checkpointing, by simply keeping
in memory the current set of modified blocks and applying each WAL
record against the corresponding block in memory as it reads the WAL?
It would then purge each block out using a full-page WAL write at some
pre-defined point, perhaps at the end of the overall backup?

Consider this: connect the WAL compressor to a PG backend and issue a
'start backup'.  The WAL compressor detects this and starts keeping
track of every changed block in memory, applying the WAL stream of
full-page and non-full-page changes to the in-memory set, until 'stop
backup' is called, at which point the WAL compressor simply dumps all
the records as full-page writes into this new WAL stream.

Or perhaps some combination: an 'always running' WAL compressor which
simply reduces the overall size of the WAL stream, with coordination
around full backups.
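
A minimal sketch of the accumulation step described above, purely for
illustration: it assumes some WAL decoder (not shown, and not a real
PostgreSQL API) that yields records already reduced to a relation file, a
block number and either a full-page image or an applicable delta. The hard
part, decoding the WAL itself, is deliberately waved away.

    def compress_wal(records, start_lsn, stop_lsn):
        # Fold a decoded WAL stream into one latest full-page image per block.
        # `records` is a hypothetical iterable with .lsn, .relfile, .blkno,
        # .full_page (bytes or None) and .apply(page) for non-full-page changes.
        pages = {}  # (relfile, blkno) -> latest page image
        for rec in records:
            if rec.lsn < start_lsn:
                continue
            if rec.lsn > stop_lsn:
                break
            key = (rec.relfile, rec.blkno)
            if rec.full_page is not None:
                pages[key] = rec.full_page          # take the full-page image as-is
            elif key in pages:
                pages[key] = rec.apply(pages[key])  # apply the change to the cached image
            # else: no full-page image seen yet; a real tool would fetch the base page
        return pages  # these get dumped as full-page writes into the new stream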

Thanks,

Stephen




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Ants Aasma
On Wed, Jun 19, 2013 at 1:13 PM, Tatsuo Ishii is...@postgresql.org wrote:
 I'm thinking of implementing an incremental backup tool for
 PostgreSQL. The use case for the tool would be taking a backup of huge
 database. For that size of database, pg_dump is too slow, even WAL
 archive is too slow/ineffective as well. However even in a TB
 database, sometimes actual modified blocks are not that big, may be
 even several GB. So if we can backup those modified blocks only,
 that would be an effective incremental backup method.

PostgreSQL definitely needs better tools to cope with TB-scale
databases, especially once the ideas that get rid of anti-wraparound
vacuums materialize and make huge databases more practical.

 For now, my idea is pretty vague.

 - Record info about modified blocks. We don't need to remember the
   whole history of a block if the block was modified multiple times.
   We just remember that the block was modified since the last
   incremental backup was taken.

 - The info could be obtained by trapping calls to mdwrite() etc. We need
   to be careful to avoid such blocks used in xlogs and temporary
   tables to not waste resource.

Unless I'm missing something, the information about modified blocks
can also be obtained by reading the WAL, without requiring any
modifications to core.

 - If many blocks were modified in a file, we may be able to condense
   the info as the whole file was modified to reduce the amount of
   info.

You could keep a list of modified block ranges and, when the list gets
too large, merge ranges that are close together.
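
One possible shape for that range list, as a small illustrative sketch (the
function name and thresholds are invented): once the list grows past a limit,
ranges separated by only a few blocks are merged, trading a little extra data
copied for much less bookkeeping.

    def merge_close_ranges(ranges, max_entries=256, max_gap=8):
        # ranges: sorted list of (start_block, end_block) tuples, end inclusive.
        if len(ranges) <= max_entries:
            return ranges  # small enough, keep exact ranges
        merged = [ranges[0]]
        for start, end in ranges[1:]:
            last_start, last_end = merged[-1]
            if start - last_end <= max_gap:
                # close enough: widen the previous range instead of adding a new one
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        return merged  # a real tool might retry with a larger gap if still too big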

 - How to take a consistent incremental backup is an issue. I can't
   think of a clean way other than locking whole cluster, which is
   obviously unacceptable. Maybe we should give up hot backup?

I don't see why the regular approach (pg_start_backup(), copy out the
modified blocks, pg_stop_backup(), copy the WAL needed to recover)
wouldn't work here.

A good feature for the tool would be to apply the incremental backup to
the previous backup while copying out the old blocks, so you could have
the latest full backup available plus the incremental changes needed to
rewind it to the previous version.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Claudio Freire
On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii is...@postgresql.org wrote:

 For now, my idea is pretty vague.

 - Record info about modified blocks. We don't need to remember the
   whole history of a block if the block was modified multiple times.
   We just remember that the block was modified since the last
   incremental backup was taken.

 - The info could be obtained by trapping calls to mdwrite() etc. We need
   to be careful to avoid such blocks used in xlogs and temporary
   tables to not waste resource.

 - If many blocks were modified in a file, we may be able to condense
   the info as the whole file was modified to reduce the amount of
   info.

 - How to take a consistent incremental backup is an issue. I can't
   think of a clean way other than locking whole cluster, which is
   obviously unacceptable. Maybe we should give up hot backup?


I don't see how this is better than snapshotting at the filesystem
level. I have no experience with TB-scale databases (I've been limited
to only hundreds of GB), but from my limited mid-size db experience,
filesystem snapshotting (e.g. xfs_freeze) is pretty much the same thing
you propose here, and it works pretty well. There are even automated
tools to do that, like bacula, and they can handle incremental
snapshots.




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Jim Nasby

On 6/19/13 11:02 AM, Claudio Freire wrote:

On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii is...@postgresql.org wrote:


For now, my idea is pretty vague.

- Record info about modified blocks. We don't need to remember the
   whole history of a block if the block was modified multiple times.
   We just remember that the block was modified since the last
   incremental backup was taken.

- The info could be obtained by trapping calls to mdwrite() etc. We need
   to be careful to avoid such blocks used in xlogs and temporary
   tables to not waste resource.

- If many blocks were modified in a file, we may be able to condense
   the info as the whole file was modified to reduce the amount of
   info.

- How to take a consistent incremental backup is an issue. I can't
   think of a clean way other than locking whole cluster, which is
   obviously unacceptable. Maybe we should give up hot backup?



I don't see how this is better than snapshotting at the filesystem
level. I have no experience with TB scale databases (I've been limited
to only hundreds of GB), but from my limited mid-size db experience,
filesystem snapshotting is pretty much the same thing you propose
there (xfs_freeze), and it works pretty well. There's even automated
tools to do that, like bacula, and they can handle incremental
snapshots.


A snapshot is not the same as an incremental backup; it presents itself as a
full copy of the filesystem. Actually, since it's on the same underlying
storage, a snapshot isn't really a good backup at all.

The proposal (at least as I read it) is to provide a means to easily deal with
*only* the data that has actually *changed* since the last backup was taken.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Claudio Freire
On Wed, Jun 19, 2013 at 3:54 PM, Jim Nasby j...@nasby.net wrote:
 On 6/19/13 11:02 AM, Claudio Freire wrote:

 On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii is...@postgresql.org
 wrote:


 For now, my idea is pretty vague.

 - Record info about modified blocks. We don't need to remember the
whole history of a block if the block was modified multiple times.
We just remember that the block was modified since the last
incremental backup was taken.

 - The info could be obtained by trapping calls to mdwrite() etc. We need
to be careful to avoid such blocks used in xlogs and temporary
tables to not waste resource.

 - If many blocks were modified in a file, we may be able to condense
the info as the whole file was modified to reduce the amount of
info.

 - How to take a consistent incremental backup is an issue. I can't
think of a clean way other than locking whole cluster, which is
obviously unacceptable. Maybe we should give up hot backup?



 I don't see how this is better than snapshotting at the filesystem
 level. I have no experience with TB scale databases (I've been limited
 to only hundreds of GB), but from my limited mid-size db experience,
 filesystem snapshotting is pretty much the same thing you propose
 there (xfs_freeze), and it works pretty well. There's even automated
 tools to do that, like bacula, and they can handle incremental
 snapshots.


 A snapshot is not the same as an incremental backup; it presents itself as a
 full copy of the filesystem. Actually, since it's on the same underlying
 storage a snapshot isn't really a good backup at all.

Read up on bacula[0] (it's huge, so this info may be hard to find): you
can take that snapshot, which will of course be on the same filesystem,
and *then* back it up. So you get a consistent snapshot in your backup,
which means a correct backup, and the backup certainly doesn't have to
be on the same filesystem. It even works on ext3 if you install the
right kernel modules.

Yes, it's a snapshot of the entire filesystem, so it's not the same as
a database-only backup. But it has a huge overlap, don't you think?

When WAL archiving can give you PITR, and bacula-like tools can give you
incremental and consistent full-FS-snapshot backups, what does the
proposed feature add? I don't think you can get PITR with the proposed
feature, as it takes a snapshot only when told to, and it can't take
multiple snapshots. The only way to get PITR, AFAIK, is WAL archiving,
so whether that's viable or not for TB-sized databases is moot if it's
the only option.

And it will add overhead. A considerable overhead. Even if you only
have to flip a bit in some page map, it amplifies writes twofold
(unless writes can be coalesced, of which there is no guarantee).

In the end, it may be preferable to just alter PG's behavior slightly
to make the job of bacula, rsync or whichever tool easier, for example
by trying hard not to write to cold segments, so that entire segments
can be skipped by quick mtime checks.
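
For what it's worth, the mtime trick amounts to something like this sketch
over a data directory (hypothetical; tools like rsync do the equivalent check
internally): skip any relation segment whose modification time predates the
last backup.

    import os

    def segments_to_copy(data_dir: str, last_backup_time: float):
        # Yield relation segment files touched since the last backup (coarse mtime check).
        for root, _dirs, files in os.walk(os.path.join(data_dir, "base")):
            for name in files:
                path = os.path.join(root, name)
                if os.stat(path).st_mtime >= last_backup_time:
                    yield path  # the segment may contain changes; copy it whole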

[0] http://www.bacula.org/en/




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
 I don't see how this is better than snapshotting at the filesystem
 level. I have no experience with TB scale databases (I've been limited
 to only hundreds of GB), but from my limited mid-size db experience,
 filesystem snapshotting is pretty much the same thing you propose
 there (xfs_freeze), and it works pretty well. There's even automated
 tools to do that, like bacula, and they can handle incremental
 snapshots.

Large databases tend to have multiple filesystems and getting a single,
consistent, snapshot across all of them while under load is..
'challenging'.  It's fine if you use pg_start/stop_backup() and you're
saving the XLOGs off, but if you can't do that..

Thanks,

Stephen




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Claudio Freire
On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 I don't see how this is better than snapshotting at the filesystem
 level. I have no experience with TB scale databases (I've been limited
 to only hundreds of GB), but from my limited mid-size db experience,
 filesystem snapshotting is pretty much the same thing you propose
 there (xfs_freeze), and it works pretty well. There's even automated
 tools to do that, like bacula, and they can handle incremental
 snapshots.

 Large databases tend to have multiple filesystems and getting a single,
 consistent, snapshot across all of them while under load is..
 'challenging'.  It's fine if you use pg_start/stop_backup() and you're
 saving the XLOGs off, but if you can't do that..

Good point there.

I still don't like the idea of having to mark each modified page. The
WAL compressor idea sounds a lot more workable. As in scalable.




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Alvaro Herrera
Claudio Freire escribió:
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
  * Claudio Freire (klaussfre...@gmail.com) wrote:
  I don't see how this is better than snapshotting at the filesystem
  level. I have no experience with TB scale databases (I've been limited
  to only hundreds of GB), but from my limited mid-size db experience,
  filesystem snapshotting is pretty much the same thing you propose
  there (xfs_freeze), and it works pretty well. There's even automated
  tools to do that, like bacula, and they can handle incremental
  snapshots.
 
  Large databases tend to have multiple filesystems and getting a single,
  consistent, snapshot across all of them while under load is..
  'challenging'.  It's fine if you use pg_start/stop_backup() and you're
  saving the XLOGs off, but if you can't do that..
 
 Good point there.
 
 I still don't like the idea of having to mark each modified page. The
 WAL compressor idea sounds a lot more workable. As in scalable.

There was a project that removed useless WAL records from the stream,
to make it smaller and useful for long-term archiving.  It only removed
FPIs as far as I recall.  It's dead now, and didn't compile on recent
(9.1?) Postgres because of changes in the WAL structs, IIRC.

This doesn't help if you have a large number of UPDATEs that touch the
same set of rows over and over, though.  Tatsuo-san's proposal would
allow this use case to work nicely because you keep only one copy of
such data, not one for each modification.

If you have the two technologies, you could teach them to work in
conjunction: you set up WAL replication and tell the WAL compressor to
prune updates for high-update tables (to avoid useless traffic), then
use incremental backup to back those up.  This seems like it would have
a lot of moving parts and be rather bug-prone, though.

-- 
Álvaro Herrera            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Claudio Freire
On Wed, Jun 19, 2013 at 7:18 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 If you have the two technologies, you could teach them to work in
 conjunction: you set up WAL replication, and tell the WAL compressor to
 prune updates for high-update tables (avoid useless traffic), then use
 incremental backup to back these up.  This seems like it would have a
 lot of moving parts and be rather bug-prone, though.

I don't think it would be worse than storage-manager-level stuff. And
though it's more complex, don't underestimate the pros: lower footprint,
better scalability, and you get consistent online backups.

That mechanism can also be used to distill a list of modified pages,
mind you, instead of hooking into the storage-manager stuff. The pro
there is that it wouldn't amplify writes. The con is that you don't get
consistent online backups.




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
 I'm thinking of implementing an incremental backup tool for
 PostgreSQL. The use case for the tool would be taking a backup of huge
 database. For that size of database, pg_dump is too slow, even WAL
 archive is too slow/ineffective as well. However even in a TB
 database, sometimes actual modified blocks are not that big, may be
 even several GB. So if we can backup those modified blocks only,
 that would be an effective incremental backup method.
 
 I'm trying to figure out how that's actually different from WAL..?  It
 sounds like you'd get what you're suggesting with simply increasing the
 checkpoint timeout until the WAL stream is something which you can keep
 up with.  Of course, the downside there is that you'd have to replay
 more WAL when recovering.

Yeah, at first I thought using WAL was a good idea.  However, I realized
that the problem with using WAL is that we cannot back up unlogged
tables, because they are not written to WAL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Claudio Freire
On Wed, Jun 19, 2013 at 7:39 PM, Tatsuo Ishii is...@postgresql.org wrote:
 I'm thinking of implementing an incremental backup tool for
 PostgreSQL. The use case for the tool would be taking a backup of huge
 database. For that size of database, pg_dump is too slow, even WAL
 archive is too slow/ineffective as well. However even in a TB
 database, sometimes actual modified blocks are not that big, may be
 even several GB. So if we can backup those modified blocks only,
 that would be an effective incremental backup method.

 I'm trying to figure out how that's actually different from WAL..?  It
 sounds like you'd get what you're suggesting with simply increasing the
 checkpoint timeout until the WAL stream is something which you can keep
 up with.  Of course, the downside there is that you'd have to replay
 more WAL when recovering.

 Yeah, at first I thought using WAL was a good idea.  However I realized
 that the problem using WAL is we cannot backup unlogged tables because
 they are not written to WAL.

How does replication handle that?

Because I doubt that's an issue only with backups.




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
 I'm trying to figure out how that's actually different from WAL..?  It
 sounds like you'd get what you're suggesting with simply increasing the
 checkpoint timeout until the WAL stream is something which you can keep
 up with.  Of course, the downside there is that you'd have to replay
 more WAL when recovering.

 Yeah, at first I thought using WAL was a good idea.  However I realized
 that the problem using WAL is we cannot backup unlogged tables because
 they are not written to WAL.
 
 How does replication handle that?
 
 Because I doubt that's an issue only with backups.

Unlogged tables are not replicated to streaming replication
standbys. It is clearly stated in the doc.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote:
 Yeah, at first I thought using WAL was a good idea.  However I realized
 that the problem using WAL is we cannot backup unlogged tables because
 they are not written to WAL.

Unlogged tables are also nuked on recovery, so I'm not sure why you
think an incremental backup would help..  If you're recovering (even
from a simple crash), unlogged tables are going to go away.

Put simply, unlogged tables should not be used for any data you care
about, period.

Thanks,

Stephen




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
 * Tatsuo Ishii (is...@postgresql.org) wrote:
 Yeah, at first I thought using WAL was a good idea.  However I realized
 that the problem using WAL is we cannot backup unlogged tables because
 they are not written to WAL.
 
 Unlogged tables are also nuked on recovery, so I'm not sure why you
 think an incremental backup would help..  If you're recovering (even
 from a simple crash), unlogged tables are going to go away.

If my memory serves, unlogged tables are not nuked when PostgreSQL is
stopped by a planned shutdown (as opposed to a crash or pg_ctl -m i
stop). If PostgreSQL works that way, incremental backup should be able
to recover unlogged tables as well; at least, people would expect so, IMO.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote:
  * Tatsuo Ishii (is...@postgresql.org) wrote:
  Yeah, at first I thought using WAL was a good idea.  However I realized
  that the problem using WAL is we cannot backup unlogged tables because
  they are not written to WAL.
  
  Unlogged tables are also nuked on recovery, so I'm not sure why you
  think an incremental backup would help..  If you're recovering (even
  from a simple crash), unlogged tables are going to go away.
 
 If my memory serves, unlogged tables are not nuked when PostgeSQL is
 stopped by a planned shutdown (not by crash or by pg_ctl -m i
 stop). If PostgreSQL works so, incremental backup should be able to
 recover unlogged tables as well, at least people would expect so IMO.

Sure, if you shut down PG, rsync the entire thing and then bring it back
up, then unlogged tables should work when backed up.

They're not WAL'd, so expecting them to work when restoring a backup of
a PG that had been running at the time of the backup is folly.

Thanks,

Stephen




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
 * Tatsuo Ishii (is...@postgresql.org) wrote:
  * Tatsuo Ishii (is...@postgresql.org) wrote:
  Yeah, at first I thought using WAL was a good idea.  However I realized
  that the problem using WAL is we cannot backup unlogged tables because
  they are not written to WAL.
  
  Unlogged tables are also nuked on recovery, so I'm not sure why you
  think an incremental backup would help..  If you're recovering (even
  from a simple crash), unlogged tables are going to go away.
 
 If my memory serves, unlogged tables are not nuked when PostgeSQL is
 stopped by a planned shutdown (not by crash or by pg_ctl -m i
 stop). If PostgreSQL works so, incremental backup should be able to
 recover unlogged tables as well, at least people would expect so IMO.
 
 Sure, if you shut down PG, rsync the entire thing and then bring it back
 up then unlogged tables should work when backed up.

I don't think using rsync (or tar or whatever general file utility)
against a TB database for incremental backup is practical. If it were
practical, I would never have proposed my idea.

 They're not WAL'd, so expecting them to work when restoring a backup of
 a PG that had been running at the time of the backup is folly.

You're probably forgetting about our nice pg_dump tool :-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 I don't see how this is better than snapshotting at the filesystem
 level. I have no experience with TB scale databases (I've been limited
 to only hundreds of GB), but from my limited mid-size db experience,
 filesystem snapshotting is pretty much the same thing you propose
 there (xfs_freeze), and it works pretty well. There's even automated
 tools to do that, like bacula, and they can handle incremental
 snapshots.

 Large databases tend to have multiple filesystems and getting a single,
 consistent, snapshot across all of them while under load is..
 'challenging'.  It's fine if you use pg_start/stop_backup() and you're
 saving the XLOGs off, but if you can't do that..
 
 Good point there.
 
 I still don't like the idea of having to mark each modified page. The
 WAL compressor idea sounds a lot more workable. As in scalable.

Why do you think the WAL compressor idea is more scalable? I really want
to know why. Besides the unlogged tables issue, I can accept the idea
if a WAL-based solution is much more efficient. If there's no perfect,
ideal solution, we need to prioritize things. My #1 priority is being
able to create an incremental backup of a TB database, where the backup
file is small enough and the time to create it is acceptable. I just
don't know why scanning the WAL stream is much cheaper than recording
modified page information.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote:
 I don't think using rsync (or tar or whatever general file utils)
 against TB database for incremental backup is practical. If it's
 practical, I would never propose my idea.

You could use rsync for incremental updates if you wanted; it'd
certainly be faster in some cases, and it's entirely possible to use it
against TB databases in some cases.

  They're not WAL'd, so expecting them to work when restoring a backup of
  a PG that had been running at the time of the backup is folly.
 
 Probably you forget about our nice pg_dump tool:-)

I don't consider pg_dump a mechanism for backing up TB databases.
You're certainly welcome to use it for dumping unlogged tables, but I
can't support the notion that unlogged tables should be supported
through WAL-based file-level backups.  If we're going down this road,
I'd much rather see support for exporting whole files from, and
importing them back into, PG in some way which completely avoids the
need to re-parse or re-validate the data and supports pulling in
indexes as part of the import.

Thanks,

Stephen




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote:
 Why do you think WAL compressor idea is more scalable? I really want
 to know why. Besides the unlogged tables issue, I can accept the idea
 if WAL based solution is much more efficient. If there's no perfect,
 ideal solution, we need to prioritize things. My #1 priority is
 allowing to create incremental backup against TB database, and the
 backup file should be small enough and the time to create it is
 acceptable. I just don't know why scanning WAL stream is much cheaper
 than recording modified page information.

Because that's what the WAL *is*..?

Why would you track what's changed twice?

Thanks,

Stephen




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Claudio Freire
On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote:
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 I don't see how this is better than snapshotting at the filesystem
 level. I have no experience with TB scale databases (I've been limited
 to only hundreds of GB), but from my limited mid-size db experience,
 filesystem snapshotting is pretty much the same thing you propose
 there (xfs_freeze), and it works pretty well. There's even automated
 tools to do that, like bacula, and they can handle incremental
 snapshots.

 Large databases tend to have multiple filesystems and getting a single,
 consistent, snapshot across all of them while under load is..
 'challenging'.  It's fine if you use pg_start/stop_backup() and you're
 saving the XLOGs off, but if you can't do that..

 Good point there.

 I still don't like the idea of having to mark each modified page. The
 WAL compressor idea sounds a lot more workable. As in scalable.

 Why do you think WAL compressor idea is more scalable? I really want
 to know why. Besides the unlogged tables issue, I can accept the idea
 if WAL based solution is much more efficient. If there's no perfect,
 ideal solution, we need to prioritize things. My #1 priority is
 allowing to create incremental backup against TB database, and the
 backup file should be small enough and the time to create it is
 acceptable. I just don't know why scanning WAL stream is much cheaper
 than recording modified page information.

Because it aggregates updates.

When you work at the storage manager level, you only see block-sized
operations. This results in the need to WAL-log bit-sized updates to
some hypothetical dirty-map index. Even when done 100% efficiently,
this implies at least one write per dirtied block, which could as much
as double write I/O in the worst (and entirely expected) case.

When you do it at WAL segment recycle time, or better yet during
checkpoints, you deal with checkpoint-scale operations. You can
aggregate dirty-map updates, if you keep a dirty map, which could not
only reduce I/O considerably (through a much increased likelihood of
write coalescence) but also let you schedule it better (toss it into
the background, along with checkpoints). This covers gathering
dirty-map updates, which still leaves you with the complex problem of
then actually snapshotting those pages consistently without interfering
with ongoing transactions.

If you do a WAL compressor, WAL entries are write-once, so you'll have
no trouble snapshotting those pages. You have the checkpoint's initial
full-page write, so you don't even have to read the page, and you can
accumulate all further partial writes into one full-page write and
dump that into an incremental archive. So you get all the I/O
aggregation from above, which reduces I/O to the point where it only
doubles WAL I/O. It's bounded by a constant, and in contrast to
dirty-map updates it's sequential I/O, so it's a lot faster. It's thus
perfectly scalable.

Not only that, but you're also amortizing incremental backup costs
over time, as you're making them continuously as opposed to at regular
intervals. You'll have one incremental backup per checkpoint. If you
want to coalesce backups, you launch another compressor to merge the
last incremental checkpoint with the new one. And, as the cherry on
top, you only have to do this on the archived WALs, which means you
could very well do it on another system, freeing your main cluster
from all this I/O. It's thus perfectly scalable.
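
The coalescing step could be as simple as this sketch (the page-map
representation is invented for illustration): treat each incremental archive
as a map from block to page image and let the newer image of each page win.

    def coalesce(older_pages: dict, newer_pages: dict) -> dict:
        # Merge two incremental archives of the form {(relfile, blkno): page_image}.
        merged = dict(older_pages)   # start from the older archive
        merged.update(newer_pages)   # newer page images override older ones
        return merged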

The only bottleneck here is WAL archiving. This assumes you can afford
WAL archiving, at least to a local filesystem, and that the WAL
compressor is able to cope with the WAL bandwidth. But I have no reason
to think you'd be able to cope with dirty-map updates anyway if you
were saturating the WAL compressor, as the compressor has a better
amortized cost per transaction than the dirty-map approach.




Re: [HACKERS] Implementing incremental backup

2013-06-19 Thread Tatsuo Ishii
 On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote:
 On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 I don't see how this is better than snapshotting at the filesystem
 level. I have no experience with TB scale databases (I've been limited
 to only hundreds of GB), but from my limited mid-size db experience,
 filesystem snapshotting is pretty much the same thing you propose
 there (xfs_freeze), and it works pretty well. There's even automated
 tools to do that, like bacula, and they can handle incremental
 snapshots.

 Large databases tend to have multiple filesystems and getting a single,
 consistent, snapshot across all of them while under load is..
 'challenging'.  It's fine if you use pg_start/stop_backup() and you're
 saving the XLOGs off, but if you can't do that..

 Good point there.

 I still don't like the idea of having to mark each modified page. The
 WAL compressor idea sounds a lot more workable. As in scalable.

 Why do you think WAL compressor idea is more scalable? I really want
 to know why. Besides the unlogged tables issue, I can accept the idea
 if WAL based solution is much more efficient. If there's no perfect,
 ideal solution, we need to prioritize things. My #1 priority is
 allowing to create incremental backup against TB database, and the
 backup file should be small enough and the time to create it is
 acceptable. I just don't know why scanning WAL stream is much cheaper
 than recording modified page information.
 
 Because it aggregates updates.
 
 When you work at the storage manager level, you only see block-sized
 operations. This results in the need to WAL-log bit-sized updates on
 some hypothetical dirty-map index. Even when done 100% efficiently,
 this implies at least one write per dirtied block, which could as much
 as double write I/O in the worse (and totally expectable) case.
 
 When you do it at WAL segment recycle time, or better yet during
 checkpoints, you deal with checkpoint-scale operations. You can
 aggregate dirty-map updates, if you keep a dirty-map, which could not
 only reduce I/O considerably (by a much increased likelihood of write
 coalescence), but also let you schedule it better (toss it in the
 background, with checkpoints). This is for gathering dirty-map
 updates, which still leaves you with the complex problem of then
 actually snapshotting those pages consistently without interfering
 with ongoing transactions.
 
 If you do a WAL compressor, WAL entries are write-once, so you'll have
 no trouble snapshotting those pages. You have the checkpoint's initial
 full page write, so you don't even have to read the page, and you can
 accumulate all further partial writes into one full page write, and
 dump that on an incremental archive. So, you get all the I/O
 aggregation from above, which reduces I/O to the point where it only
 doubles WAL I/O. It's bound by a constant, and in contrast to
 dirty-map updates, it's sequential I/O so it's a lot faster. It's thus
 perfectly scalable.
 
 Not only that, but you're also amortizing incremental backup costs
 over time, as you're making them constantly as opposed to regular
 intervals. You'll have one incremental backup per checkpoint. If you
 want to coalesce backups, you launch another compressor to merge the
 last incremental checkpoint with the new one. And, now this is the
 cherry on top, you only have to do this on the archived WALs, which
 means you could very well do it on another system, freeing your main
 cluster from all this I/O. It's thus perfectly scalable.
 
 The only bottleneck here, is WAL archiving. This assumes you can
 afford WAL archiving at least to a local filesystem, and that the WAL
 compressor is able to cope with WAL bandwidth. But I have no reason to
 think you'd be able to cope with dirty-map updates anyway if you were
 saturating the WAL compressor, as the compressor is more efficient on
 amortized cost per transaction than the dirty-map approach.

Thank you for the detailed explanation. I will think more about this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

